
Re: [Xen-devel] [PATCH 1/2] xen/event: Add reference counting to event channel



On 09/19/2011 12:22 PM, Jeremy Fitzhardinge wrote:
> On 09/19/2011 08:50 AM, Daniel De Graaf wrote:
>> On 09/17/2011 12:50 AM, Jeremy Fitzhardinge wrote:
>>> On 09/16/2011 02:14 PM, Daniel De Graaf wrote:
>>>> Event channels exposed to userspace by the evtchn module may be used by
>>>> other modules in an asynchronous manner, which requires that reference
>>>> counting be used to prevent the event channel from being closed before
>>>> the signals are delivered.
>>> Could you use the refcounting at the irq level?  I was quite pleased to
>>> have removed the event channel refcounting (and the use of naked event
>>> channels).
>> This looked more complex: unbind_from_irq has cleanup that happens after
>> EVTCHNOP_close but before the irq-refcount-protected close operation.
>> For the IRQ-level refcounting to be useful here, the EVTCHNOP_close would
>> have to be postponed. The evtchn_to_irq mapping is also maintained here.
>>  
>>> Oh, is it that userspace allocates an event channel with /dev/evtchn,
>>> then passes that event channel to the gntalloc/gntdev drivers so they
>>> can use it to pass events between the two.  That's a bit unfortunate; it
>>> might have been better to expose those event channels as file
>>> descriptors so you could use fd refcounting to manage the lifetimes.
>> This would also make event channels simpler to use from userspace, avoiding
>> the extra read() to determine what event channel has fired (and avoiding
>> almost all userspace knowledge of the local event channel number). However,
>> this would require practically rewriting the /dev/xen/evtchn userspace API.
> 
> If you had an app which cared about many events, then having an fd for
> each would be pretty cumbersome.  But perhaps it would be worth
> considering extending the current API a bit to add a general "set of
> events fd", where a given event can be the member of multiple sets.
 
This seems to overlap a lot with epoll, which can already do overlapping
file descriptor readiness checks.
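
To illustrate the epoll point, here is a minimal sketch in which one
descriptor is a member of two epoll sets. The two pipes are only stand-ins
for event sources (an assumption for the example; nothing from the evtchn
API is used), and the helper names are made up.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Add fd to the epoll set epfd, watching for readability. */
static int watch(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Non-blocking poll: number of ready descriptors in the set, or -1. */
static int ready_count(int epfd)
{
	struct epoll_event out[8];

	return epoll_wait(epfd, out, 8, 0);
}

/*
 * Source A is a member of both sets, source B only of the first.
 * Returns 0 if the ready counts match what overlapping membership
 * predicts, 1 if they do not, and -1 on a setup error.
 */
int overlapping_sets_demo(void)
{
	int a[2], b[2];
	int set1, set2, ok;

	if (pipe(a) < 0 || pipe(b) < 0)
		return -1;
	set1 = epoll_create1(0);
	set2 = epoll_create1(0);
	if (set1 < 0 || set2 < 0)
		return -1;
	if (watch(set1, a[0]) || watch(set1, b[0]) || watch(set2, a[0]))
		return -1;

	/* Fire only source A: it should be ready in both sets. */
	if (write(a[1], "x", 1) != 1)
		return -1;
	ok = ready_count(set1) == 1 && ready_count(set2) == 1;

	/* Fire source B as well: only set1 sees the second source. */
	if (write(b[1], "y", 1) != 1)
		return -1;
	ok = ok && ready_count(set1) == 2 && ready_count(set2) == 1;

	close(set1);
	close(set2);
	close(a[0]);
	close(a[1]);
	close(b[0]);
	close(b[1]);
	return ok ? 0 : 1;
}
```

The sets are level-triggered here, so a source stays ready in every set it
belongs to until its data is read.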

>> With the current event channel API, you would have to pass both the event
>> channel number and the evtchn file descriptor in order to get an fd reference
>> which seems redundant and might cause confusion since keeping the fd
>> reference would also prevent cleanup of other event channels associated with
>> the shared file descriptor. It would also require changing the notify ioctl
>> parameters since there is currently no need to pass an evtchn fd.
> 
> Yes, though there's no reason even now one couldn't be careful to bind
> your events to a given fd where they share similar lifetimes/uses (even
> to the extent of allocating single-event fds).
> 
> On the other hand, is there really a need to make a connection between
> the gnt and evtchn devices?  Why couldn't the gnt driver just do its own
> form of event fd and management independent of /dev/evtchn?  I know it
> seems a bit redundant, but coupling the two adds its own complexities.

I don't think there is a real need to make a connection, and even with this
patch there isn't one: the reference count tracks only the event channel, not
the evtchn device or its FDs.

Unless you're talking about having /dev/xen/gnt* manage event channels
independently of /dev/xen/evtchn? To be useful, this would need either to
both send and receive events, or to share local/remote ports with evtchn.

>>> What's the downside of sending the event after the event channel has closed?
>> The event won't get sent at all, since the hypervisor sees that the local
>> end of the event channel is closed. If the local event channel number
>> happens to get reused, the event could also get sent to the wrong event
>> channel, although this is generally harmless.
> 
> I think /dev/evtchn port numbers are never reused.  And it sounds like
> it is purely a usermode problem if they close the event channel prematurely.

The trigger in this case is the kernel closing file descriptors when a
process crashes. In particular:

1. open /dev/xen/evtchn (FD 3) and bind_interdomain an event channel
2. open /dev/xen/gntdev (FD 4) and map a granted page
3. use unmap_notify on FD 4 to set up a notification on the event channel
   allocated in step 1
4. segfault, SIGKILL, or some other exit without proper cleanup

At this point, the kernel closes file descriptors in numerical order, which
will cause the event channel to be closed before the gntdev device is able
to send its unmap notification event. In libvchan, I make sure to swap steps
1 and 2 to avoid triggering this bug, but that is clearly a workaround and
not a proper solution.
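
The teardown ordering above can be modelled in a few lines. This is a toy
userspace model, not kernel code: struct chan, chan_put, chan_notify, and
teardown_delivers_notify are hypothetical names, and the model simply assumes
that FD 3 is released before FD 4, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one event channel; not the kernel's struct irq_info. */
struct chan {
	int refcount;	/* references held by drivers */
	bool open;	/* false once the channel has been closed */
};

/* Drop one reference; the channel closes when the last one goes away. */
static void chan_put(struct chan *c)
{
	assert(c->refcount > 0);
	if (--c->refcount == 0)
		c->open = false;
}

/* Sending only succeeds while the local end is still open. */
static bool chan_notify(const struct chan *c)
{
	return c->open;
}

/*
 * Simulate process teardown: FD 3 (evtchn) is released before FD 4
 * (gntdev), which tries to send its unmap notification on release.
 * Returns true if the notification was delivered.
 */
bool teardown_delivers_notify(bool gntdev_holds_ref)
{
	struct chan c = { .refcount = 1, .open = true };  /* step 1: bind */
	bool delivered;

	if (gntdev_holds_ref)
		c.refcount++;	/* step 3: unmap_notify takes a reference */

	chan_put(&c);		/* FD 3 closed: evtchn drops its reference */

	delivered = chan_notify(&c);	/* FD 4 closed: gntdev sends the event */
	if (gntdev_holds_ref)
		chan_put(&c);	/* then gntdev drops its own reference */

	return delivered;
}
```

Without the extra reference the notification is lost because the channel is
already closed when gntdev is released; with it, the close is deferred until
gntdev has sent the event.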

I have seen /dev/xen/evtchn ports be reused; the comment in evtchn.c is
misleading or incorrect. The port is allocated by the hypervisor in
xen/common/event_channel.c:get_free_port.

>     J
> 
>>> That said:
>>>
>>>> Signed-off-by: Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>
>>>> ---
>>>>  drivers/xen/events.c |   38 ++++++++++++++++++++++++++++++++++++++
>>>>  include/xen/events.h |    6 ++++++
>>>>  2 files changed, 44 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
>>>> index da70f5c..c9343b9 100644
>>>> --- a/drivers/xen/events.c
>>>> +++ b/drivers/xen/events.c
>>>> @@ -89,6 +89,7 @@ struct irq_info
>>>>  {
>>>>    struct list_head list;
>>>>    enum xen_irq_type type; /* type */
>>>> +  unsigned short refcount;
>>> Is short large enough?  Is this something that untrusted userspace could
>>> end up wrapping?  If short is sufficient, you should pack it next to the
>>> other short fields to avoid a gap.
>> This would be quite difficult: since only gntdev/gntalloc use these counts
>> for now, you would have to map or allocate 64K pages; the gntalloc driver
>> has a default limit of 1024 pages and gntdev is limited by the hypervisor
>> max_nr_grant_frames.
>>
>> Anyway, changing to atomic_t should
>>  
>>>>    unsigned irq;
>>>>    unsigned short evtchn;  /* event channel */
>>>>    unsigned short cpu;     /* cpu bound */
>>>> @@ -407,6 +408,7 @@ static void xen_irq_init(unsigned irq)
>>>>            panic("Unable to allocate metadata for IRQ%d\n", irq);
>>>>  
>>>>    info->type = IRQT_UNBOUND;
>>>> +  info->refcount = 1;
>>>>  
>>>>    irq_set_handler_data(irq, info);
>>>>  
>>>> @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq)
>>>>  
>>>>    irq_set_handler_data(irq, NULL);
>>>>  
>>>> +  BUG_ON(info->refcount > 1);
>>>> +
>>>>    kfree(info);
>>>>  
>>>>    /* Legacy IRQ descriptors are managed by the arch. */
>>>> @@ -912,9 +916,14 @@ static void unbind_from_irq(unsigned int irq)
>>>>  {
>>>>    struct evtchn_close close;
>>>>    int evtchn = evtchn_from_irq(irq);
>>>> +  struct irq_info *info = irq_get_handler_data(irq);
>>>>  
>>>>    spin_lock(&irq_mapping_update_lock);
>>>>  
>>>> +  info->refcount--;
>>>> +  if (info->refcount > 0)
>>>> +          goto out_unlock;
>>>> +
>>>>    if (VALID_EVTCHN(evtchn)) {
>>>>            close.port = evtchn;
>>>>            if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
>>>> @@ -943,6 +952,7 @@ static void unbind_from_irq(unsigned int irq)
>>>>  
>>>>    xen_free_irq(irq);
>>>>  
>>>> + out_unlock:
>>>>    spin_unlock(&irq_mapping_update_lock);
>>>>  }
>>>>  
>>>> @@ -1038,6 +1048,34 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(unbind_from_irqhandler);
>>>>  
>>>> +int get_evtchn_reservation(unsigned int evtchn)
>>> "reservation"?  I think just evtchn_get/put would be more consistent
>>> with kernel naming conventions.
>> Yes.
>>
>>>> +{
>>>> +  int irq = evtchn_to_irq[evtchn];
>>>> +  struct irq_info *info;
>>>> +
>>>> +  if (irq == -1)
>>>> +          return -ENOENT;
>>>> +
>>>> +  info = irq_get_handler_data(irq);
>>>> +
>>>> +  if (!info)
>>>> +          return -ENOENT;
>>>> +
>>>> +  spin_lock(&irq_mapping_update_lock);
>>>> +  info->refcount++;
>>>> +  spin_unlock(&irq_mapping_update_lock);
>>> What is this spinlock protecting against?  The non-atomicity of ++, or
>>> something larger scale?  If it's just an atomicity thing, should it be an
>>> atomic_t?
>> Just atomicity.
>>
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(get_evtchn_reservation);
>>>> +
>>>> +void put_evtchn_reservation(unsigned int evtchn)
>>>> +{
>>>> +  int irq = evtchn_to_irq[evtchn];
>>>> +  unbind_from_irq(irq);
>>> Hm.
>> Similar name change here.
>>
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(put_evtchn_reservation);
>>>> +
>>>>  void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector)
>>>>  {
>>>>    int irq = per_cpu(ipi_to_irq, cpu)[vector];
>>>> diff --git a/include/xen/events.h b/include/xen/events.h
>>>> index d287997..23bd5fd 100644
>>>> --- a/include/xen/events.h
>>>> +++ b/include/xen/events.h
>>>> @@ -37,6 +37,12 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
>>>>   */
>>>>  void unbind_from_irqhandler(unsigned int irq, void *dev_id);
>>>>  
>>>> +/*
>>>> + * Allow extra references to event channels exposed to userspace by evtchn
>>>> + */
>>>> +int get_evtchn_reservation(unsigned int evtchn);
>>>> +void put_evtchn_reservation(unsigned int evtchn);
>>>> +
>>>>  void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector);
>>>>  int resend_irq_on_evtchn(unsigned int irq);
>>>>  void rebind_evtchn_irq(int evtchn, int irq);
>>
> 
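
For reference, the atomic-counter variant discussed above could look roughly
like the following. This is a userspace sketch using C11 <stdatomic.h> as an
analogue of the kernel's atomic_t, not the code for the next revision of the
patch; struct chan_ref and the chan_ref_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-channel bookkeeping. */
struct chan_ref {
	atomic_int refcount;
};

static void chan_ref_init(struct chan_ref *r)
{
	atomic_init(&r->refcount, 1);	/* the binder owns the first reference */
}

/* Take a reference unless the count has already reached zero. */
static bool chan_ref_get(struct chan_ref *r)
{
	int old = atomic_load(&r->refcount);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool chan_ref_put(struct chan_ref *r)
{
	int old = atomic_fetch_sub(&r->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

The get path refuses to resurrect a channel whose count has already hit
zero, and the caller that sees chan_ref_put() return true is the one
responsible for the actual close.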


-- 
Daniel De Graaf
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

