[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
On 14.10.2020 13:40, Julien Grall wrote: > Hi Jan, > > On 13/10/2020 15:26, Jan Beulich wrote: >> On 13.10.2020 16:20, Jürgen Groß wrote: >>> On 13.10.20 15:58, Jan Beulich wrote: >>>> On 12.10.2020 11:27, Juergen Gross wrote: >>>>> The queue for a fifo event is depending on the vcpu_id and the >>>>> priority of the event. When sending an event it might happen the >>>>> event needs to change queues and the old queue needs to be kept for >>>>> keeping the links between queue elements intact. For this purpose >>>>> the event channel contains last_priority and last_vcpu_id values >>>>> elements for being able to identify the old queue. >>>>> >>>>> In order to avoid races always access last_priority and last_vcpu_id >>>>> with a single atomic operation avoiding any inconsistencies. >>>>> >>>>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx> >>>> >>>> I seem to vaguely recall that at the time this seemingly racy >>>> access was done on purpose by David. Did you go look at the >>>> old commits to understand whether there really is a race which >>>> can't be tolerated within the spec? >>> >>> At least the comments in the code tell us that the race regarding >>> the writing of priority (not last_priority) is acceptable. >> >> Ah, then it was comments. I knew I read something to this effect >> somewhere, recently. >> >>> Especially Julien was rather worried by the current situation. In >>> case you can convince him the current handling is fine, we can >>> easily drop this patch. >> >> Julien, in the light of the above - can you clarify the specific >> concerns you (still) have? > > Let me start with that the assumption if evtchn->lock is not held when > evtchn_fifo_set_pending() is called. If it is held, then my comment is moot. But this isn't interesting - we know there are paths where it is held, and ones (interdomain sending) where it's the remote port's lock instead which is held. What's important here is that a _consistent_ lock be held (but it doesn't need to be evtchn's). > From my understanding, the goal of lock_old_queue() is to return the > old queue used. > > last_priority and last_vcpu_id may be updated separately and I could not > convince myself that it would not be possible to return a queue that is > neither the current one nor the old one. > > The following could happen if evtchn->priority and > evtchn->notify_vcpu_id keeps changing between calls. > > pCPU0 | pCPU1 > | > evtchn_fifo_set_pending(v0,...) | > | evtchn_fifo_set_pending(v1, ...) > [...] | > /* Queue has changed */ | > evtchn->last_vcpu_id = v0 | > | -> evtchn_old_queue() > | v = d->vcpu[evtchn->last_vcpu_id]; > | old_q = ... > | spin_lock(old_q->...) > | v = ... > | q = ... > | /* q and old_q would be the same */ > | > evtchn->las_priority = priority| > > If my diagram is correct, then pCPU1 would return a queue that is > neither the current nor old one. I think I agree. > In which case, I think it would at least be possible to corrupt the > queue. From evtchn_fifo_set_pending(): > > /* > * If this event was a tail, the old queue is now empty and > * its tail must be invalidated to prevent adding an event to > * the old queue from corrupting the new queue. > */ > if ( old_q->tail == port ) > old_q->tail = 0; > > Did I miss anything? I don't think you did. The important point though is that a consistent lock is being held whenever we come here, so two racing set_pending() aren't possible for one and the same evtchn. As a result I don't think the patch here is actually needed. If I take this further, then I think I can reason why it wasn't necessary to add further locking to send_guest_{global,vcpu}_virq(): The virq_lock is the "consistent lock" protecting ECS_VIRQ ports. The spin_barrier() while closing the port guards that side against the port changing to a different ECS_* behind the sending functions' backs. And binding such ports sets ->virq_to_evtchn[] last, with a suitable barrier (the unlock). Which leaves send_guest_pirq() before we can drop the IRQ-safe locking again. I guess we would need to work towards using the underlying irq_desc's lock as consistent lock here, but this certainly isn't the case just yet, and I'm not really certain this can be achieved. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |