Re: [Xen-devel] Issues with the xen-access.c model
On 05/04/16 11:55, Razvan Cojocaru wrote:
> Hello,
>
> xen-access.c does roughly this:
>
> for (;;) {
>     poll_evt_channel();
>
>     if (new_events) {
>         while (ring_buffer_has_requests) {
>             pull_request();
>             process_request();
>             put_response();
>         }
>
>         signal_evt_channel();
>     }
> }
>
> The problems are these:
>
> 1. vm_event_put_request() does notify_via_xen_event_channel(d,
> ved->xen_port); at the very end, and we seem to be using a FIFO event
> channel (if I'm reading the code right, it's generic event channel code
> that pre-dates the vm_event code that sits on top of it).
>
> This means that for a guest that's running two VCPUs, the following
> scenario is possible:
>
> * VCPU1 puts an event in the ring buffer
> * VCPU1 signals the event channel
> * poll_evt_channel() wakes up
> * VCPU2 puts an event in the ring buffer
> * the loop processes the first event
> * VCPU2 signals the event channel
> * the loop processes the second event (since there _are_ more requests
>   in the ring buffer)
> * the loop signals the event channel that requests have been processed
> * poll_evt_channel() wakes up again, because of the second event channel
>   signal, but this is pointless since all events have already been processed
>
> This can be avoided by counting the requests processed vs. the number of
> successful poll()s, but it still does not solve the second problem:
>
> 2. Race conditions. At least in theory, there's nothing to stop the
> second iteration of the request processing loop from reading a partially
> written request from the ring buffer, or garbage left over from a
> previous request, if it hits it "just right" (i.e. the "VCPU2 puts an
> event in the ring buffer" part can run on top of "the loop processes
> the second event").
>
> None of this really shows up unless we're in heavy processing
> scenarios, but it can occur, especially in synced vm_event scenarios.
>
> The problem seems to be alleviated by changing the loop to this:
>
> for (;;) {
>     poll_evt_channel();
>
>     if (new_events) {
>         while (ring_buffer_has_requests) {
>             pull_request();
>             process_request();
>             put_response();
>
>             /* Moved here. */
>             signal_evt_channel();
>         }
>     }
> }
>
> but it's obviously not foolproof, the main problem remaining the
> synchronization of the ring buffer from both the hypervisor side and
> userspace.
>
> How should we proceed to fix this?

The vm_event code in Xen must increment the producer index as the final
action of putting a request on the ring. If it doesn't, then it is simply
broken and needs fixing.

This ensures that the consumer never sees a partial request on the ring:
by the time it observes a newer producer index, all of the request data
has already been written.

~Andrew
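To make the ordering concrete, below is a minimal, self-contained sketch of a
single-producer/single-consumer ring that follows the rule Andrew describes.
It is not the actual Xen ring.h macros or the vm_event structures: struct
ring, put_request() and get_request() are illustrative names, and the
GCC/Clang __atomic builtins stand in for Xen's memory barriers. The only
point is that the producer publishes the index last (release) and the
consumer reads the index first (acquire), so a newer producer index implies
the request body is fully written.

    /*
     * Illustrative sketch only -- not the Xen ring.h macros or the
     * vm_event API. Full/overflow checks are omitted for brevity.
     */
    #include <stdint.h>
    #include <string.h>

    #define RING_SIZE 64                    /* assumed power of two */

    struct request { uint64_t data[4]; };   /* stand-in for vm_event_request_t */

    struct ring {
        uint32_t prod;                      /* written only by the producer */
        uint32_t cons;                      /* written only by the consumer */
        struct request req[RING_SIZE];
    };

    /* Producer side (conceptually, the tail of vm_event_put_request()). */
    static void put_request(struct ring *r, const struct request *req)
    {
        uint32_t prod = __atomic_load_n(&r->prod, __ATOMIC_RELAXED);

        memcpy(&r->req[prod % RING_SIZE], req, sizeof(*req));

        /* Release: the request body above is visible before the new index. */
        __atomic_store_n(&r->prod, prod + 1, __ATOMIC_RELEASE);
    }

    /* Consumer side (conceptually, pull_request() in the xen-access loop). */
    static int get_request(struct ring *r, struct request *req)
    {
        /* Acquire: observe the index before touching any request body. */
        uint32_t prod = __atomic_load_n(&r->prod, __ATOMIC_ACQUIRE);
        uint32_t cons = r->cons;            /* only the consumer writes this */

        if (cons == prod)
            return 0;                       /* no unconsumed requests */

        memcpy(req, &r->req[cons % RING_SIZE], sizeof(*req));
        __atomic_store_n(&r->cons, cons + 1, __ATOMIC_RELEASE);
        return 1;
    }

With this ordering, a spurious event channel wakeup is harmless: the consumer
may find cons == prod and simply go back to poll(), but it can never read a
half-written request.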