Xen project Mailing List

Re: [Xen-devel] Enabling vm_event for a guest with more VCPUs than available ring buffer slots freezes the virtual machine

To: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>

From: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>

Date: Tue, 7 Feb 2017 11:15:33 -0700

Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Delivery-date: Tue, 07 Feb 2017 18:15:45 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Tue, Feb 7, 2017 at 9:53 AM, Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> wrote:

Hello,

Setting, e.g., 16 VCPUs for an HVM guest ends up blocking the guest
completely when subscribing to vm_events, apparently because of this
code in xen/common/vm_event.c:

    /* Give this vCPU a black eye if necessary, on the way out.
     * See the comments above wake_blocked() for more information
     * on how this mechanism works to avoid waiting. */
    avail_req = vm_event_ring_available(ved);
    if( current->domain == d && avail_req < d->max_vcpus )
        vm_event_mark_and_pause(current, ved);

It would appear that even if the guest only has 2 online VCPUs, the
"avail_req < d->max_vcpus" condition will pause current, and we
eventually end up with all the VCPUs paused.

An ugly hack ("avail_req < 2") has allowed booting a guest with many
VCPUs (max_vcpus, the guest only brings 2 VCPUs online), however that's
just to prove that that was the culprit - a real solution to this needs
more in-depth understanding of the issue and potential solution. That's
basically very old code (pre-2012 at least) that got moved around into
the current shape of Xen today - please CC anyone relevant to the
discussion that you're aware of.

Thoughts?

I think this is a side effect of the growth of the vm_event structure and the fact that we have a single-page ring. The check effectively sets a threshold: the ring must have enough space for each vCPU to place at least one more event on it, and if that's not the case the current vCPU gets paused. OTOH, I think this would only have an effect on asynchronous events; for all other events the vCPU is already paused. Is that the case you have?
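
To make the threshold concrete, here is a minimal sketch of the comparison, not the actual Xen code. The helper names would_pause() and ring_can_ever_satisfy() are hypothetical, and the slot count used in the note below (8) is an assumed figure for illustration only; the real number of slots on the single-page ring depends on the size of the vm_event request structure, which this thread does not state.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the quoted check: the requesting vCPU is
 * marked and paused when fewer free request slots remain than the
 * threshold. In the quoted code the threshold is d->max_vcpus; in the
 * hack it is the constant 2.
 */
static bool would_pause(unsigned int avail_req, unsigned int threshold)
{
    return avail_req < threshold;
}

/*
 * Hypothetical helper: if the ring's total capacity is already below
 * the threshold, then even a completely empty ring fails the check,
 * so every vCPU that reaches it gets paused.
 */
static bool ring_can_ever_satisfy(unsigned int ring_slots,
                                  unsigned int threshold)
{
    return ring_slots >= threshold;
}
```

With an assumed ring of 8 slots and max_vcpus = 16, would_pause(8, 16) is true even though the ring is empty, and ring_can_ever_satisfy(8, 16) is false. With the "avail_req < 2" hack, would_pause(8, 2) is false, which would be consistent with the guest booting.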

Tamas

_______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel