
Re: [Xen-devel] Enabling vm_event for a guest with more VCPUs than available ring buffer slots freezes the virtual machine





On Tue, Feb 7, 2017 at 11:48 PM, Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> wrote:
On 02/08/2017 01:23 AM, Tamas K Lengyel wrote:
>
>
> On Tue, Feb 7, 2017 at 1:41 PM, Razvan Cojocaru
> <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>
>     On 02/07/2017 10:20 PM, Tamas K Lengyel wrote:
>     >
>     >
>     > On Tue, Feb 7, 2017 at 11:57 AM, Razvan Cojocaru
>     > <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>     >
>     >     On 02/07/2017 08:39 PM, Andrew Cooper wrote:
>     >     > On 07/02/17 18:31, Razvan Cojocaru wrote:
>     >     >> On 02/07/2017 08:15 PM, Tamas K Lengyel wrote:
>     >     >>>
>     >     >>> On Tue, Feb 7, 2017 at 9:53 AM, Razvan Cojocaru
>     >     >>> <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>     >     >>>
>     >     >>>     Hello,
>     >     >>>
>     >     >>>     Setting, e.g., 16 VCPUs for an HVM guest ends up blocking
>     >     >>>     the guest completely when subscribing to vm_events,
>     >     >>>     apparently because of this code in xen/common/vm_event.c:
>     >     >>>
>     >     >>>     315     /* Give this vCPU a black eye if necessary, on the way out.
>     >     >>>     316      * See the comments above wake_blocked() for more information
>     >     >>>     317      * on how this mechanism works to avoid waiting. */
>     >     >>>     318     avail_req = vm_event_ring_available(ved);
>     >     >>>     319     if( current->domain == d && avail_req < d->max_vcpus )
>     >     >>>     320         vm_event_mark_and_pause(current, ved);
>     >     >>>
>     >     >>>     It would appear that even if the guest only has 2 online
>     >     >>>     VCPUs, the "avail_req < d->max_vcpus" condition will pause
>     >     >>>     current, and we eventually end up with all the VCPUs paused.
>     >     >>>
>     >     >>>     An ugly hack ("avail_req < 2") has allowed booting a guest
>     >     >>>     with many VCPUs (a large max_vcpus; the guest only brings
>     >     >>>     2 VCPUs online), but that's just to prove that this check
>     >     >>>     was the culprit - a real solution needs a more in-depth
>     >     >>>     understanding of the issue. That's basically very old code
>     >     >>>     (pre-2012 at least) that got moved around into the current
>     >     >>>     shape of Xen today - please CC anyone relevant to the
>     >     >>>     discussion that you're aware of.
>     >     >>>
>     >     >>>     Thoughts?
>     >     >>>
>     >     >>>
>     >     >>> I think this is a side-effect of the growth of the vm_event
>     >     >>> structure and the fact that we have a single-page ring. The
>     >     >>> check effectively sets a threshold of having enough space for
>     >     >>> each vCPU to place at least one more event on the ring, and if
>     >     >>> that's not the case the vCPU gets paused. OTOH I think this
>     >     >>> would only have an effect on asynchronous events; for all other
>     >     >>> events the vCPU is already paused. Is that the case you have?
>     >     >> No, on the contrary, all my events are synchronous (the VCPU
>     >     >> is paused waiting for the vm_event reply).
>     >     >>
>     >     >> I've debugged this a bit, and the problem seems to be that
>     >     >> vm_event_wake_blocked() breaks here:
>     >     >>
>     >     >> 150     /* We remember which vcpu last woke up to avoid scanning always linearly
>     >     >> 151      * from zero and starving higher-numbered vcpus under high load */
>     >     >> 152     if ( d->vcpu )
>     >     >> 153     {
>     >     >> 154         int i, j, k;
>     >     >> 155
>     >     >> 156         for (i = ved->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++)
>     >     >> 157         {
>     >     >> 158             k = i % d->max_vcpus;
>     >     >> 159             v = d->vcpu[k];
>     >     >> 160             if ( !v )
>     >     >> 161                 continue;
>     >     >> 162
>     >     >> 163             if ( !(ved->blocked) || online >= avail_req )
>     >     >> 164                break;
>     >     >> 165
>     >     >> 166             if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
>     >     >> 167             {
>     >     >> 168                 vcpu_unpause(v);
>     >     >> 169                 online++;
>     >     >> 170                 ved->blocked--;
>     >     >> 171                 ved->last_vcpu_wake_up = k;
>     >     >> 172             }
>     >     >> 173         }
>     >     >> 174     }
>     >     >>
>     >     >> at "if ( !(ved->blocked) || online >= avail_req )". At this
>     point,
>     >     >> nothing ever gets unblocked. It's hard to believe that this is
>     >     desired
>     >     >> behaviour, as I don't know what could possibly happen for that
>     >     condition
>     >     >> to become false once all the online VCPUs are stuck (especially
>     >     when the
>     >     >> guest has just started booting).
>     >
>     >
>     > Ah, I see what happens. During boot vCPU 0 generates an event and gets
>     > marked blocked because the number of vCPUs is so high. The other vCPUs
>     > are still unblocked since they are idle, but this test here will still
>     > be true (online >= avail_req) and thus we can never unblock vCPU 0, so
>     > the boot process hangs because vCPU 0 never resumes. I would argue that
>     > this test should be changed to check that there is at least 1 spot on
>     > the ring, and only break if that is no longer the case (i.e. instead
>     > of incrementing online we should be decrementing avail_req).
>
>     That is exactly what happens. And it can't really be fixed just by
>     increasing the ring buffer (although that definitely helps a lot and
>     would be a smart move): no matter how large it is, we can always ask
>     the domain to use more VCPUs than there are slots in the buffer.
>
>
> No, not increasing the ring buffer but fixing the check to unblock when
> there is at least 1 spot on the ring. So look at this comment...
>
> /*
>  * vm_event_wake_blocked() will wakeup vcpus waiting for room in the
>  * ring. These vCPUs were paused on their way out after placing an event,
>  * but need to be resumed where the ring is capable of processing at least
>  * one event from them.
>  */
>
> ... it seems to me that the check "online >= avail_req" was just wrong
> from the start.

I've read that to mean that there needs to be more than one available
slot in the ring: "wake vcpus [plural] [...] at least one event from them".

Right, I mean we can unblock all vCPUs as soon as there is 1 spot on the ring. If they trap again, they will just be placed in the waitqueue instead of being blocked. I think that would work fine.
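To make the failure mode concrete, here is a standalone back-of-the-envelope illustration. The slot count below is an assumption for illustration only (the single-page ring yields a small, fixed power-of-two number of slots, and the exact value depends on sizeof(vm_event_request_t)), but it shows why an empty ring already fails the "avail_req < d->max_vcpus" test for a 16-VCPU guest:

#include <stdio.h>

/* RING_SLOTS is a hypothetical single-page slot count, not the real value. */
#define RING_SLOTS 8

int main(void)
{
    unsigned int max_vcpus = 16;         /* the guest from the report */
    unsigned int avail_req = RING_SLOTS; /* ring completely empty */

    /* Mirrors the check quoted above from vm_event.c line 319: it is true
     * even with an empty ring, so every vCPU that posts an event is
     * marked-and-paused on its way out, until the whole guest is stuck. */
    if ( avail_req < max_vcpus )
        printf("vCPU paused: avail_req=%u < max_vcpus=%u\n",
               avail_req, max_vcpus);

    return 0;
}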
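And for reference, a minimal sketch of the wake-up change being discussed, reusing the names from the vm_event.c excerpts quoted above (vm_event_ring_available(), ved->blocked, ved->pause_flag, ved->last_vcpu_wake_up). It wakes blocked vCPUs for as long as at least one slot remains on the ring, decrementing avail_req per woken vCPU rather than comparing a count of woken vCPUs against avail_req; treat it as an illustration of the idea, not a tested patch:

static void vm_event_wake_blocked(struct domain *d,
                                  struct vm_event_domain *ved)
{
    struct vcpu *v;
    unsigned int avail_req = vm_event_ring_available(ved);
    int i, j, k;

    if ( avail_req == 0 || !ved->blocked || !d->vcpu )
        return;

    /* Same rotating scan as the current code: start after the last woken
     * vCPU so higher-numbered vCPUs aren't starved under high load. */
    for ( i = ved->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++ )
    {
        k = i % d->max_vcpus;
        v = d->vcpu[k];
        if ( !v )
            continue;

        /* Stop only when the ring is full again or nothing remains
         * blocked, rather than when "online >= avail_req". */
        if ( !ved->blocked || avail_req == 0 )
            break;

        if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
        {
            vcpu_unpause(v);
            avail_req--;    /* one ring slot now spoken for */
            ved->blocked--;
            ved->last_vcpu_wake_up = k;
        }
    }
}

Any vCPU that traps again after being woken and finds the ring full would then go on the waitqueue instead of being re-blocked, as described above.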

Tamas
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

