[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Enabling vm_event for a guest with more VCPUs than available ring buffer slots freezes the virtual machine
On 02/07/2017 08:15 PM, Tamas K Lengyel wrote: > > > On Tue, Feb 7, 2017 at 9:53 AM, Razvan Cojocaru > <rcojocaru@xxxxxxxxxxxxxxx <mailto:rcojocaru@xxxxxxxxxxxxxxx>> wrote: > > Hello, > > Setting, e.g. 16 VCPUs for a HVM guest, ends up blocking the guest > completely when subscribing to vm_events, apparently because of this > code in xen/common/vm_event.c: > > 315 /* Give this vCPU a black eye if necessary, on the way out. > 316 * See the comments above wake_blocked() for more information > 317 * on how this mechanism works to avoid waiting. */ > 318 avail_req = vm_event_ring_available(ved); > 319 if( current->domain == d && avail_req < d->max_vcpus ) > 320 vm_event_mark_and_pause(current, ved); > > It would appear that even if the guest only has 2 online VCPUs, the > "avail_req < d->max_vcpus" condition will pause current, and we > eventually end up with all the VCPUs paused. > > An ugly hack ("avail_req < 2") has allowed booting a guest with many > VCPUs (max_vcpus, the guest only brings 2 VCPUs online), however that's > just to prove that that was the culprit - a real solution to this needs > more in-depth understading of the issue and potential solution. That's > basically very old code (pre-2012 at least) that got moved around into > the current shape of Xen today - please CC anyone relevant to the > discussion that you're aware of. > > Thoughts? > > > I think is a side-effect of the growth of the vm_event structure and the > fact that we have a single page ring. The check effectively sets a > threshold of having enough space for each vCPU to place at least one > more event on the ring, and if that's not the case it gets paused. OTOH > I think this would only have an effect on asynchronous events, for all > other events the vCPU is already paused. Is that the case you have? No, on the contrary, all my events are synchronous (the VCPU is paused waiting for the vm_event reply). I've debugged this a bit, and the problem seems to be that vm_event_wake_blocked() breaks here: 150 /* We remember which vcpu last woke up to avoid scanning always linearly 151 * from zero and starving higher-numbered vcpus under high load */ 152 if ( d->vcpu ) 153 { 154 int i, j, k; 155 156 for (i = ved->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++) 157 { 158 k = i % d->max_vcpus; 159 v = d->vcpu[k]; 160 if ( !v ) 161 continue; 162 163 if ( !(ved->blocked) || online >= avail_req ) 164 break; 165 166 if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) ) 167 { 168 vcpu_unpause(v); 169 online++; 170 ved->blocked--; 171 ved->last_vcpu_wake_up = k; 172 } 173 } 174 } at "if ( !(ved->blocked) || online >= avail_req )". At this point, nothing ever gets unblocked. It's hard to believe that this is desired behaviour, as I don't know what could possibly happen for that condition to become false once all the online VCPUs are stuck (especially when the guest has just started booting). Thanks, Razvan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |