
Re: [Xen-devel] [osstest test] 110909: tolerable FAIL - PUSHED



On 21/06/2017 23:59, Ian Jackson wrote:
> osstest service owner writes ("[osstest test] 110909: tolerable FAIL - PUSHED"):
>> flight 110909 osstest real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/110909/
>>
>> Failures :-/ but no regressions.
> ...
>> Tests which did not succeed, but are not blocking:
> ...
>>  test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail like 110373
> This guest had ~31G of disk and 1.5G of RAM.
>
> The logfile
>
>   http://logs.test-lab.xenproject.org/osstest/logs/110909/test-amd64-i386-xl-qemuu-win7-amd64/15.ts-guest-localmigrate.log
>
> seems to show that the guest is paused (state "p") following the 9th
> migration.  This is weird, given that xl seems to say earlier
> "migration target: Domain started successsfully", which message
> follows the call to libxl_domain_unpause.
>
> I wonder if it is possible that the domain still appears paused
> briefly after xl/libxl tries to unpause it.  That is, that
> XEN_DOMINF_paused might be set in the return from
> xc_domain_getinfolist even after the unpause domctl returns.
>
> By the time log collection runs, the domain seems unpaused.

XEN_DOMINF_paused is a straight reflection of
d->controller_pause_count.  A domain is created with a pause count of 1,
requiring the toolstack to call XEN_DOMCTL_unpausedomain once to cause it
to start executing.

Other than that, it is strictly reference counted based on pause and
unpause hypercalls from toolstack components (in this case, all in dom0).
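As a rough illustration of the counting described above, here is a toy
Python model.  It is not Xen code: the class and method names are
hypothetical, and rejecting an unpause at a count of zero is an assumption
of the sketch.  Only the initial count of 1 and the "flag set while the
count is non-zero" behaviour come from the description above.

```python
class Domain:
    """Toy model of a domain's controller pause count (not Xen code)."""

    def __init__(self):
        # A newly created domain holds one pause reference, so it is paused.
        self.controller_pause_count = 1

    def pause(self):
        # Each pause request from a toolstack component takes a reference.
        self.controller_pause_count += 1

    def unpause(self):
        # Assumption of this sketch: an unmatched unpause is an error.
        if self.controller_pause_count == 0:
            raise ValueError("unpause without a matching pause")
        self.controller_pause_count -= 1

    @property
    def dominf_paused(self):
        # Models XEN_DOMINF_paused: set whenever the count is non-zero.
        return self.controller_pause_count > 0


d = Domain()
print(d.dominf_paused)  # True: freshly created, not yet unpaused
d.unpause()
print(d.dominf_paused)  # False: the single initial reference was dropped
```

The point of the model is that the flag says nothing about who holds the
reference, only that at least one is held.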

One issue which XenServer has found in combination with Introspection is
that any toolstack entity which calls pause/unpause, even for a short
period of time, can cause XEN_DOMINF_paused to be sampled as set.
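The interleaving can be sketched as follows.  This is a hypothetical,
single-threaded walk-through in Python rather than anything taken from Xen
or XenServer: the function names and the "agent" and "observer" roles are
made up for illustration, and the counter stands in for
d->controller_pause_count.

```python
def sample_paused(count):
    # Stand-in for reading XEN_DOMINF_paused: set while the count is non-zero.
    return count > 0


def observe_transient_pause():
    """Replay one possible ordering of events and record what is sampled."""
    count = 0  # the toolstack has already unpaused the migrated domain
    samples = []

    samples.append(sample_paused(count))  # observer samples before the agent acts
    count += 1                            # some other dom0 agent pauses briefly
    samples.append(sample_paused(count))  # observer samples inside that window
    count -= 1                            # the agent unpauses again
    samples.append(sample_paused(count))  # observer samples afterwards

    return samples


print(observe_transient_pause())  # [False, True, False]
```

The middle sample reports the domain as paused even though the component
that performed the migration has long since dropped its own reference.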

The fix ^W utterly gross hack for XenServer's purposes is
https://github.com/xenserver/xen-4.7.pg/blob/master/master/xen-introspection-pause.patch
but I don't yet have a sensible plan for how to fix this in general. 
One option would be to introduce hypercall pairs per toolstack
component, but that doesn't scale sensibly.

In this case, what condition causes the failure?  Is it simply seeing
the domain as paused (in which case, there will definitely be a
low-probability false negative rate if anything else in dom0 uses domain
pause), or is it some other failure which prompts the paused-state
check?

~Andrew


 

