
Re: [Xen-devel] [XTF PATCH] xtf-runner: fix two synchronisation issues



On 29/07/16 15:31, Ian Jackson wrote:
> Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
>> On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
>>> The runner exiting before xl has torn down the guest is very
>>> deliberate, because some part of hvm guests is terribly slow to tear
>>> down; waiting synchronously for teardown tripled the wallclock time to
>>> run a load of tests back-to-back.
>> Then, when a dead guest shows up in the snapshot of 'xl list', you
>> won't know whether it has been leaked or is just being destroyed slowly.
>>
>> Also consider that this would make back-to-back tests fail when a test
>> happens to use a guest with the same name as the one in the previous test.
>>
>> I don't think getting blocked for a few more seconds is a big issue.
>> It is important to eliminate such race conditions so that osstest can
>> work properly.
> IMO the biggest reason for waiting for teardown is that it will make
> it possible to accurately identify the xtf test which was responsible
> for the failure if a test reveals a bug which causes problems for the
> whole host.

That is perfectly reasonable.

>
> Suppose there is a test T1 which, in buggy hypervisors, creates an
> anomalous data structure, such that the hypervisor crashes when T1's
> guest is finally torn down.
>
> If we start to run the next test T2 immediately we see success output
> from T1, we will observe the host crashing "due to T2", and T1 would
> be regarded as having succeeded.
>
> This is why in an in-person conversation with Wei yesterday I
> recommended that osstest should after each xtf test (i) wait for
> everything to be torn down and (ii) then check that the dom0 is still
> up.  (And these two activities are regarded as part of the preceding
> test step.)

That is also my understanding of how the intended OSSTest integration is
going to work.

OSSTest asks `./xtf-runner --list` for the full set of tests, then
iterates over them, running one test at a time with suitable liveness
checks in between.  This does not use xtf-runner's ability to run
multiple tests back to back.
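
Roughly speaking, the flow is equivalent to the following (purely
illustrative shell, not osstest code; the `xl list 0` check is just a
stand-in for osstest's real liveness checks):

    for t in $(./xtf-runner --list); do
        ./xtf-runner "$t"
        echo "$t exited with status $?"
        # Liveness check between tests: does dom0 still answer?
        xl list 0 >/dev/null 2>&1 || { echo "host unhealthy after $t"; break; }
    done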


The dev use case, on the other hand, is something like checking a test
case refactoring or a new bit of functionality:

$ ./xtf-runner selftest
<snip>
Combined test results:
test-pv64-selftest                       SUCCESS
test-pv32pae-selftest                    SUCCESS
test-hvm64-selftest                      SUCCESS
test-hvm32pae-selftest                   SUCCESS
test-hvm32pse-selftest                   SUCCESS
test-hvm32-selftest                      SUCCESS


FWIW, I have just put a synchronous wait in to demonstrate.

Without wait:

$ time ./xtf-runner selftest
<snip>

real    0m0.571s
user    0m0.060s
sys    0m0.228s

With wait:
$ time ./xtf-runner selftest
<snip>

real    0m8.870s
user    0m0.048s
sys    0m0.280s


That is more than 8 seconds of wallclock time during which nothing
useful is happening from the point of view of a human using
./xtf-runner.  All of this time is spent between the @releaseDomain
watch firing and `xl create -F` finally exiting.
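
For reference, a synchronous wait needs nothing clever: either wait for
the `xl create -F` child to exit, or simply poll until the guest has
vanished from `xl list`, along these lines (illustrative only; the
guest name is a made-up example and this is not the code xtf-runner
actually uses):

    dom=test-hvm64-selftest       # example guest name
    while xl list "$dom" >/dev/null 2>&1; do
        sleep 1                   # still listed => still being torn down
    done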

>
> If this leads to over-consumption of machine resources because this
> serialisation is too slow then the right approach would be explicit
> parallelisation in osstest.  That would still mean that in the
> scenario above, T1 would be regarded as having failed, because T1
> wouldn't be regarded as having passed until osstest had seen that all
> of T1's cleanup had been done and the host was still up.  (T2 would
> _also_ be regarded as failed, and that might look like a heisenbug,
> but that would be tolerable.)

OSSTest shouldn't run multiple tests at once, and I have taken exactly
the same decision for XenRT.  Easy identification of what went bang is
the most important property in these cases.

We are going to have to get to a vast test library before the wallclock
time of the XTF tests approaches anything like that of installing a VM
from scratch.  I am not worried at the moment.

>
> Wei: I need to check what happens with multiple failing test steps in
> the same job.  Specifically, I need to check which one the bisector
> is likely to try to attack.

For individual XTF tests, it is entirely possible that every failure is
from a different change, so they should be treated individually.

Having said that, it is also quite likely that, given a lot of similar
microkernels, one hypervisor bug would take a large number of them out
at once,
and we really don't want to bisect each individual XTF test.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

