
Re: [Xen-devel] [XTF PATCH] xtf-runner: fix two synchronisation issues



On Fri, Jul 29, 2016 at 03:31:30PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
> > On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
> > > The runner exiting before xl has torn down the guest is very
> > > deliberate, because some parts of hvm guests are terribly slow to
> > > tear down; waiting synchronously for teardown tripled the wallclock
> > > time to run a load of tests back-to-back.
> > 
> > Then, when a dead guest shows up in a snapshot of 'xl list', you
> > won't know whether it has been leaked or is just being slowly
> > destroyed.
> > 
> > Also consider that this would make back-to-back tests fail when a
> > test happens to use a guest with the same name as the one in the
> > previous test.
> > 
> > I don't think getting blocked for a few more seconds is a big issue.
> > It is important to eliminate such race conditions so that osstest can
> > work properly.
> 
> IMO the biggest reason for waiting for teardown is that it makes it
> possible to accurately identify the xtf test responsible for the
> failure if a test reveals a bug which causes problems for the whole
> host.
> 
> Suppose there is a test T1 which, on a buggy hypervisor, creates an
> anomalous data structure, such that the hypervisor crashes when T1's
> guest is finally torn down.
> 
> If we start to run the next test T2 immediately after we see success
> output from T1, we will observe the host crashing "due to T2", and T1
> would be regarded as having succeeded.
> 
> This is why, in an in-person conversation with Wei yesterday, I
> recommended that osstest should, after each xtf test, (i) wait for
> everything to be torn down and (ii) then check that the dom0 is still
> up.  (And these two activities should be regarded as part of the
> preceding test step.)
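
For illustration, a minimal sketch of such a post-test step (the helper
names here are mine and purely hypothetical; it assumes that polling
'xl list' from dom0 is an acceptable way to detect teardown):

    import subprocess
    import time

    def domains():
        # Names of all domains, taken from the first column of
        # 'xl list' output (skipping the header line).
        out = subprocess.check_output(['xl', 'list']).decode()
        return [line.split()[0] for line in out.splitlines()[1:]]

    def wait_for_teardown(timeout=60):
        # (i) Wait until only dom0 remains, i.e. every test guest
        # has been fully destroyed.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if domains() == ['Domain-0']:
                return True
            time.sleep(1)
        return False          # guest leaked or teardown wedged

    def dom0_is_up():
        # (ii) If dom0 can still answer 'xl info', the host survived
        # the preceding test.
        return subprocess.call(['xl', 'info'],
                               stdout=subprocess.DEVNULL) == 0

With both checks counted against the preceding test step, a crash
during T1's teardown would be attributed to T1 rather than T2.
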
> 
> If this leads to over-consumption of machine resources because this
> serialisation is too slow, then the right approach would be explicit
> parallelisation in osstest.  That would still mean that in the
> scenario above, T1 would be regarded as having failed, because T1
> wouldn't be regarded as having passed until osstest had seen that all
> of T1's cleanup had been done and the host was still up.  (T2 would
> _also_ be regarded as failed, and that might look like a heisenbug,
> but that would be tolerable.)
> 
> Wei: I need to check what happens with multiple failing test steps in
> the same job.  Specifically, I need to check which one the bisector
> is likely to try to attack.
> 

Yes. I think my current code can meet both your and Andrew's
requirements:

1. The runner waits for all tests to finish, which amortises the
   cleanup time. This is what Andrew needs.
2. In osstest we run one test case at a time, so "all tests" is just
   one test. This is what you need.
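
A rough sketch of that flow (illustrative only, not the actual
xtf-runner code; 'tests' is assumed to be a list of argv-style
commands):

    import subprocess
    import time

    def leftover_guests():
        # Domains other than dom0, from the first column of
        # 'xl list' output (skipping the header line).
        out = subprocess.check_output(['xl', 'list']).decode()
        names = [line.split()[0] for line in out.splitlines()[1:]]
        return [n for n in names if n != 'Domain-0']

    def run_batch(tests, timeout=60):
        # Run every test without blocking on guest teardown, so the
        # slow destruction of HVM guests overlaps with later tests.
        for test in tests:
            subprocess.check_call(test)

        # A single synchronous wait at the end amortises the teardown
        # cost across the batch.  With one test per osstest job this
        # is simply a synchronous wait for that test's guest.
        deadline = time.time() + timeout
        while leftover_guests():
            if time.time() > deadline:
                raise RuntimeError('leaked guests: %s'
                                   % leftover_guests())
            time.sleep(1)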

Wei.

> Ian.


 

