[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [XTF PATCH] xtf-runner: fix two synchronisation issues



On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
> On 29/07/16 13:07, Wei Liu wrote:
> > There were two synchronisation issues for the old code:
> >
> > 1. There was no guarantee that guest console was ready before "xl
> >    console" invocation.
> > 2. There was no guarantee that runner wouldn't not exit before all test

s/not//

> >    guests were gone.
> 
> Sorry, but I can't parse this.
> 
> The runner existing before xl has torn down the guest is very
> deliberate, because some part of hvm guests is terribly slow to tear
> down; waiting synchronously for teardown tripled the wallclock time to
> run a load of tests back-to-back.
> 

Then you won't know if a guest is leaked or it is being slowly destroyed
when a dead guest shows up in the snapshot of 'xl list'.

Also consider that would make back-to-back tests that happen to have a
guest that has the same name as the one in previous test fail.

I don't think getting blocked for a few more seconds is a big issue.
It's is important to eliminate such race conditions so that osstest can
work properly.

> > @@ -132,24 +141,53 @@ def list_tests(args):
> >  
> >          print name
> >  
> > +# A list of processes that we need to wait until they exits.
> > +wait_list = []
> >  
> >  def run_test(test):
> >      """ Run a specific test """
> >  
> > +    console_path = '/local/domain/0/backend/console'
> > +    # Setup watch for console
> > +    cmd = ['xenstore-watch', console_path]
> > +    print "Executing '%s'" % (" ".join(cmd), )
> > +    xs_watch = Popen(cmd, stdout = PIPE, stderr = PIPE)
> > +
> >      _, _, name = test.split('-', 2)
> >  
> >      cfg = path.join("tests", name, test + ".cfg")
> >  
> > -    cmd = ['xl', 'create', '-p', cfg]
> > +    cmd = ['xl', 'create', '-Fp', cfg]
> >  
> >      print "Executing '%s'" % (" ".join(cmd), )
> > -    rc = subproc_call(cmd)
> > -    if rc:
> > -        raise RunnerError("Failed to create VM")
> > +    wait = Popen(cmd, stdout = DEVNULL, stderr = DEVNULL)
> 
> I would name this "create" rather than "wait"
> 

NP.

> > +    wait_list.append(wait)
> > +
> > +    # Wait up to 5 seconds for console to show up
> > +    signal.alarm(5)
> > +    while True:
> > +        l = xs_watch.stdout.readline()
> > +        domid = l[len(console_path)+1:].split('/')[0]
> > +        if domid == '': continue
> 
> Please put continues and breaks on newlines.
> 
> if not domid:
>     continue

Fixed.

> 
> > +
> > +        cmd = ['xenstore-read', '/local/domain/'+domid+'/name']
> > +        print "Executing '%s'" % (" ".join(cmd), )
> > +        gname = check_output(cmd).splitlines()[0]
> > +        if gname != test: continue
> > +
> > +        cmd = ['xenstore-read', '/local/domain/'+domid+'/console/tty']
> > +        print "Executing '%s'" % (" ".join(cmd), )
> > +        con = check_output(cmd).splitlines()[0]
> > +        if con != '': break
> 
> Somewhere in this loop, you should poll the call to xl create.  In the
> case that there is an xl configuration error, this setup will wait for 5
> seconds, then discard all trace of the error.
> 

Right. I will see what I can do here.

> > +
> > +    # Tear down watch and timer
> > +    signal.alarm(0)
> > +    xs_watch.kill()
> >  
> >      cmd = ['xl', 'console', test]
> >      print "Executing '%s'" % (" ".join(cmd), )
> >      console = Popen(cmd, stdout = PIPE)
> > +    wait_list.append(console)
> >  
> >      cmd = ['xl', 'unpause', test]
> >      print "Executing '%s'" % (" ".join(cmd), )
> > @@ -327,12 +365,17 @@ def main():
> >      opts, args = parser.parse_args()
> >  
> >      if opts.list_tests:
> > -        return list_tests(args)
> > +        ret = list_tests(args)
> >      else:
> > -        return run_tests(args)
> > +        ret = run_tests(args)
> > +
> > +    for child in wait_list: child.wait()
> 
> On a newline please.
> 
> You should use stdout=PIPE, stderr=STDOUT (to link stdout and stderr to
> the same fd) and .communicate() to get the results.  In any case that a
> child exists non-zero, you mustn't discard the error.  As a result, I
> don't think you need DEVNULL any more.
> 

OK, this makes sense.

Wei.

> ~Andrew
> 
> > +
> > +    return ret
> >  
> >  
> >  if __name__ == "__main__":
> > +    signal.signal(signal.SIGALRM, sigalrm_handler)
> >      try:
> >          sys.exit(main())
> >      except RunnerError, e:
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.