
Re: [Xen-devel] [libvirt test] 55257: regressions - FAIL

On Fri, 2015-05-15 at 11:39 +0100, Anthony PERARD wrote:
> On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> > More hints that libvirtd crashed.  Have there been any attempts to
> > reproduce this outside of the test rig?  Or capture a core dump?
> Here are two from the OpenStack CI loop:
> http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
> http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0
> in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
> libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
> libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3
> Weird, it's the same domain number for both logs :).
> Other useful logs from openstack can be found in logs/screen-n-cpu.txt.gz,
> which is the service that talks to libvirtd.
> It's running libvirt 1.2.14 with:
>     f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
>     894d2ff libxl: acquire a job when destroying a domain
>     6dfec1e libxl: drop virDomainObj lock when destroying a domain
> and xen 4.4.1 with:
>     9369988 libxl: event handling: Break out ao_work_outstanding
>     f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
>     4783c99 libxl: In domain death search, start search at first domid we want
>     188e9c5 libxl: Domain destroy: fork
> http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline
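
For anyone wanting to check other runs for the same failure, here is a
minimal sketch that counts device model errors per domain in a
libxl-driver log. The regular expression is an assumption derived only
from the two "domain N device model" lines quoted above, and the function
names are made up for illustration; real logs may need a looser pattern.

```python
import gzip
import re
from collections import Counter

# Assumed line shape, taken from the quoted excerpts:
#   libxl: error: <file>:<line>:<function>: domain <id> device model: <message>
DM_ERROR = re.compile(
    r"libxl: error: [\w.]+:\d+:\w+: domain (?P<domid>\d+) device model: (?P<msg>.*)"
)


def device_model_failures(lines):
    """Return a Counter mapping domain id -> number of device model errors."""
    counts = Counter()
    for line in lines:
        match = DM_ERROR.search(line)
        if match:
            counts[int(match.group("domid"))] += 1
    return counts


def scan_gzip_log(path):
    """Scan a gzip-compressed log such as libxl-driver.txt.gz."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return device_model_failures(handle)


sample = [
    "libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out",
    "libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)",
    "libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3",
]
sample_counts = device_model_failures(sample)
```

The third sample line carries no domain id, so this sketch does not count
it; only the first two lines are attributed to domain 108.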


We didn't use to see these issues, but there has been a rather large
gap in which we didn't get useful results because of the upheaval from
the colo move, and there were other issues (e.g. the crashing issue)
which make it hard to pinpoint a point in time when this didn't happen.

Did you have a previous baseline which didn't exhibit these problems? Or
did it exhibit enough other problems not to be usable?

If we can find some plausible-sounding baseline to try (i.e. a commit id,
not a commit id + patch queue) then I could try to run some ad hoc tests
to establish a baseline.

Perhaps I should try xen.git#stable-4.5 and libvirt.git#1.2.14 in the
first instance? Or I could pick a xen-unstable flight pass from, say,
Easter-ish and try with that?

This seems to be an intermittent bug, so it's not clear that the
bisector is going to be all that useful. However, we do do multiple
domain starts now, so perhaps the chances of sneaking past are reduced.
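
To put rough numbers on "sneaking past": a back-of-the-envelope sketch,
assuming each domain start fails independently with the same probability
(which may well not hold for this bug). The failure rates used are
purely hypothetical, not measured from any flight, and the function
names are invented for illustration.

```python
def miss_probability(p_fail, starts):
    """Chance that every one of `starts` domain starts succeeds even though
    the bug is present, i.e. a bad commit is wrongly judged good."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be within [0, 1]")
    if starts < 0:
        raise ValueError("starts must be non-negative")
    return (1.0 - p_fail) ** starts


def starts_needed(p_fail, max_miss):
    """Smallest number of starts for which the miss probability drops to
    `max_miss` or below."""
    if not 0.0 < p_fail <= 1.0:
        raise ValueError("p_fail must be within (0, 1]")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be within (0, 1)")
    starts = 1
    while miss_probability(p_fail, starts) > max_miss:
        starts += 1
    return starts


# Hypothetical rate: a bug that hits 1 start in 10.
single = miss_probability(0.1, 1)    # one start per test step
several = miss_probability(0.1, 10)  # ten starts per test step
```

Under these assumptions more starts per test step shrink the chance of a
false pass geometrically, but a rare failure would still need many
starts before a bisection step could be trusted.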

