Re: [Xen-devel] [xen-4.9-testing test] 126201: regressions - FAIL
On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
> On 08/21/2018 05:14 AM, Jan Beulich wrote:
> > >>> On 21.08.18 at 03:11, <osstest-admin@xxxxxxxxxxxxxx> wrote:
> > > flight 126201 xen-4.9-testing real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/126201/
> > >
> > > Regressions :-(
> > >
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> >
> > Something needs to be done about this, as this continued failure is
> > blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> > for flight 125710, and got this back from Wei:
> >
> > > This is libvirtd's error message.
> > >
> > > The remote host can't obtain the state change lock because it is already
> > > held by another task/thread. It could be a libvirt / libxl bug.
> > >
> > > 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
> > > Cannot start job (modify) for domain debian.guest.osstest; current job is
> > > (modify) owned by (24975)
>
> I took a closer look at the logs and it appears the finish phase of
> migration fails to acquire the domain job lock since it is already held by
> the perform phase. In the perform phase, after the VM has been transferred
> to the destination, the qemu process associated with the VM is started. For
> whatever reason that takes a long time on this host:
>
> 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
> arguments: ...
> 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> domain 1 device model: spawn watch p=(null)

This is a spurious event after the watch has been set up.

> ...
> 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
> event epath=/local/domain/0/device-model/1/state
> 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> domain 1 device model: spawn watch p=running

So it has taken 32s for QEMU to write "running" into xenstore. This, however,
is still within the timeout limit set by libxl (60s).

>
> In the meantime we move to the finish phase and time out waiting for the
> above perform phase to complete:
>
> 2018-08-19 17:05:19.096+0000: 3492: debug : virThreadJobSet:96 : Thread 3492
> (virNetServerHandleJob) is now running job
> remoteDispatchDomainMigrateFinish3Params
> ...
> 2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 :
> Cannot start job (modify) for domain debian.guest.osstest; current job is
> (modify) owned by (24982)
> 2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 :
> Timed out during operation: cannot acquire state change lock
>
> What could be causing the long startup time of qemu on these hosts? Does
> dom0 have enough cpu/memory? As you noticed, the libvirt commit used for
> this test has not changed in a long time, well before the failures appeared.
> Perhaps a subtle change in libxl is exposing the bug?

There have only been two changes to libxl in the range of changesets being
tested:

  c257e35a libxl: qemu_disk_scsi_drive_string: Break out common parts of disk config
  5d92007c libxl: restore passing "readonly=" to qemu for SCSI disks

They wouldn't change how libxl interacts with libvirt. The QEMU tag didn't
change either.

Wei.
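To make the timing in the log excerpts above concrete: the finish phase started
at 17:05:19.096 and gave up at 17:05:49.253, i.e. after roughly 30 seconds of
waiting for the per-domain job lock, while QEMU only reported "running" at
17:05:51.529, about 32 seconds after being spawned at 17:05:19.182, so the
finish phase misses the lock by a couple of seconds. The shape of that race can
be reproduced with the small standalone C sketch below. This is not libvirt
code (libvirt's real entry point is libxlDomainObjBeginJob, whose wait time by
the timestamps above appears to be about 30 seconds); the function names and
the 30s/32s figures here are purely illustrative, showing two threads
contending for one job lock with mismatched timeouts.

/*
 * Standalone sketch of the job-lock timeout race described above.
 * Build with: cc -pthread job_lock_race.c
 * The "perform" thread holds the job lock while the device model starts
 * slowly (~32s); the "finish" thread waits for the same lock with a
 * shorter timeout (~30s) and therefore gives up first.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define JOB_WAIT_SECONDS   30   /* how long the finish phase will wait */
#define DM_STARTUP_SECONDS 32   /* how long the device model takes to report "running" */

struct domain_job {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            active;     /* a (modify) job is currently running */
};

static struct domain_job job = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .cond = PTHREAD_COND_INITIALIZER,
};

/* Begin a job, waiting at most JOB_WAIT_SECONDS for the current one to end. */
static bool begin_job(const char *phase)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += JOB_WAIT_SECONDS;

    pthread_mutex_lock(&job.lock);
    while (job.active) {
        if (pthread_cond_timedwait(&job.cond, &job.lock, &deadline) != 0) {
            pthread_mutex_unlock(&job.lock);
            printf("%s: cannot acquire state change lock (timed out)\n", phase);
            return false;
        }
    }
    job.active = true;
    pthread_mutex_unlock(&job.lock);
    printf("%s: job started\n", phase);
    return true;
}

static void end_job(void)
{
    pthread_mutex_lock(&job.lock);
    job.active = false;
    pthread_cond_signal(&job.cond);
    pthread_mutex_unlock(&job.lock);
}

/* "Perform" phase: takes the job lock, then waits for the device model. */
static void *perform_phase(void *arg)
{
    (void)arg;
    begin_job("perform");
    sleep(DM_STARTUP_SECONDS);   /* stands in for waiting on the xenstore watch */
    end_job();
    return NULL;
}

/* "Finish" phase: arrives shortly afterwards and waits for the job lock. */
static void *finish_phase(void *arg)
{
    (void)arg;
    sleep(1);
    if (begin_job("finish"))
        end_job();
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, perform_phase, NULL);
    pthread_create(&t2, NULL, finish_phase, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Running this prints "perform: job started" followed about 30 seconds later by
"finish: cannot acquire state change lock (timed out)", matching the ordering
of the libvirtd warnings in the log excerpts above.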
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel