
[Wg-test-framework] Minutes of monthly Xen+Libvirt OpenStack meeting (March 25th)



= Attendees =
Lars Kurth (LK)
Anthony Perard (AP)
Jim Fehlig (JF)
Bob Ball (BB)
Konrad Wilk (KW)
Antony Messerli (AM)

= Status Update =

BB gave a quick status update and showed http://jenkins.openstack.xenproject.org
The key overview page is at http://jenkins.openstack.xenproject.org/job/dsvm-tempest-xen/buildTimeTrend (screenshot attached)
We have had about 50% red (failed) and 50% green (passed) runs. We discovered two NOVA bugs for which patches have been submitted, but these have not yet gone into KILO.
* https://review.openstack.org/#/c/159106/ (submitted by Daniel Berrange): it has one approval and a second is missing (Daniel can't approve his own patch) - this fix is marked for Kilo RC1
* https://review.openstack.org/#/c/166184/ (submitted by AP) : unless we can get the priority pushed up by NOVA core devs and get the reviews, this one is unlikely to go in

BB: it is unlikely that we will get any more NOVA changes in (if we discover new issues, say, in the next week or so) as KILO is in release candidate mode (having passed the KILO3 milestone)
and master is increasingly tied down.

AM: offered to look into getting the priority of 166184 pushed up and to get a Rackspace employee to review the patches so that they can be merged
ACTION: Antony Messerli (AM) to investigate the above ^^

BB: is on vacation for the next two weeks, so he can't do reviews, raise patches, or do any other work

On the plus side, the churn in NOVA should reduce, which means we should have a more stable baseline against which to investigate issues.

== Scripts to investigate issues ==

BB: A couple of simple bash lines can download the full console logs from the jobs and create a histogram of which tempest tests caused the majority of the failures:


# for i in {1106..1374}; do wget http://104.239.148.40:8080/job/dsvm-tempest-xen/${i}/consoleFull -O ${i}_console; done

# grep -h "\.\.\. FAIL" * | sed -e 's/.*\(tempest[^) ]*\).*/\1/' | sort | uniq -c | sort -n
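
For convenience, the two steps could be wrapped into a small script along these lines (a sketch only; the Jenkins host, job name and build range are taken from the commands above and would need adjusting):

#!/bin/bash
# Sketch: fetch console logs for a range of builds and summarise which
# tempest tests fail most often. Host/job/build range assumed from above.
JENKINS=http://104.239.148.40:8080/job/dsvm-tempest-xen
FIRST=1106
LAST=1374

for i in $(seq ${FIRST} ${LAST}); do
    # -q: quiet; drop the file again if the log is no longer available
    wget -q "${JENKINS}/${i}/consoleFull" -O "${i}_console" || rm -f "${i}_console"
done

# Histogram: count how often each tempest test shows up as FAILed
grep -h "\.\.\. FAIL" *_console \
    | sed -e 's/.*\(tempest[^) ]*\).*/\1/' \
    | sort | uniq -c | sort -n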


From the logs (attached) the current prime issue is test_resize_server_confirm_from_stopped (excerpt of log below)


# Failures Test

[snip]
     24 tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks
     25 tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments
     25 tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_cryptsetup
     58 tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped


This should be fixed by 166184. So the priority needs to be to get these changes into NOVA.


= Analysis and finding a way forward = 

BB: We need to stepwise reduce the number of failing tests. In theory, a quick parsing script (attached above) that looks at all of the tests
should help us identify hot spots of test failures and give us some more clarity.

*Unfortunately it turns out that this is not the case (see attached log histogram)*

AP discovered one race condition in NOVA, for which we have a patch now (aka 166184). He is working through issues, but there are a lot of moving pieces that can cause failures.

LK: asks what Xen + Libvirt baseline we are using

BB: at the moment we have a custom-built system which uses Xen 4.4.2 (with some of JF's patches) + Ubuntu Dom0 + an older libvirt version with some patches.
BB: we were also queuing up Daniel B's fix (aka 159106). We can easily add OpenStack changes that are necessary into the custom config.

LK: wonders whether the randomness of the failures is an indicator that we are suffering from the concurrency issues that JF is investigating

JF: has been looking at running Tempest in parallel. Whether the intermittent failures would be improved is not clear.

BB: Even within serial Tempest, multiple VMs are run, and thus it is plausible that the intermittent failures could be caused by this.

JF: if VMs are started/destroyed concurrently, we could be hitting the issues that have already been fixed.
JF: I've made some progress on the concurrency problems Tempest (and my reproducer) bump into.

JF: sent a mail with additional patches for Xen 4.4 & 4.5 and Libvirt (several libvirt patches were committed yesterday).

[Copy of text from JF's e-mail sent prior to the meeting:

On the Xen side, 5 patches are needed on top of current 4.4.x and
4.5.x.  Commits 93699882d and f1335f0d from master and 4783c99a,
1c91d6fba, and 188e9c54 from staging.

A lot of work has been done on the libvirt side, much of which has been
committed. I do have one last series of three patches fixing issues
related to domain destroy that are not yet posted upstream.  With those
patches on top of current libvirt.git master, and a libxl containing
aforementioned Xen patches, my Tempest reproducer passes.
]
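
To make the Xen part concrete, applying those five commits to a stable tree could look roughly like the sketch below (the repository URL and branch names are assumptions; the commit IDs come from JF's mail above):

# Sketch only: apply JF's five Xen fixes on top of a 4.4.x tree.
# Repo URL and branch names are assumptions; commit IDs are from the mail.
git clone git://xenbits.xen.org/xen.git && cd xen
git checkout -b 4.4-plus-fixes origin/stable-4.4
# two commits originally from master ...
git cherry-pick 93699882d f1335f0d
# ... and three from staging
git cherry-pick 4783c99a 1c91d6fba 188e9c54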

LK: asks how easy it is to change the Xen and Libvirt baseline

BB: Not particularly straightforward, unless we can upgrade for all of the tests.
BB: We can't run a single test with a custom version of libvirt or Xen; we would have to change it for all tests.

LK: given the intermittent failures, changing the Xen + Libvirt baseline would actually make sense

AP: is testing a specific version of Xen (4.4.1 + additional patches) and Libvirt plus patches
AP: Tried JF's new patches on his local set-up in parallel, and still gets many time-outs. 

LK: It appears that moving the baseline for Xen and Libvirt and getting the NOVA changes in are our top next steps

*ALL AGREED* 

BB: XenServer had an internal CI loop for quite some time before going live and achieving a good level of pass rates.

BB: would be happy to move the baseline

ACTION: JF & AP to decide what the best baseline would be

LK: asks what test environment JF uses

JF: doesn't run Tempest, runs his own Tempest reproducer
JF: the Xen version does not appear to matter much as long as the 5 patches are in place
     (4.4.2, 4.5 and unstable with those 5 patches all work equally well with his reproducer scripts).
JF: Will ask the Xen maintainers to backport these so that the next Xen releases have the fixes in them

JF: Libvirt needs to use something very current. The latest release is not sufficient; it needs to be git master.
JF: points out that libvirt 1.2.14 (to be released in 2 weeks) will contain all the recent libvirt work
JF: will be posting the 3 missing pieces in the next few hours, but someone needs to review these before Friday
JF: the libvirt FREEZE is on Friday.
JF: re-iterates that libvirt 1.2.14 should give us a good baseline for Tempest
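
For reference, moving the baseline to libvirt git master might look roughly like this (a sketch; the configure flags and install paths are assumptions based on libvirt's autotools build of that era, not a verified recipe):

# Sketch only: build libvirt from git master with the libxl (Xen) driver.
# Configure flags and paths below are assumptions.
git clone git://libvirt.org/libvirt.git && cd libvirt
./autogen.sh --with-libxl --prefix=/usr --localstatedir=/var --sysconfdir=/etc
make -j$(nproc)
sudo make install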

LK: asks whether we can use Xen Project members to review the 3 outstanding libvirt patches for libvirt 1.2.14
*AP and KW offer to review the libvirt patches*

ACTION: JF will CC xen-devel, AP & KW on these 
ACTION: KW and AP to look out for these patches on xen-devel
ACTION: LK will sync with Ian Jackson (who has reviewed Libvirt patches before) to see whether he can have a look also

BB: is fairly confident that fixing the concurrency issues will improve the situation enough that we can turn on commenting
on all changes for the OpenStack Liberty development cycle

AM: noted that he has also been trying Fedora (vs. Ubuntu)

ACTION: BB to sync with AP and JF on baseline when back from holidays

= Estimating the cost of running *.openstack.xenproject.org =

LK: one of the outstanding actions was to estimate the cost of running Tempest, such that the Xen Project Advisory Board
budget can be revised to be more accurate

BB: notes that we should not do this until we can run Tempest in parallel, as in sequential mode we use more VMs than
really needed

AM: notes that we can pull the bill from the Citrix account; it itemises all the VMs, which we should then be able to map back
to *.openstack.xenproject.org

ACTION: BB to investigate ^^ after he's back from vacation; sync with LK beforehand

Attachment: libvirt_xen_failures[1].txt
Description: libvirt_xen_failures[1].txt

_______________________________________________
Wg-test-framework mailing list
Wg-test-framework@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/wg-test-framework

 

