
[Wg-openstack] Minutes of October 19 OpenStack WG Meeting

== Attendees == 
Lars Kurth
Bob Ball
Anthony Perard
Cedric Bosdonnat 
Jim Fehlig
Michael Glasgow
Joao M Martins (could not attend this time)

== Agenda ==
=== XenProject CI is no longer voting === 

On Oct 14th, the Nova core reviewers removed the Xen Project CI from the voting groups due to stability issues with the Xen Project CI, which appear to be caused by an increase in race conditions leading to a higher proportion of negative votes. See the following conversation:
17:21 < superdan> johnthetubaguy: BobBall: seems like xen ci has been failopotomous lately
17:21 < johnthetubaguy> do you mean anthonyper's one?
17:22 < superdan> johnthetubaguy: "XenProject CI"
17:23 < johnthetubaguy> yeah, I think thats anthonyper who would know about that
17:23 < johnthetubaguy> there was a race they were trying to fix to improve things, I don't remember how far we are getting with that yet
17:23 < superdan> johnthetubaguy: okay it's just a lot of -1 noise right now it seems
17:24 < johnthetubaguy> we should probably stop it voting
17:25 < johnthetubaguy> BobBall: I removed the vote permissions for now, while that gets fixed up
This is frustrating timing, as the CI had been very helpful: it recently stopped a regression in https://review.openstack.org/#/c/334480/
I hope that, even now that it is not voting, Markus will continue to track the Xen Project CI's reports, but future regressions are less likely to be identified.

Anthony did some initial investigation and doesn't believe this has to do with …
We had a quick discussion of possible causes:
- Using Xen on Xen within Rackspace. On XenServer, we have hit time-outs because we are running nested virt and the inner Xen can't set up drivers. This particularly affects disk- and network-intensive tests
- An issue with the baseline of Linux, libvirt, and Xen we are using. Anthony's gut feeling is that this may be a Linux issue
- Something that changed in Nova

Jim: Doesn't have much context on what the issue is. Without more background it is hard to make a suggestion
Lars: there have been many changes 
ACTION: Anthony will send more information on his investigation to wg-openstack@ so that others can chip in. Information on which tests keep failing intermittently would help Bob identify whether this may be a Xen-on-Xen issue, as seen in the XenServer CI
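As a starting point for spotting which tests keep failing intermittently, results from several CI runs can be tallied and any test recorded as both PASS and FAIL flagged. This is only a sketch: it assumes per-run result files with simple "TEST_NAME STATUS" lines, which is not the CI's actual log format.

```shell
# flag_intermittent: read one result file per CI run (lines of
# "TEST_NAME STATUS", an assumed format) and print, sorted, every
# test that both PASSed and FAILed across the given runs.
flag_intermittent() {
    cat "$@" | awk '
        { seen[$1] = seen[$1] " " $2 }
        END {
            for (t in seen)
                if (seen[t] ~ /PASS/ && seen[t] ~ /FAIL/) print t
        }' | sort
}
```

Running `flag_intermittent run1.log run2.log ...` over a handful of runs then lists only the flip-flopping tests, which are the candidates for a race rather than a hard regression.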

** Possible Solution: Changing the baseline
Lars: could we try to change the baseline?
Michael: what is the downside of updating the baseline?
Lars: Given that we have stability issues, changing it may be worth a try. The main reason for not doing so was to minimise the risk of disruption and losing voting status

Lars asks: what does Oracle use in their internal CI?
Michael: same libvirt version as Xen project CI
Michael: OS level = Oracle Linux with Ubuntu containers, as Tempest is closely tied to Ubuntu
Michael: Oracle cloud folks are running Tempest tests on Xen: primarily green. They are not seeing the same issues as we are in the Xen Project CI
Lars believes that the main difference between the Oracle CI and the Xen Project CI is that the former runs on real hardware, rather than Xen on Xen

ACTION: Michael to follow up with Alexey and report back to the thread, after Anthony sent out assessment of his findings

Lars asks whether SUSE has an internal CI and what they use

Jim: The cloud team does. Not sure how much testing is done on Xen. They have a regular CI for the whole product, and Tempest is included in that. Not sure how many compute nodes are Xen, if any. We get bugs occasionally, so some tests must run on Xen

Cedric: Do not have many Xen tests. Most tests are KVM

ACTION: Cedric will follow up internally and report back

** How fast do we need to fix this?

Don't think there is a time limit.
The VMware CI has been non-functional in Nova for 3-4 months.
Occasionally it comments; people just ignore it.
The big concern is that the longer the CI loop produces unreliable results, the less people will trust those results.
Whenever there is an issue with the XenServer CI, Citrix tries to get it back up as quickly as possible.

** Other Ideas: run Tempest as part of OSSTEST

I recall that we were planning to run Tempest as part of OSSTEST against the latest Xen and new libvirt and Linux baselines.
The advantages are that:
- We do not need to run Xen on Xen with nested virt, which excludes one possible source of errors
- We would get an extra data point
- We would get early warning of Linux- and Xen-related issues
The disadvantages are that:
- It does not test Nova changes
- We can't use our test COLO for the Xen Project CI
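If the OSSTEST patch is revived, the Tempest step could look roughly like the following. This is a sketch only: the workspace name is illustrative, the smoke-only selection is an assumption, and tempest.conf would need to point at whatever cloud the OSSTEST job deploys.

```shell
# Hypothetical OSSTEST step: run Tempest's smoke subset against the
# already-deployed cloud. Workspace name and test selection are
# illustrative, not taken from the actual patch on xen-devel.
tempest init tempest-ws
cd tempest-ws
# (edit etc/tempest.conf here to point at the deployed cloud's endpoints)
tempest run --smoke --serial
```

Running serially trades speed for fewer cross-test races, which seems the right default while the intermittent failures are still being diagnosed.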

Lars: What is the status of these?
Anthony: Posted a patch on xen-devel, but it didn't go in due to lack of comments. I can refresh it and repost.
Lars: This only makes sense if we actually get review comments and the change goes in. Can you CC this list on the series?
Anthony: Yes
Lars: Also recalls that Marcos Matsunaga from Oracle was planning to use OSSTEST for various purposes. He may be a good reviewer besides Ian Jackson. Not sure whether SUSE can help, but we ought to communicate the series to the wg-openstack@ list

ACTION: Lars to follow up with Konrad and see whether this is doable (aside: Konrad also reviewed some OSSTEST changes recently).
ACTION: Michael/Alexey to confirm whether Oracle Tempest is running on real HW

=== CI Loop Funding (Lars) ===
The cost of continuing to run the CI loop is increasing from $12.5K per year to around $16K.
I have already fed this into the Advisory Board budget planning round.
I don't expect this to be an issue, but losing voting rights may raise questions about why the project is spending money on the CI loop.

=== Meeting Invites ===
Had issues with meeting invites: deleted the old invite and created a new one
Lars: cannot host the Dec 14th meeting as he is travelling, but can ask for agenda items the week before and see whether there are topics to discuss. Can also provide details to start the call.
