
[Xen-devel] Notes Design Session: Testing & CI Process and Workflow Improvements, x86/ARM/Embedded Testing, etc. - Does what we do today work?



Hi all,

these are the notes from 

http://sched.co/AjHT

There are ACTIONS on the following people: Lars, Ian/Julien, Rich/Christopher

Also see http://markmail.org/message/7e2mdpimvrmsppq5, which duplicates some of 
the discussion we had here.

Regards
Lars

ARM Server Testing
==================

Unreliable testing on ARM64/32 
-------------------------------

* Ongoing problems with ARM64 box (hardware issue) ... ticket 92394
* Other ARM64 box (firmware issue) ... ticket 91727
These are stuck waiting for SoftIron.
We have escalated through a few channels and are waiting for a response.
The order of new SoftIron boxes is on hold.

* ThunderX hardware ... ticket 91730
The ThunderX boxes are installed, but not yet commissioned
Threads are ongoing regarding netboot support (or lack thereof)

* Failing Arndale boards in ARM crate
We do not have a solution yet. 
For the general principle on how to deal with such cases, see the next section

We had a discussion on how to move forward:
a) Drop ARM32 testing (as Julien was not sure whether there is any demand for 
ARM32 based testing)
b) Try to find a solution, in line with the general principle for non-server 
form factor hardware

However, we agreed on the following action:

ACTION: Ian to make a proposal to drop/downgrade ARM32 testing (make it 
non-blocking) and remove specific tests if there is no demand


Embedded / Client Testing (ARM & x86)
=====================================

General Principle
-----------------

Fundamentally, the project's OSSTEST and Test Lab maintainers do not have the 
expertise or bandwidth to evaluate development boards for stability and 
suitability. We also cannot have under-NDA hardware in the Test Lab. And we 
want all non-server equipment to be fixed to a sliding rack shelf and 
maintainable, so that we do not have outages due to loose equipment and/or 
unplugged cables.

This means we can accept Embedded / Client hardware under the following 
conditions:

* We have enough space (Advisory Board approval needed)

* Step 1: The proposer does some basic testing to ensure that the board is 
stable and suitable

* Step 2: Investigate form factor issues
We don't need a perfect server chassis, but we would really like everything in 
the rack to be properly mounted and maintainable.

That means that shelves should be sliding shelves to which all the equipment 
(boards, power adaptor bricks, and fans if needed) is fixed somehow 
(self-adhesive hooks, cable ties, etc.), and that we are confident we won't 
have issues with height (spare height on the shelf when everything is mounted).

* Step 3: Bench-test the final set-up 

We would expect either working HW sent to us (after some initial discussion) or 
a list of parts with clear instructions on how to assemble these. We would run 
these past Credativ, who manage the Test Lab for us. This would need to be a 
discussion with which Credativ and the maintainers are happy, but we would not 
drive it.

Ongoing Work
------------

* Adding Renesas Car-X gen HW to the Test Lab ... ticket 91996
There has been some discussion around requirements. The goal is to get Renesas 
Car boards in a server chassis.
There is AB approval.

ACTION: Lars to chase EPAM based on ticket as we seem to have provided all 
necessary info (done)

From EPAM: 
"Design is ready, we are about to start production as soon as we execute 
paperwork.
Artem will send an update soon. Currently stalled on legal review and sign-off 
of paperwork with Renesas."

* Intel NUCs in Test Lab
Status: some initial discussion (stuck somewhere in step 2). Ball is in 
Christopher and Rich's court
No board approval yet. Project may carry costs for chassis/fans (TBD)

ACTION: Christopher and Rich to follow up and drive

Note this will be slow due to people being on vacation. There was also a 
discussion on client-specific test cases, which I was not part of.


x86 Testing & Heisenbugs
========================

On the x86 side we had capacity issues (in hand).
But in the 4.9.0 release we had problems with x86 Heisenbugs 
(https://en.wikipedia.org/wiki/Heisenbug) in some tests, which caused release 
bottlenecks.

We had a bit of a discussion around this involving Ian and a few other people. 

In summary:
* The push gate assumes that tests fail when someone introduces a bug, which 
should motivate contributors to fix their bugs
* This breaks down with Heisenbugs, where the wrong people's contributions get 
stalled

There is no simple solution, but we discussed options such as:

1: Manually special-case test cases with Heisenbugs (make them non-blocking) 
=> underlying problems won't be fixed, but release bottlenecks are resolved

2: Add functionality to notice Heisenbugs and treat them differently (by 
re-running and doing log analysis) 
=> this is a more sophisticated version of 1
Note that OpenStack has some ElasticSearch based tools to help with 2, but 
tests are manually made non-blocking if they fail intermittently

3: Release Manager to treat some Heisenbugs as release blockers (e.g. no push 
for 4 weeks, ...) 
   Or other commit moratorium
=> punishes everyone in the hope that someone will step up and fix

4: Look at the possibility of some "bisection on steroids" (re-run tests 100+ 
times in the hope that the bug will always be triggered within a certain number 
of runs) 
=> custom engineering and ring-fenced resources needed, and may not work if the 
bug occurs at irregular time intervals
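
To make options 2 and 4 a bit more concrete, here is a rough sketch. It is 
not OSSTEST code and nothing like it was proposed in the session; the function 
names classify_runs and runs_needed are made up for illustration. The first 
function labels a test as flaky when repeated runs on the same revision 
disagree. The second estimates how many re-runs are needed to see a failure at 
least once, assuming each run fails independently with a fixed probability. 
That assumption is exactly what breaks when the bug occurs at irregular 
intervals.

```python
import math


def classify_runs(results):
    """Classify one test from repeated runs on the same revision.

    results is a list of booleans: True for a pass, False for a fail.
    Mixed outcomes on an unchanged revision suggest a Heisenbug.
    """
    if not results:
        raise ValueError("need at least one run")
    if all(results):
        return "pass"
    if not any(results):
        return "fail"
    return "flaky"


def runs_needed(failure_rate, confidence=0.99):
    """Number of runs so that a bug failing with probability failure_rate
    per run shows up at least once with the given confidence.

    Assumes runs are independent (hypothetical simplification):
    P(no failure in n runs) = (1 - failure_rate) ** n <= 1 - confidence.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))
```

For example, under these assumptions a bug that fails 5% of the time needs 
runs_needed(0.05) = 90 runs per revision to be seen with 99% confidence, which 
gives a feel for why option 4 needs ring-fenced resources.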

There was no concrete outcome.
There appear to be no good options.

ACTION: Ian and Julien to look and reflect on these and either include in 
Release Manager checklist or make proposals to the list




