[Xen-devel] Notes Design Session: Testing & CI Process and Workflow Improvements, x86/ARM/Embedded Testing, etc. - Does what we do today work?
Hi all,

these are the notes from http://sched.co/AjHT

There are ACTIONS on the following people: Lars, Ian/Julien,
Rich/Christopher

Also see http://markmail.org/message/7e2mdpimvrmsppq5, which duplicated
some of the discussion we had here

Regards
Lars

ARM Server Testing
==================

Unreliable testing on ARM64/32
------------------------------
* Ongoing problems with ARM64 box (hardware issue) ... ticket 92394
* Other ARM64 box (firmware issue) ... ticket 91727
  These are stuck waiting for SoftIron.
  We have escalated through a few channels and are waiting for a
  response.
  The order of new SoftIron boxes is on hold.
* ThunderX hardware ... ticket 91730
  The ThunderX boxes are installed, but not yet commissioned.
  There are threads going on regarding netboot support (or lack
  thereof).
* Failing Arndale boards in ARM crate
  We do not have a solution yet. For the general principle on how to
  deal with such cases, see the next section.

We had a discussion on how to move forward:
a) Drop ARM32 testing (as Julien was not sure whether there is any
   demand for ARM32 based testing)
b) Try to find a solution, in line with the general principle for
   non-server form factor hardware

However, we agreed on the following action:
ACTION: Ian to make a proposal to drop/downgrade ARM32 testing (make it
non-blocking) and remove specific tests if there is no demand

Embedded / Client Testing (ARM & x86)
=====================================

General Principle
-----------------
Fundamentally, the project's OSSTEST and Test Lab maintainers do not
have the expertise and bandwidth to be involved in evaluating
development boards for stability and suitability. We also cannot have
under-NDA hardware in the Test Lab. And we want to ensure that all
non-server equipment is mounted on a sliding rack shelf, properly
mounted and maintainable, such that we do not have outages due to loose
equipment and/or unplugged cables.
This means we can accept Embedded / Client hardware under the following
conditions:
* We have enough space (Advisory Board approval needed)
* Step 1: The proposer does some basic testing to ensure that the board
  is stable and suitable
* Step 2: Investigate form factor issues
  We don't need a perfect server chassis, but we would really like
  everything in the rack to be properly mounted and maintainable. That
  means that shelves should be sliding shelves to which all the
  equipment (boards, power adaptor bricks, possibly fans if needed) is
  fixed somehow (self-adhesive hooks, cable ties, etc.), and that we
  are confident we won't have issues with height (spare height on the
  shelf when everything is mounted).
* Step 3: Bench-test the final set-up
  We would expect either working HW sent to us (after some initial
  discussion) or a list of parts with clear instructions on how to
  assemble these. We would run these past Credativ, who manage the Test
  Lab for us. This would need to be a discussion with which Credativ
  and the maintainers are happy, but we would not drive it.

Ongoing Work
------------
* Adding Renesas Car-X gen HW to the Test Lab ... ticket 91996
  There has been some discussion around requirements.
  The goal is to get Renesas Car boards in a server chassis.
  There is AB approval.
  ACTION: Lars to chase EPAM based on ticket as we seem to have
  provided all necessary info (done)
  From EPAM: "Design is ready, we are about to start production as soon
  as we execute paperwork. Artem will send an update soon. Currently
  stalled on legal review and sign-off of paperwork with Renesas."
* Intel NUCs in Test Lab
  Status: some initial discussion (stuck somewhere in step 2).
  Ball is in Christopher and Rich's court.
  No board approval yet. Project may carry costs for chassis/fans (TBD)
  ACTION: Christopher and Rich to follow up and drive
  Note this will be slow due to people being on vacation.
There was also a discussion, which I was not part of, on
client-specific test cases.

x86 Testing & Heisenbugs
========================
On the x86 side we had capacity issues (in hand).
But in the 4.9.0 release, x86 Heisenbugs
(https://en.wikipedia.org/wiki/Heisenbug) in some tests created
problems.

We had a bit of a discussion around this involving Ian and a few other
people. In summary:
* The push gate assumes that tests fail when someone introduces a bug,
  which should motivate contributors to fix their bugs
* This breaks down with Heisenbugs, where the wrong people's
  contributions get stalled

There is no simple solution, but we discussed options such as
1: Manually special-case test cases with Heisenbugs (make them
   non-blocking)
   => underlying problems won't be fixed, but release bottlenecks are
   solved
2: Add functionality to notice Heisenbugs and treat them differently
   (by re-running and doing log analysis)
   => this is a more sophisticated version of 1
   Note that OpenStack has some ElasticSearch based tools to help with
   2, but tests are manually made non-blocking if they fail
   intermittently
3: Release Manager to treat some Heisenbugs as release blockers (e.g.
   no push for 4 weeks, ...) or some other commit moratorium
   => punishes everyone in the hope that someone will step up and fix
4: Look at the possibility of some "bisection on steroids" (re-run
   tests 100+ times in the hope that the bug will always be triggered
   within a certain number of runs)
   => custom engineering needed, ring-fenced resources, and may not
   work if the bug occurs at irregular time intervals

There was no concrete outcome. There appear to be no good options.
ACTION: Ian and Julien to look and reflect on these and either include
in Release Manager checklist or make proposals to the list

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
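To make options 2 and 4 from the Heisenbug discussion a bit more concrete, here is a minimal Python sketch. It is not how OSSTEST works and was not proposed in the session; the function names `runs_needed` and `classify` are hypothetical. It assumes each re-run fails independently with a fixed probability, which is exactly the assumption that breaks down when a bug occurs at irregular time intervals.

```python
import math
from collections import Counter


def runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Number of re-runs needed to see at least one failure.

    Assumes (hypothetically) that every run fails independently with
    probability p_fail. Returns the smallest n for which
    1 - (1 - p_fail)**n >= confidence.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


def classify(outcomes):
    """Classify the results of re-running one test on one revision.

    outcomes is an iterable of "pass" / "fail" strings.
    All passes -> "pass", all failures -> "fail" (looks deterministic),
    a mix -> "heisenbug" (candidate for non-blocking treatment).
    """
    counts = Counter(outcomes)
    if counts["fail"] == 0:
        return "pass"
    if counts["pass"] == 0:
        return "fail"
    return "heisenbug"


if __name__ == "__main__":
    # A bug that shows up in roughly 1 run out of 20:
    print(runs_needed(0.05))                    # 59
    print(classify(["pass", "fail", "pass"]))   # heisenbug
```

Under these assumptions, a bug that triggers in about 5% of runs needs 59 clean runs before a revision can be called good with 95% confidence, and that cost is paid at every bisection step. This gives a rough feel for the ring-fenced resources mentioned under option 4.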