
Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...



Lars Kurth writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] 
Process changes: is the 6 monthly release Cadence too short, Security Process, 
..."):
> * The Gitlab machinery will do build tests (and the discussion
>   showed that we should be able to do this via cross compilation or
>   compilation on a real system if a service such as infosifter is
>   used [...]
> * This could eventually include a basic set of smoke tests that are
>   system independent and could run under QEMU - Doug already uses a
>   basic test where a xen host and/or VM is started

Firstly, I think this is all an excellent idea.  It should be pursued.

I don't think it interacts directly with osstest except to reduce the
rate of test failures.
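
To make the smoke test idea concrete, something along these lines
would do as a first cut (the image name, boot marker and timeout are
made up for illustration; this is not Doug's actual test):

    #!/usr/bin/env python3
    # Sketch of a system-independent smoke test: boot a prebuilt
    # Xen+dom0 disk image under QEMU and wait for a marker on the
    # serial console.  IMAGE, MARKER and TIMEOUT are placeholders.
    import subprocess, sys, time

    IMAGE = "xen-smoke.qcow2"    # hypothetical prebuilt dom0 image
    MARKER = b"login:"           # printed once dom0 has booted
    TIMEOUT = 600                # seconds; generous if KVM is absent

    qemu = subprocess.Popen(
        ["qemu-system-x86_64", "-m", "2048", "-nographic",
         "-drive", "file=%s,format=qcow2" % IMAGE],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

    deadline = time.time() + TIMEOUT
    seen = b""
    while time.time() < deadline:
        chunk = qemu.stdout.read1(4096)
        if not chunk:
            break                       # QEMU exited
        seen += chunk
        if MARKER in seen:
            print("smoke test: dom0 booted")
            qemu.kill()
            sys.exit(0)
    qemu.kill()
    print("smoke test: no dom0 boot within %ds" % TIMEOUT)
    sys.exit(1)

A real version would want a timeout on the read itself and would go
on to start a guest and check "xl list", but even this much catches
a Xen or dom0 kernel which does not boot at all.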


> [Juergen:]
>     A major source of the pain seems to be the hardware: About half of all
>     cases where I looked into the test reports to find the reason for a
>     failing test flight were related to hardware failures. Not sure how to
>     solve that.
>     
> This is worrying and I would like to get Ian Jackson's viewpoint on it. 

I haven't worked up a formal analysis of the pattern of failures, but
(discussing hardware trouble only):

 * We are still waiting for new ARM hardware.  When we get it we will
   hopefully be able to decommission the arndale dev boards, whose
   network controllers are unreliable.

   Sadly, the unreliability of the armhf tests has become so
   normalised that we all just shrug and hope the next one will be
   better.  Another option would be to decommission the arndales right
   away and reduce the armhf test coverage.

 * We have had problems with PDU relays, affecting about three
   machines (depending how you count things).  My experience with the
   PDUs in Massachusetts has been much poorer than in Cambridge.  I
   think the underlying cause is probably USAian 110V electricity (!).
   I have a plan to fix this, involving more use of IPMI in tandem
   with the PDUs, which I hope will reduce this failure mode
   significantly (sketched after this list).

 * As the test lab increases in size, the rate of hardware failure
   necessarily also rises.  Right now, the response to that is manual:
   a human must notice the problem, inspect the test results, decide it
   is a hardware problem, and take the affected node out of service.  I
   am working on a plan to do that part automatically (sketched after
   this list).

   Human intervention will still be required to diagnose and repair
   the problem of course, but in the meantime, further tests will not
   be affected.
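
On the PDU point, the idea is roughly as below (a sketch only, not
what will actually go into osstest; the host fields are placeholders
and the existing PDU relay path is stubbed out):

    # Sketch of "IPMI in tandem with the PDUs": try the BMC first
    # and fall back to the PDU relay only if IPMI fails, so a flaky
    # relay contact matters much less.
    import subprocess

    def ipmi_power_cycle(bmc, user, password):
        # ipmitool over the lanplus interface; True on success.
        rc = subprocess.call(
            ["ipmitool", "-I", "lanplus", "-H", bmc,
             "-U", user, "-P", password,
             "chassis", "power", "cycle"])
        return rc == 0

    def pdu_power_cycle(pdu, outlet):
        # Existing relay-based path (SNMP or similar); stub here.
        raise NotImplementedError

    def power_cycle(host):
        if ipmi_power_cycle(host["bmc"], host["user"], host["pass"]):
            return "ipmi"
        pdu_power_cycle(host["pdu"], host["outlet"])
        return "pdu"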
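
And the automatic out-of-service part has roughly this shape (the
result format and the thresholds are invented for illustration):

    # Sketch of automatically spotting a suspect host: look at
    # per-host pass/fail counts over recent flights and stop
    # scheduling onto any host whose failure rate is anomalous.
    from collections import defaultdict

    MIN_JOBS = 10           # do not judge on a tiny sample
    MAX_FAILURE_RATE = 0.5  # arbitrary threshold

    def suspect_hosts(results):
        """results: iterable of (hostname, passed) tuples."""
        total, failed = defaultdict(int), defaultdict(int)
        for host, passed in results:
            total[host] += 1
            if not passed:
                failed[host] += 1
        return sorted(h for h in total
                      if total[h] >= MIN_JOBS
                      and failed[h] / total[h] > MAX_FAILURE_RATE)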

>     Another potential problem showed up last week: OSSTEST is using the
>     Debian servers for doing the basic installation. A change there (e.g.
>     a new point release) will block tests. I'd prefer to have a local cache
>     of the last known good set of *.deb files to be used especially for the
>     branched Xen versions. This would rule out remote problems for releases.
> 
> This is again something which we should definitely look at.

This was bad luck.  This kind of update happens about 3-4 times a
year.  It does break everything, leading to a delay of a day or two,
but the fix is straightforward.

Obviously this is not ideal but the solutions are nontrivial.  It is
not really possible to "have a local cache of the last known good set
of *.deb files" without knowing what that subset should be; that would
require an edifice to track what is used, or some manual configuration
which would probably break.  Alternatively we could run a complete
mirror but that is a *lot* of space and bandwidth, most of which would
be unused.
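
(For a flavour of what "track what is used" would involve: every
host install would have to record something like the below, and the
resulting package list still goes stale whenever the suite, the
architecture, or d-i's own dependency resolution changes.)

    # Illustration only: record the package set a freshly installed
    # host actually ended up with, so that the same .debs could be
    # cached for later installs of the same flavour.
    import subprocess

    def installed_packages():
        out = subprocess.check_output(
            ["dpkg-query", "-W", "-f", "${Package}=${Version}\n"],
            universal_newlines=True)
        return sorted(out.splitlines())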

I think the right approach is probably to switch from using d-i for
host installs, to something like FAI.  That would be faster as well.
However that amounts to reengineering the way osstest does host
installs; it would also leave us maintaining an additional way to do
host installs, since we would still want to be able to *test* d-i
operation as a guest.

So overall I have left this one on the back burner.


Ian.
