
Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...

> On Jul 5, 2018, at 12:16 PM, Ian Jackson <ian.jackson@xxxxxxxxxx> wrote:
> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design 
> session] Process changes: is the 6 monthly release Cadence too short, 
> Security Process, ..."):
>> We didn't look at the sporadic failing tests thoroughly enough. The
>> hypercall buffer failure has been there for ages, a newer kernel just
>> made it more probable. This would have saved us some weeks.
> In general, as a community, we are very bad at this kind of thing.
> In my experience, the development community is not really interested
> in fixing bugs which aren't directly in their way.
> You can observe this easily in the way that regressions in Linux,
> spotted by osstest, are handled.  Linux 4.9 has been broken for 43
> days.  Linux mainline is broken too.
> We do not have a team of people reading these test reports, and
> chasing developers to fix them.  I certainly do not have time to do
> this triage.  On trees where osstest failures do not block
> development, things go unfixed for weeks, sometimes months.
> And overall my gut feeling is that tests which fail intermittently are
> usually blamed (even if this is not stated explicitly) on problems
> with osstest or with our test infrastructure.  It is easy for
> developers to think this because if they wait, the test will get
> "lucky", and pass, and so there will be a push and the developers can
> carry on.
> I have a vague plan to sit down and think about how osstest's
> results analysers could respond better to intermittent failures.
> If I can, I would like intermittent failures to block pushes.  That
> would at least help address the problem of heisenbugs (which are often
> actually quite serious issues) not being taken seriously.
> I would love to hear suggestions for how to get people to actually fix
> test failures in trees not maintained by the Xen Project and therefore
> not gated by osstest.

Well, at the moment, investigation is ad hoc.  Basically everyone has to look to 
see *whether* there’s been a failure, and it’s nobody’s job in particular to 
try to chase it down to find out what it might be.  If we had a team, we could 
have a robot rotate through the team members to nominate one particular person 
per failure to take a look at the result and at least try to classify it, and 
maybe try to find the appropriate person who may be able to take a deeper look.
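
To make the rotation idea concrete, here is a minimal sketch of what such a robot might do. It is only an illustration under assumptions: the function name, the job names, and the people are all hypothetical, and nothing here reflects how osstest actually reports failures.

```python
from itertools import cycle


def assign_triagers(failures, team):
    """Pair each failure with exactly one team member, round-robin.

    Both arguments are plain lists of strings. This is a hypothetical
    helper for illustration, not part of osstest.
    """
    if not team:
        raise ValueError("need at least one person on the triage team")
    # cycle() repeats the team list endlessly; zip() stops when the
    # list of failures runs out, so every failure gets one nominee.
    return list(zip(failures, cycle(team)))


# Made-up job names and people, for illustration only.
for failure, person in assign_triagers(
    ["job-a", "job-b", "job-c"],
    ["alice", "bob"],
):
    print(f"{failure}: {person} to classify")
```

A real robot would also need to remember where the rotation stopped between runs, so that the first person on the list is not always nominated first.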

