
Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...

On Thu, Jul 05, 2018 at 11:39:51AM +0000, George Dunlap wrote:
> > On Jul 5, 2018, at 12:16 PM, Ian Jackson <ian.jackson@xxxxxxxxxx> wrote:
> > 
> > Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design 
> > session] Process changes: is the 6 monthly release Cadence too short, 
> > Security Process, ..."):
> >> We didn't look at the sporadic failing tests thoroughly enough. The
> >> hypercall buffer failure has been there for ages, a newer kernel just
> >> made it more probable. This would have saved us some weeks.
> > 
> > In general, as a community, we are very bad at this kind of thing.
> > 
> > In my experience, the development community is not really interested
> > in fixing bugs which aren't directly in their way.
> > 
> > You can observe this easily in the way that regressions in Linux,
> > spotted by osstest, are handled.  Linux 4.9 has been broken for 43
> > days.  Linux mainline is broken too.
> > 
> > We do not have a team of people reading these test reports, and
> > chasing developers to fix them.  I certainly do not have time to do
> > this triage.  On trees where osstest failures do not block
> > development, things go unfixed for weeks, sometimes months.
> > 
> > And overall my gut feeling is that tests which fail intermittently are
> > usually blamed (even if this is not stated explicitly) on problems
> > with osstest or with our test infrastructure.  It is easy for
> > developers to think this because if they wait, the test will get
> > "lucky", and pass, and so there will be a push and the developers can
> > carry on.
> > 
> > I have a vague plan to sit down and think about how osstest's
> > results analysers could respond better to intermittent failures.
> > If I can, I would like intermittent failures to block pushes.  That
> > would at least help address the problem of heisenbugs (which are often
> > actually quite serious issues) not being taken seriously.
> > 
> > I would love to hear suggestions for how to get people to actually fix
> > test failures in trees not maintained by the Xen Project and therefore
> > not gated by osstest.
> Well at the moment, investigation is ad-hoc.  Basically everyone has to look 
> to see *whether* there’s been a failure, and it’s nobody’s job in particular 
> to try to chase it down to find out what it might be.  If we had a team, we 
> could have a robot rotate through its members to nominate one particular person 
> per failure to take a look at the result and at least try to classify it, 
> maybe try to find the appropriate person who may be able to take a deeper 
> look.
>  -George

I forget the exact saying, and who said it, but it goes something
like "Any task that is the job of everyone is the job of no one and will
not get done."
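
For what it's worth, the rotation George describes could be sketched
roughly as below. This is only an illustration: the function name, the
people, and the test names are all made up, and nothing here reflects
how osstest actually reports failures. The point is just that each
failure ends up with exactly one named owner.

```python
from itertools import cycle


def assign_triagers(failures, team):
    """Pair each failing test with one named person, round-robin.

    `failures` and `team` are plain lists of strings (hypothetical
    inputs; a real robot would read them from the test reports).
    """
    if not team:
        raise ValueError("need at least one person to rotate through")
    rota = cycle(team)  # endlessly repeats the team in order
    return {failure: next(rota) for failure in failures}


if __name__ == "__main__":
    team = ["alice", "bob", "carol"]                      # placeholder names
    failures = ["test-a", "test-b", "test-c", "test-d"]   # made-up test names
    for failure, person in assign_triagers(failures, team).items():
        print(f"{failure}: {person} to classify")
```

The rotation restarts on every call, so a real robot would need to
remember where it left off between reports in order to spread the
load fairly.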

