
Re: [Xen-devel] [PATCH] docs/process/xen-release-management: Lesson to learn



Julien Grall writes ("Re: [Xen-devel] [PATCH] 
docs/process/xen-release-management: Lesson to learn"):
> That's why testing with XSAs was requested on the security-ml before 
> hand. However, this was mistakenly done on rc7 rather than rc8.

If the proposed release had been constructed in advance, and agreed,
ahead of committing to a release, then there would have been an
opportunity for this mistake to be spotted.

We had no real plan for what to do if the tests showed problems that
we thought meant the proposed release was not suitable.  Indeed, some
of the testing identified a further issue which we have released with,
on the grounds that we are convinced it is not a regression.

IMO, the fact that important bugfixes were still going in after we had
committed to a release date is indicative of some kind of problem.
The purpose of the freeze is to allow bugs to be shaken out, and
also to allow us to hazard a guess at the likely remaining bug density
by looking at the rate at which bugfixes are still going in.

If we commit to a release date when the rate of important bugfixes is
still nontrivial then we are likely to discover a new important bug
after we commit to releasing but before we release, and without
sufficient time for testing (and any necessary rework).  When that
happens we have few good options: we can release with a known
important bug; release with a not-fully-tested and perhaps
itself-buggy bugfix; or abort the release with all the consequences
for reputation and community engagement.
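
As a rough illustration of "looking at the rate at which bugfixes are
still going in", here is a minimal sketch.  It is not an existing Xen
or osstest tool: the function names, the one-week bucket size and the
threshold values are all assumptions made for the example, and how to
decide which commits count as "important bugfixes" is left open.

```python
from collections import Counter
from datetime import date


def weekly_bugfix_counts(fix_dates, freeze_start, as_of):
    """Count bugfix commits per week, from freeze_start up to as_of.

    fix_dates is a list of commit dates for whatever we decide counts
    as an important bugfix (hypothetical input, gathered by hand or
    from the commit log).
    """
    n_weeks = (as_of - freeze_start).days // 7 + 1
    counts = Counter(
        (d - freeze_start).days // 7
        for d in fix_dates
        if freeze_start <= d <= as_of
    )
    return [counts.get(week, 0) for week in range(n_weeks)]


def rate_has_settled(counts, threshold=0, quiet_weeks=2):
    """True if each of the last quiet_weeks weeks had <= threshold fixes.

    threshold and quiet_weeks are illustrative defaults, not policy.
    """
    if len(counts) < quiet_weeks:
        return False
    return all(c <= threshold for c in counts[-quiet_weeks:])


if __name__ == "__main__":
    freeze = date(2020, 1, 6)
    fixes = [date(2020, 1, 7), date(2020, 1, 8),
             date(2020, 1, 15), date(2020, 1, 28)]
    counts = weekly_bugfix_counts(fixes, freeze, as_of=date(2020, 2, 2))
    print(counts, rate_has_settled(counts))
```

With a check of this kind, a fix landing in the most recent week would
argue against committing to a date yet, which is the situation
described above.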


Regarding the testing of the wrong version: any task done by humans
involves a risk of errors.  These risks are magnified when: the work
must be done in secret; the kind of work is done fairly rarely; there
are more manual steps; there is new tooling; there is time pressure.

We have a number of opportunities for improvement.  One of them is to
improve our tooling - George's xsatool is part of that.  And another
is that we should have easier tools for setting up custom osstest
flights.  I have been thinking about how to do that.  But introducing
new tooling itself carries a risk, of course, as new tools can have
bugs and people can more easily misunderstand them.
Increased automation is not without its downsides, as it can hide what
is going on from the user.

Overall, we will not be able to eliminate human error in this kind of
activity.  We must either tolerate the consequences, or mitigate them.
Strategies for mitigation of human error include review or
double-checking (whether by automation or other humans), and
incorporating contingency time for rework.

IMO if we carry on as we have been doing it is only a matter of time
before we make a release which has some kind of serious and obvious
defect.


> As it seems, it is becoming more frequent to have XSAs around the 
> release. It is getting increasingly more difficult to make a choice on 
> the date.

We may need to formalise the process of releasing immediately after
public release of an XSA.

My proposed requirement to not commit to a release date until we know
what code we are planning to release is not incompatible with
privately preparing the release branch and testing it.

Also, it would be quite possible for the Xen Project Security Team to
adjust the disclosure schedule (which it suggests to vulnerability
discoverers, and which advice discoverers often take).  The Security
Policy says:
 | Our usual starting point for that negotiation, unless there are
 | reasons to diverge
A planned .0 release seems like a "reason ... to diverge" to me.


> This decision is not helped by the testing that have been quite 
> unreliable due to heisenbug.

The release process needs to cope with the actual level of quality in
our codebase, and in the test hardware we have available.

Obviously we have various work in progress to improve both of these
features but this is not easy.  For code quality there are competing
priorities that lead to under-investigation of heisenbugs.  I have
some ideas about how to have osstest push back more on heisenbugs, but
this will inevitably work by increasing the level of pain they cause
to the general flow of Xen development.

For hardware quality there is the competing priority to actually have
some testing of all the architectures we nominally support.

These kinds of issues seem to me to be beyond the scope of the release
process.  As I say, the release process has to work with what we have,
not with what we would like to have.


> To be honest, if we had to follow your suggestion below. We would need 
> to get the tree completely frozen 2-3 weeks before the actual date.

That has been done before.


Ian.
