Xen project Mailing List

Re: [Xen-devel] [PATCH] docs/process/xen-release-management: Lesson to learn

To: Julien Grall <julien.grall@xxxxxxxxxx>

From: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

Date: Wed, 13 Dec 2017 13:36:06 +0000

Cc: Juergen Gross <jgross@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Julien Grall <julien.grall@xxxxxxx>, Lars Kurth <lars.kurth@xxxxxxxxxx>

Delivery-date: Wed, 13 Dec 2017 13:36:21 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Julien Grall writes ("Re: [Xen-devel] [PATCH] docs/process/xen-release-management: Lesson to learn"):
> That's why testing with XSAs was requested on the security-ml before
> hand. However, this was mistakenly done on rc7 rather than rc8.

If the proposed release had been constructed in advance, and agreed,
ahead of committing to a release, then there would have been an
opportunity for this mistake to be spotted.

We had no real plan for what to do if the tests showed problems that
we thought meant the proposed release was not suitable. Indeed, some
of the testing identified a further issue which we have released with,
on the grounds that we are convinced it is not a regression.

IMO, the fact that important bugfixes were still going in after we had
committed to a release date is indicative of some kind of problem.

The purpose of the freeze is to allow bugs to be shaken out, and also
to allow us to hazard a guess at the likely remaining bug density by
looking at the rate at which bugfixes are still going in.

If we commit to a release date when the rate of important bugfixes is
still nontrivial then we are likely to discover a new important bug
after we commit to releasing but before we release, and without
sufficient time for testing (and any necessary rework).

When that happens we have few good options: we can release with a
known important bug; release with a not-fully-tested and perhaps
itself-buggy bugfix; or abort the release with all the consequences
for reputation and community engagement.

Regarding the testing of the wrong version: any task done by humans
involves a risk of errors. These risks are magnified when: the work
must be done in secret; the kind of work is done fairly rarely; there
are more manual steps; there is new tooling; there is time pressure.

We have a number of opportunities for improvement. One of them is to
improve our tooling - George's xsatool is part of that. And another
is that we should have easier tools for setting up custom osstest
flights.
I have been thinking about how to do that.

But the introduction of new tooling itself carries a risk, of course,
as new tools can have bugs and people can more easily misunderstand
them. Increased automation is not without its downsides, as it can
hide what is going on from the user.

Overall, we will not be able to eliminate human error in this kind of
activity. We must either tolerate the consequences, or mitigate them.

Strategies for mitigation of human error include review or
double-checking (whether by automation or other humans), and
incorporating contingency time for rework.

IMO if we carry on as we have been doing it is only a matter of time
before we make a release which has some kind of serious and obvious
defect.

> As it seems, it is becoming more frequent to have XSAs around the
> release. It is getting increasingly more difficult to make a choice on
> the date.

We may need to formalise the process of releasing immediately after
public release of an XSA. My proposed requirement to not commit to a
release date until we know what code we are planning to release is not
incompatible with privately preparing the release branch and testing
it.

Also, it would be quite possible for the Xen Project Security Team to
adjust the disclosure schedule (which it suggests to vulnerability
discoverers, and which advice discoverers often take). The Security
Policy says:

 | Our usual starting point for that negotiation, unless there are
 | reasons to diverge

A planned .0 release seems like a "reason ... to diverge" to me.

> This decision is not helped by the testing that have been quite
> unreliable due to heisenbug.

The release process needs to cope with the actual level of quality in
our codebase, and in the test hardware we have available.

Obviously we have various work in progress to improve both of these
features but this is not easy.

For code quality there are competing priorities that lead to
under-investigation of heisenbugs.
I have some ideas about how to have osstest push back more on
heisenbugs, but this will inevitably work by increasing the level of
pain they cause to the general flow of Xen development.

For hardware quality there is the competing priority to actually have
some testing of all the architectures we nominally support.

These kinds of issues seem to me to be beyond the scope of the release
process. As I say, the release process has to work with what we have,
not with what we would like to have.

> To be honest, if we had to follow your suggestion below. We would need
> to get the tree completely frozen 2-3 weeks before the actual date.

That has been done before.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
