[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.17] x86/hvm: Revert per-domain APIC acceleration support
On 15/11/2022 16:12, Roger Pau Monne wrote: > On Tue, Nov 15, 2022 at 12:35:39AM +0000, Andrew Cooper wrote: >> I was really hoping to avoid this, but its now too late in the 4.17 freeze >> and >> we still don't have working fixes. > The fix I proposed still has some comments that have been unanswered, > but at any rate thit is now far too late to try to get this fixed. > > FWIW that same fix was also posted to the security list way before > hitting xen-devel. > >> The in-Xen calculations for assistance capabilities are buggy. For the >> avoidance of doubt, the original intention was to be able to control every >> aspect of a APIC acceleration so we could comprehensively test Xen's support, >> as it has proved to be buggy time and time again. >> >> Even after a protracted discussion on what the new API ought to mean, >> attempts >> to apply it to the existing logic have been unsuccessful, proving that the >> API/ABI is too complicated for most people to reason about. > Are you referring to the VMX hardware interface to setup the related > APIC assistance flags, or the hypervisor interface to control those > features? > >> This reverts most of: >> 2ce11ce249a3981bac50914c6a90f681ad7a4222 >> 6b2b9b3405092c3ad38d7342988a584b8efa674c >> >> leaving in place the non-APIC specific changes (minimal as they are). >> >> This takes us back to the behaviour of Xen 4.16 where APIC acceleration is >> configured on a per system basis. >> >> This work will be revisited in due course. > I certainly regret having been involved in attempting to fix this, and > I have to admit I still don't understand what is broken with the > current API/ABI. > > Do we want a flag to control the setting of the APIC register > virtualization feature? > > Is the naming for the flag that we expose incorrect? > > Is the field where the flag gets set incorrect? > > There isn't that much to the current interface. > > In the previous reply to the fix however I got the (maybe incorrect) > impression that current bugs in the implementation are used as a way > to justify why the interface is broken, and that is not accurate. The > interface and the implementation are two different things, and bugs > in the implementation shouldn't automatically invalidate the > interface without further reasoning. The fact you still have these questions demonstrates the point I'm trying to make. The answer is in the written description of how the interface is intended to behave. The fact there isn't one is the first problem. This email has taken a long time to write. I'm not trying to criticise; this is really really complicated stuff, but I feel a change in expectations is necessary. There are definitely implementation issues. We're all agreed here. If the implementation issues could be fixed perfectly, then we are still left with an interface which is incomplete (lacking a TPR control), and imprecise. "Just incomplete" would be entirely fine. Work could continue in 4.18 to fill in the missing pieces. But "accelerated x{,2}apic" is ambiguous; Note how much time was spent by us trying to figure out what it ought to mean, and despite coming to a tentative agreement, I have tried and failed to make a correct correction to the implementation, and so have you. We are the experts in this area and when we, who think we understand the complexity, cannot get it correct, how on earth do you expect end users to cope with the same interface? Problems piling up in the way these have is a tell-tale sign that there's a bigger issue at play. Part of the responsibilities of being a maintainer is knowing when to take a step back, re-evaluate things, asking "is this really the right course of action?", and sometimes this involves admitting failure, and trying to figure out how to minimise the collateral damage. In this case, I'm stating - using the public record of changes and attempts in this area as justification - that the complexity is the major problem, but it is being compounded by the ambiguity in the interface. I desperately did want to avoid a full reversion, but it is the only responsible course of action at this juncture. You may or may not agree with my reasoning here, but you have to concede that there is a problem and that our current attempts to address it are not working. As far as next steps go, I have a plan and it involves doing nothing until we have a new sphinx doc agreed upon and checked into the tree, from which we can build a shared understanding of the complexities involved. ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |