[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.17] x86/hvm: Revert per-domain APIC acceleration support


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Thu, 17 Nov 2022 21:10:39 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=mMgjMMPXSq4VaDnBKjB8rA5B1BQ/8FfFZmXBNw1LA4I=; b=m155e77oY3f3pDlflOCBn7BCWF4J0CNQxeeKgEXV38DZ1o72RAS74lq+/UcdWiIVb31U3f7z0d8DSlRolf2qSeRwa8m2qNRoCo9VY4RSkXxT6mA+ZmrMWgRaJAkXE3uZSV7e1N4zmb8Y0HWbYttmN+DYlYMtWZryT85pveUGdAjAGyQZtyvUTS0c7K65sdu6Jfd0vItt+wTl7H0sBw2mpGplPNun4c8UpClDAGFuoFzohpHAVf+qI294UbSujGj4eqzBuhV48r37IvPdd8YUkYTGCIkAgO7UoH5yHQY9cQ2DslqprP+1z4gMfDbpm6NTv4uh36RpUf2tqM1tTZ3/JQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=fYUaEoTHZMwJETwnMAFIsildNTOmNmBeTyvbahIx9fmXwCgFh9Yt3+GgeZCfpFdC0BiJV9BSewqPY6li7vhEpxQojhpjocy6cSOfKnB9d+IxdAiiT3PgfbkUfLzSxsWd8hxEugOjWo5Dzn221Wo6T8ClMKZ5S1kxheSAJoW6Q+T2+YTMZAorsrNYPOnlWbkNUt09Rl7dJOEv0im66udwKigWQbFtrSSwcEYq2KinFjxIlwXacHZ7Uduhhux1tN/3KGZ0YeOvCLbbWkL4nVb5VeIFoCyeBIytUglK4BthRca3BWYOPWDBrKERvngc6AdodW8CQKhReVEJqPcOZd9xXA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>
  • Delivery-date: Thu, 17 Nov 2022 21:10:56 +0000
  • Ironport-data: A9a23:ZLGAfKpfM3A0i6okvXxzdwafllJeBmImZBIvgKrLsJaIsI4StFCzt garIBnSbvvYYmD9eIt0bYi/oUNX68WBydQ3QQM/+CE2E3hA9ZuZCYyVIHmrMnLJJKUvbq7FA +Y2MYCccZ9uHhcwgj/3b9ANeFEljfngqoLUUbKCYWYpAFc+E0/NsDo788YhmIlknNOlNA2Ev NL2sqX3NUSsnjV5KQr40YrawP9UlKm06W1wUmAWP6gR5gaHziZNV/rzGInqR5fGatgMdgKFb 76rIIGRpgvx4xorA9W5pbf3GmVirmn6ZFXmZtJ+AsBOszAazsAA+v9T2Mk0MC+7vw6hjdFpo OihgLTrIesf0g8gr8xGO/VQO3kW0aSrY9YrK1Dn2SCY5xWun3cBX5yCpaz5VGEV0r8fPI1Ay RAXAGEwZSyTisaH/IiYbfU3ncQcKeLIH5xK7xmMzRmBZRonabbqZvyToPR/hXI3jM0IGuvCb c0EbzYpdA7HfxBEJlYQDtQ5gfusgX78NTZfrTp5p4JuuzSVkFM3jearaYWJEjCJbZw9ckKwj 2TK5WnmRDodM8SS02Gt+XOwnO7f2yj8Xer+EZXoq6Y60QLLnAT/DjUuSEaLp8eBtXeEUvhZd R0++3MfsqotoRnDot7VGkfQTGS/lg4RXZ9cHvM37CmJy7HI+ECJC24cVDlDZdc68sgsSlQC9 HWEgtfoDjxHq6CORDSW8bL8hSy2ETgYKykFfyBsZSsI5cPy5r46iB3nR8xmVqWyi7XI9SrYx jmLqG01gOwVhMtSjqGjpwmY33Sru4TDSRMz6kPPRGW54whlZYmjIYu19Vzc6vUGJ4GcJrWcg EU5dwGlxLhmJfmweOalG43hwJnBCy65DQDh
  • Ironport-hdrordr: A9a23:VQaoK64CUfQhH1NSQgPXweCCI+orL9Y04lQ7vn2ZFiY5TiXIra qTdaogviMc0AxhI03Jmbi7Scq9qeu1z+853WBjB8bZYOCAghrlEGgC1/qp/9SEIUHDH4FmpM BdmsRFaeEYSGIK9foSgzPIXOrIouP3lpxA7N22pxgCcegpUdAY0+4TMHf4LqQCfngjOXNPLu v42iMonVqdUEVSSv7+KmgOXuDFqdGOvJX6YSQeDxpixBiSgSiu4LvaFQHd+hsFSTtAzZor7G CAymXCl+SemsD+7iWZ+37Y7pxQltek4txfBPaUgsxQBiTwhh2ubIFBXaTHmDwuuumg5Hsjjd GJiRY9OMZY7W/XYwiO0FXQ8jil9Axrx27pyFeej3emi9f+XigGB81Igp8cWgfF6mI71esMk5 5j7ia8jd56HBnAlCPy65zjTBdxjHe5pnIkjKo6k2Ffa40Dc7VcxLZvvn+9Ua1wWR4S2rpXV9 WGP/usosq+tmnqNkwxi1MfhOBEmE5DRituDHJy4fB9mAIm4UyRh3FouPD32E1wtK7VAqM0md gteM5T5c5zZ95TYqRnCOgbR8yrTmTLXBLXKWqXZU/qDacdJhv22tfKCZgOlZaXkaYzve0PsY WEVEkduX85ekroB8HL1JpX8grVSGH4WTj20MlR65Vwp7W5HdPQQGa+YUFrl9Hlr+QUA8XdVf r2MJVKA+X7JW+rHYpSxQXxV5RbNHFbWswIvdQwXU6Iv6vwW8XXn/2edOyWKKvmED4iVG+6Cn wfXCLrLMEF9UyvUm+QummkZ5osQD2LwXtdKtmowwFI8vl9CmRliHlktX2poseWNDZFrqs6OE NjPbKPqNLImVWL
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHY+Io3ygoaE+gKUU+QEYfs8WRTZ65AKPMAgAN4DoA=
  • Thread-topic: [PATCH for-4.17] x86/hvm: Revert per-domain APIC acceleration support

On 15/11/2022 16:12, Roger Pau Monne wrote:
> On Tue, Nov 15, 2022 at 12:35:39AM +0000, Andrew Cooper wrote:
>> I was really hoping to avoid this, but its now too late in the 4.17 freeze 
>> and
>> we still don't have working fixes.
> The fix I proposed still has some comments that have been unanswered,
> but at any rate thit is now far too late to try to get this fixed.
>
> FWIW that same fix was also posted to the security list way before
> hitting xen-devel.
>
>> The in-Xen calculations for assistance capabilities are buggy.  For the
>> avoidance of doubt, the original intention was to be able to control every
>> aspect of a APIC acceleration so we could comprehensively test Xen's support,
>> as it has proved to be buggy time and time again.
>>
>> Even after a protracted discussion on what the new API ought to mean, 
>> attempts
>> to apply it to the existing logic have been unsuccessful, proving that the
>> API/ABI is too complicated for most people to reason about.
> Are you referring to the VMX hardware interface to setup the related
> APIC assistance flags, or the hypervisor interface to control those
> features?
>
>> This reverts most of:
>>   2ce11ce249a3981bac50914c6a90f681ad7a4222
>>   6b2b9b3405092c3ad38d7342988a584b8efa674c
>>
>> leaving in place the non-APIC specific changes (minimal as they are).
>>
>> This takes us back to the behaviour of Xen 4.16 where APIC acceleration is
>> configured on a per system basis.
>>
>> This work will be revisited in due course.
> I certainly regret having been involved in attempting to fix this, and
> I have to admit I still don't understand what is broken with the
> current API/ABI.
>
> Do we want a flag to control the setting of the APIC register
> virtualization feature?
>
> Is the naming for the flag that we expose incorrect?
>
> Is the field where the flag gets set incorrect?
>
> There isn't that much to the current interface.
>
> In the previous reply to the fix however I got the (maybe incorrect)
> impression that current bugs in the implementation are used as a way
> to justify why the interface is broken, and that is not accurate.  The
> interface and the implementation are two different things, and bugs
> in the implementation shouldn't automatically invalidate the
> interface without further reasoning.

The fact you still have these questions demonstrates the point I'm
trying to make.

The answer is in the written description of how the interface is
intended to behave.  The fact there isn't one is the first problem.


This email has taken a long time to write.  I'm not trying to criticise;
this is really really complicated stuff, but I feel a change in
expectations is necessary.


There are definitely implementation issues.  We're all agreed here.

If the implementation issues could be fixed perfectly, then we are still
left with an interface which is incomplete (lacking a TPR control), and
imprecise.

"Just incomplete" would be entirely fine.  Work could continue in 4.18
to fill in the missing pieces.

But "accelerated x{,2}apic" is ambiguous; Note how much time was spent
by us trying to figure out what it ought to mean, and despite coming to
a tentative agreement, I have tried and failed to make a correct
correction to the implementation, and so have you.

We are the experts in this area and when we, who think we understand the
complexity, cannot get it correct, how on earth do you expect end users
to cope with the same interface?


Problems piling up in the way these have is a tell-tale sign that
there's a bigger issue at play.  Part of the responsibilities of being a
maintainer is knowing when to take a step back, re-evaluate things,
asking "is this really the right course of action?", and sometimes this
involves admitting failure, and trying to figure out how to minimise the
collateral damage.

In this case, I'm stating - using the public record of changes and
attempts in this area as justification - that the complexity is the
major problem, but it is being compounded by the ambiguity in the
interface.  I desperately did want to avoid a full reversion, but it is
the only responsible course of action at this juncture.


You may or may not agree with my reasoning here, but you have to concede
that there is a problem and that our current attempts to address it are
not working.

As far as next steps go, I have a plan and it involves doing nothing
until we have a new sphinx doc agreed upon and checked into the tree,
from which we can build a shared understanding of the complexities involved.

~Andrew

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.