Re: [PATCH v3 4/4] x86/cpu-policy: Derive RSBA/RRSBA for guest policies


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 16 Jun 2023 14:12:29 +0200
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 16 Jun 2023 12:12:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 15.06.2023 20:17, Andrew Cooper wrote:
> On 15/06/2023 1:13 pm, Jan Beulich wrote:
>> On 15.06.2023 12:41, Andrew Cooper wrote:
>>> On 15/06/2023 9:30 am, Jan Beulich wrote:
>>>> On 14.06.2023 20:12, Andrew Cooper wrote:
>>>>> On 13/06/2023 10:59 am, Jan Beulich wrote:
>>>>>> On 12.06.2023 18:13, Andrew Cooper wrote:
>>>>>>> The RSBA bit, "RSB Alternative", means that the RSB may use alternative
>>>>>>> predictors when empty.  From a practical point of view, this means
>>>>>>> "Retpoline not safe".
>>>>>>>
>>>>>>> Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously
>>>>>>> IBRS_ATT) is a statement that IBRS is implemented in hardware (as
>>>>>>> opposed to the form retrofitted to existing CPUs in microcode).
>>>>>>>
>>>>>>> The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the
>>>>>>> eIBRS property that predictions are tagged with the mode in which they
>>>>>>> were learnt.  Therefore, it means "when eIBRS is active, the RSB may
>>>>>>> fall back to alternative predictors but restricted to the current
>>>>>>> prediction mode".  As such, it's a stronger statement than RSBA, but
>>>>>>> still means "Retpoline not safe".
>>>>>>>
>>>>>>> CPUs are not expected to enumerate both RSBA and RRSBA.
>>>>>>>
>>>>>>> Add feature dependencies for EIBRS and RRSBA.  While technically
>>>>>>> they're not linked, absolutely nothing good can come of letting the
>>>>>>> guest see RRSBA without EIBRS.  Nor a guest seeing EIBRS without IBRSB.
>>>>>>> Furthermore, we use this dependency to simplify the max derivation
>>>>>>> logic.
>>>>>>>
>>>>>>> The max policies get RSBA and RRSBA unconditionally set (with the EIBRS
>>>>>>> dependency maybe hiding RRSBA).  We can run any VM, even if it has been
>>>>>>> told "somewhere you might run, Retpoline isn't safe".
>>>>>>>
>>>>>>> The default policies are more complicated.  A guest shouldn't see both
>>>>>>> bits, but it needs to see one if the current host suffers from any form
>>>>>>> of RSBA, and which bit it needs to see depends on whether eIBRS is
>>>>>>> visible or not.  Therefore, the calculation must be performed after
>>>>>>> sanitise_featureset().
>>>>>>>
>>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>>>> ---
>>>>>>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>>>>>>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>>>>> CC: Wei Liu <wl@xxxxxxx>
>>>>>>>
>>>>>>> v3:
>>>>>>>  * Minor commit message adjustment.
>>>>>>>  * Drop changes to recalculate_cpuid_policy().  Deferred to a later
>>>>>>>    series.
>>>>>> With this dropped, with the title not saying "max/default", and with
>>>>>> the description also not mentioning "live" policies at all, I don't
>>>>>> think this patch is self-consistent (meaning in particular: leaving
>>>>>> aside the fact that there's no way right now to request e.g. both
>>>>>> RSBA and RRSBA for a guest; aiui it is possible for Dom0).
>>>>>>
>>>>>> As you may imagine I'm also curious why you decided to drop this.
>>>>> Because when I tried doing levelling in Xapi, I remembered why I did it
>>>>> the way I did in v1, and why the v2 way was wrong.
>>>>>
>>>>> Xen cannot safely edit what the toolstack provides, so must not. 
>>>> And this is the part I don't understand: Why can't we correct the
>>>> (EIBRS,RSBA,RRSBA) tuple to a combination that is "legal"? At least
>>>> as long as ...
>>>>
>>>>> Instead, failing the set_policy() call is an option, and is what we want
>>>>> to do longterm,
>>>> ... we aren't there.
>>>>
>>>>> but also happens to be wrong too in this case. An admin
>>>>> may know that a VM isn't using retpoline, and may need to migrate it
>>>>> anyway for a number of reasons, so any safety checks need to be in the
>>>>> toolstack, and need to be overridable with something like --force.
>>>> Possibly leading to an inconsistent policy exposed to a guest? I
>>>> guess this may be the only option when we can't really resolve an
>>>> ambiguity, but that isn't the case here, is it?
>>> Wrong.  Xen does not have any knowledge of other hosts the VM might
>>> migrate to.
>>>
>>> So while Xen can spot problem combinations *on this host*, which way to
>>> correct the problem combination depends on where the VM might migrate to.
>> I actually view this as two different levels: With a flawed policy, the
>> guest is liable to not work correctly at all. No point thinking about
>> it being able to migrate. With a fixed up policy it may fail to migrate,
>> but it'll at least work otherwise.
> 
> If you get RSBA and/or RRSBA wrong, nothing is going to malfunction in
> the guest, even if you migrate it.
> 
> The consequence of getting RSBA and/or RRSBA wrong is the guest *might*
> think retpoline is safe to use, and *might* end up vulnerable to
> speculative attacks on this or other hardware.

Isn't that some sort of "malfunction"?

> And the admin might know that they overrode the default settings and
> forced the use of some other protection mechanism, so the guest is in
> fact safe despite having wrong RSBA/RRSBA settings.

But then the guest would also be safe with adjusted settings, wouldn't it?

> I don't know how to put it any more plainly.  Xen *does not* have the
> information necessary to make a safety judgement in this matter.  Only
> the toolstack (as a proxy for the admin) has the necessary information.

I'm not looking at it as Xen making things safe by adjusting bogus
settings. I'm merely looking at it as not letting a guest run that way.
For the safety aspect I agree it needs a wider view than Xen has.

Anyway, I don't think either of us is going to convince the other of
there being only one way of looking at things vs there being at least
two possible ways, so in order to allow things to progress
Acked-by: Jan Beulich <jbeulich@xxxxxxxx>

Jan
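
For readers following the thread, the derivation rules described in the
quoted commit message can be sketched in C roughly as follows.  This is a
minimal illustration under stated assumptions, not the code from the patch:
struct spec_bits, apply_deps(), derive_max() and derive_default() are
hypothetical names, and the real logic operates on Xen's full featureset
rather than on four booleans.  The mapping "RRSBA when eIBRS is visible,
otherwise RSBA" is how the commit message is read here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the bits discussed in this thread. */
struct spec_bits {
    bool ibrsb;   /* IBRS/IBPB support */
    bool eibrs;   /* Enhanced IBRS (IBRS_ALL) */
    bool rsba;    /* RSB may use alternative predictors when empty */
    bool rrsba;   /* As RSBA, but restricted to the current mode */
};

/* Feature dependencies: EIBRS needs IBRSB, and RRSBA needs EIBRS. */
static struct spec_bits apply_deps(struct spec_bits p)
{
    if ( !p.ibrsb )
        p.eibrs = false;
    if ( !p.eibrs )
        p.rrsba = false;
    return p;
}

/* Max policy: set both bits, then let the dependencies hide RRSBA. */
static struct spec_bits derive_max(struct spec_bits p)
{
    p.rsba = p.rrsba = true;
    return apply_deps(p);
}

/*
 * Default policy: 'guest' stands for the featureset after sanitising.
 * Expose exactly one bit if the host has any form of RSBA, chosen by
 * whether eIBRS is visible to the guest.
 */
static struct spec_bits derive_default(struct spec_bits host,
                                       struct spec_bits guest)
{
    bool host_any_rsba = host.rsba || host.rrsba;

    guest = apply_deps(guest);
    guest.rsba = guest.rrsba = false;

    if ( host_any_rsba )
    {
        if ( guest.eibrs )
            guest.rrsba = true;
        else
            guest.rsba = true;
    }

    /* A default policy never carries both bits. */
    assert(!(guest.rsba && guest.rrsba));
    return guest;
}
```

Note that the sketch only covers the max and default policies.  What to do
with an inconsistent tuple supplied by the toolstack is exactly the point
debated above, and is deliberately left out.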



 

