
Re: [PATCH 2/2] x86/spec-ctrl: Reduce HVM RSB overhead where possible


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Thu, 11 Aug 2022 18:06:51 +0000
  • Cc: Roger Pau Monne <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 11 Aug 2022 18:07:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH 2/2] x86/spec-ctrl: Reduce HVM RSB overhead where possible

On 10/08/2022 09:00, Jan Beulich wrote:
> On 09.08.2022 19:00, Andrew Cooper wrote:
>> --- a/xen/arch/x86/hvm/vmx/entry.S
>> +++ b/xen/arch/x86/hvm/vmx/entry.S
>> @@ -44,6 +44,7 @@ ENTRY(vmx_asm_vmexit_handler)
>>          .endm
>>          ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
>>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>> +        /* On PBRSB-vulenrable hardware, `ret` not safe before the start of vmx_vmexit_handler() */
> Besides the spelling issue mentioned by Jason I think this line also
> wants wrapping. Maybe the two comments also want combining to just
> one, such that "WARNING!" clearly applies to both parts.
>
>> @@ -515,7 +515,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
>>              boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
>>              opt_eager_fpu || opt_md_clear_hvm)       ? ""               : " None",
>>             boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
>> -           boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
>> +           boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           :
>> +           boot_cpu_has(X86_BUG_PBRSB)               ? " PBRSB"         : "",
>>             opt_eager_fpu                             ? " EAGER_FPU"     : "",
>>             opt_md_clear_hvm                          ? " MD_CLEAR"      : "",
>>             boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM)  ? " IBPB-entry"    : "");
> Along the lines of half of what fdbf8bdfebc2 ("x86/spec-ctrl:
> correct per-guest-type reporting of MD_CLEAR") did, I think you also want
> to extend the other (earlier) conditional in this function invocation.

Oh yes, good point.
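
For reference, a minimal standalone sketch of what extending the earlier conditional amounts to.  This is not the Xen code: struct hvm_mitigations and hvm_none_str() are hypothetical stand-ins for the boot_cpu_has()/opt_* checks, and the first two flags are an assumption, because the quoted hunk doesn't show the start of that condition.

```c
#include <stdbool.h>

/*
 * Hypothetical flags standing in for the checks in print_details().
 * sc_msr and sc_rsb are assumed; the quoted hunk only shows the tail
 * of the real condition.
 */
struct hvm_mitigations {
    bool sc_msr, sc_rsb, bug_pbrsb, ibpb_entry, eager_fpu, md_clear;
};

/*
 * " None" should only be reported when no HVM mitigation is active, so
 * the new PBRSB flag has to take part in the condition as well.
 */
static const char *hvm_none_str(const struct hvm_mitigations *m)
{
    return (m->sc_msr || m->sc_rsb || m->bug_pbrsb ||
            m->ibpb_entry || m->eager_fpu || m->md_clear) ? "" : " None";
}
```

Without bug_pbrsb in that condition, a system using only the PBRSB workaround would report both " None" and " PBRSB".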

> I also wonder whether it wouldn't be helpful to parenthesize the new
> construct, such that it'll be more obvious that this is a double
> conditional operator determining a single function argument.

I haven't done that elsewhere.  Personally, I find it easier to follow
the commas on the RHS.
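
To make the parsing concrete, here is a small standalone sketch (again not the Xen code; hvm_rsb_str() and format_hvm_line() and their boolean parameters are hypothetical stand-ins).  The conditional operator is right-associative and binds tighter than the comma in an argument list, so the chained form is a single argument with or without extra parentheses.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the two boot_cpu_has() checks. */
static const char *hvm_rsb_str(bool sc_rsb_hvm, bool bug_pbrsb)
{
    /*
     * Parses as  sc_rsb_hvm ? " RSB" : (bug_pbrsb ? " PBRSB" : "")
     * because ?: is right-associative.
     */
    return sc_rsb_hvm ? " RSB"   :
           bug_pbrsb  ? " PBRSB" : "";
}

/*
 * Same chain used directly in an argument list.  The commas at the end
 * of each line separate arguments; the chain spanning two lines is one
 * argument feeding the first %s.
 */
static int format_hvm_line(char *buf, size_t len, bool sc_rsb_hvm,
                           bool bug_pbrsb, bool eager_fpu)
{
    return snprintf(buf, len, "HVM:%s%s",
                    sc_rsb_hvm ? " RSB"       :
                    bug_pbrsb  ? " PBRSB"     : "",
                    eager_fpu  ? " EAGER_FPU" : "");
}
```

With both flags set the chain yields " RSB" only, which matches the patch: full RSB stuffing takes precedence over the PBRSB-only workaround in the report.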

>
>> @@ -718,6 +719,77 @@ static bool __init rsb_is_full_width(void)
>>      return true;
>>  }
>>  
>> +/*
>> + * HVM guests can create arbitrary RSB entries, including ones which point at
>> + * Xen supervisor mappings.
>> + *
>> + * Traditionally, the RSB is not isolated on vmexit, so Xen needs to take
>> + * safety precautions to prevent RSB speculation from consuming guest values.
>> + *
>> + * Intel eIBRS specifies that the RSB is flushed:
>> + *   1) on VMExit when IBRS=1, or
>> + *   2) shortly thereafter when Xen restores the host IBRS=1 setting.
>> + * However, a subset of eIBRS-capable parts also suffer PBRSB and need
>> + * software assistance to maintain RSB safety.
>> + */
>> +static __init enum hvm_rsb {
>> +    hvm_rsb_none,
>> +    hvm_rsb_pbrsb,
>> +    hvm_rsb_stuff32,
>> +} hvm_rsb_calculations(uint64_t caps)
>> +{
>> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
>> +         boot_cpu_data.x86 != 6 )
>> +        return hvm_rsb_stuff32;
>> +
>> +    if ( !(caps & ARCH_CAPS_IBRS_ALL) )
>> +        return hvm_rsb_stuff32;
>> +
>> +    if ( caps & ARCH_CAPS_PBRSB_NO )
>> +        return hvm_rsb_none;
>> +
>> +    /*
>> +     * We're choosing between the eIBRS-capable models which don't enumerate
>> +     * PBRSB_NO.  Earlier steppings of some models don't enumerate eIBRS and
>> +     * are excluded above.
>> +     */
>> +    switch ( boot_cpu_data.x86_model )
>> +    {
>> +        /*
>> +         * Core (inc Hybrid) CPUs to date (August 2022) are vulenrable.
>> +         */
>> +    case 0x55: /* Skylake X */
>> +    case 0x6a: /* Ice Lake SP */
>> +    case 0x6c: /* Ice Lake D */
>> +    case 0x7e: /* Ice Lake client */
>> +    case 0x8a: /* Lakefield (SNC/TMT) */
>> +    case 0x8c: /* Tiger Lake U */
>> +    case 0x8d: /* Tiger Lake H */
>> +    case 0x8e: /* Skylake-L */
> Hmm, is SDM Vol 4's initial table wrong then in stating Kaby Lake /
> Coffee Lake for this and ...
>
>> +    case 0x97: /* Alder Lake S */
>> +    case 0x9a: /* Alder Lake H/P/U */
>> +    case 0x9e: /* Skylake */
> ... this? Otoh I notice that intel-family.h also says Skylake in
> respective comments, despite the constants themselves being named
> differently. Yet again ...
>
>> +    case 0xa5: /* Comet Lake */
>> +    case 0xa6: /* Comet Lake U62 */
> ... you call these Comet Lake when intel-family.h says Skylake also for
> these (and names the latter's variable COMETLAKE_L).
>
> What is in the comments here is of course not of primary concern for
> getting this patch in, but the named anomalies will stand out when all
> of this is switched over to use intel-family.h's constants.

Naming in the Skylake uarch is a total mess.  Half is core codenames, and
half is marketing attempting to cover the fact that nothing much changed
across the tens of steppings for 0x8e/0x9e.

But yes, I do need to clean up a few details here.  I'm still waiting
for some corrections to be made in official docs.

>
>> @@ -1327,9 +1400,33 @@ void __init init_speculation_mitigations(void)
>>       * HVM guests can always poison the RSB to point at Xen supervisor
>>       * mappings.
>>       */
>> +    hvm_rsb = hvm_rsb_calculations(caps);
>> +    if ( opt_rsb_hvm == -1 )
>> +        opt_rsb_hvm = hvm_rsb != hvm_rsb_none;
>> +
>>      if ( opt_rsb_hvm )
>>      {
>> -        setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
>> +        switch ( hvm_rsb )
>> +        {
>> +        case hvm_rsb_pbrsb:
>> +            setup_force_cpu_cap(X86_BUG_PBRSB);
>> +            break;
>> +
>> +        case hvm_rsb_none:
>> +            /*
>> +             * Somewhat arbitrary.  If something is wrong and the user has
>> +             * forced HVM RSB protections on a system where we think nothing
>> +             * is necessary, they they possibly know something we dont.
>> +             *
>> +             * Use stuff32 in this case, which is the most protection we can
>> +             * muster.
>> +             */
>> +            fallthrough;
>> +
>> +        case hvm_rsb_stuff32:
>> +            setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
>> +            break;
>> +        }
>>  
>>          /*
>>           * For SVM, Xen's RSB safety actions are performed before STGI, so
> For people using e.g. "spec-ctrl=no-ibrs" but leaving RSB stuffing enabled
> (or force-enabling it) we'd need to have an LFENCE somewhere as well.

We don't, but it's subtle.

Attempting to exploit PBRSB is a sub-case of trying to exploit general
RSB speculation on other processors which don't flush the RSB on vmexit.

Xen doesn't architecturally execute more RETs than CALLs (unlike other
open source hypervisors which do have a problem here), so an attacker
first needs to control speculation to find a non-architectural path with
excess RETs.

This already makes it a lack-of-defence-in-depth type problem,
because if the attacker could control speculation like that, they wouldn't
bother chaining it into a more complicated exploit like this one.

An attacker has to find enough RETs to unwind all the CALLs Xen has done
thus far (3 in this example: 2 from the first RSB loop, and the call up
into the vmexit handler), and then one extra to consume the bad RSB
entry.  i.e. they need to find an unexpected code sequence in Xen with 4
excess RETs, assuming they can find a gadget to control speculation
with, and only within vmx_vmexit_handler().
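
The counting can be sketched with a toy model.  This is purely illustrative: the RSB is treated as a simple bounded stack, RSB_DEPTH is an assumed value, all names are hypothetical, and none of the real hardware behaviour (wraparound, PBRSB itself) is modelled.

```c
#include <stddef.h>

#define RSB_DEPTH 32   /* assumed depth, for the toy model only */

enum rsb_owner { RSB_GUEST, RSB_XEN };

struct rsb {
    enum rsb_owner slot[RSB_DEPTH];
    size_t top;                     /* number of valid entries */
};

/* CALL pushes a return prediction.  Returns 0, or -1 if the model is full. */
static int rsb_call(struct rsb *r, enum rsb_owner who)
{
    if ( r->top >= RSB_DEPTH )
        return -1;
    r->slot[r->top++] = who;
    return 0;
}

/* RET pops the newest prediction.  Returns 0, or -1 if the model is empty. */
static int rsb_ret(struct rsb *r, enum rsb_owner *who)
{
    if ( r->top == 0 )
        return -1;
    *who = r->slot[--r->top];
    return 0;
}

/*
 * One guest-controlled entry sits below xen_calls entries pushed by Xen.
 * Count the RETs executed up to and including the one which consumes the
 * guest entry.  Returns -1 if the scenario doesn't fit in the model.
 */
static int rets_to_reach_guest_entry(unsigned int xen_calls)
{
    struct rsb r = { .top = 0 };
    enum rsb_owner who;
    int rets = 0;

    if ( rsb_call(&r, RSB_GUEST) )
        return -1;
    for ( unsigned int i = 0; i < xen_calls; i++ )
        if ( rsb_call(&r, RSB_XEN) )
            return -1;

    while ( rsb_ret(&r, &who) == 0 )
    {
        rets++;
        if ( who == RSB_GUEST )
            return rets;
    }

    return -1;
}
```

With the 3 CALLs from the example, rets_to_reach_guest_entry(3) gives 4: three RETs to unwind Xen's own entries, and one more to consume the bad one.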

All the HVM funcs are altcalls now.  They would have been the obvious
place to try and attack, but can't be attacked any more.  We do still
have some indirect branches, with other mechanisms in place to try and
protect them.

But... an attacker has to do all of this in the speculative shadow of
the mispredicted loop exit, taking it firmly from "theoretical" into
"impossible" territory.

~Andrew
