
Re: [PATCH v10] x86emul: support LKGS


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 8 Apr 2026 14:00:02 +0200
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Teddy Astie <teddy.astie@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 08 Apr 2026 12:00:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08.04.2026 13:34, Andrew Cooper wrote:
> On 08/04/2026 11:22 am, Jan Beulich wrote:
>> ---
>> For PV save_segments() would need adjustment,
> 
> Not really.  CPL3 must never have a way of modifying GS_KERN, hence ...
> 
>> but the insn being restricted to ring 0 means PV guests can't use it anyway
> 
> ... the CPL0 restriction.
> 
> Arguably I should have had this in one of the FRED patches:
> 
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1952,7 +1952,7 @@ static void load_segments(struct vcpu *n)
>   * changes to bases can also be made with the WR{FS,GS}BASE instructions, when
>   * enabled.
>   *
> - * Guests however cannot use SWAPGS, so there is no mechanism to modify the
> + * Guests cannot use SWAPGS or LKGS, so there is no mechanism to modify the
>   * inactive GS base behind Xen's back.  Therefore, Xen's copy of the inactive
>   * GS base is still accurate, and doesn't need reading back from hardware.
>   *
> 
> 
> but I don't think it's appropriate to merge into this patch.
> 
>> (unless we wanted to emulate it as another privileged insn).
> 
> We already have "LKGS" in hypercall form.  It's spelt
> SEGBASE_GS_USER_SEL and has existed for 20 years or so.

Hmm, yes.

> I don't see any reason to extend emul_priv_op().

Nor do I. Nevertheless I wanted to mention the PV aspect.
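To make the PV argument concrete, here is a toy model of the invariant the save_segments() comment relies on. All names are hypothetical (this is not Xen's struct vcpu or save_segments()). SWAPGS and LKGS fault outside CPL0, which a 64-bit PV guest kernel never runs at. WR{FS,GS}BASE only touches the active base. The hypercall path (cf. SEGBASE_GS_USER_SEL, descriptor lookup elided) is performed by Xen itself, so Xen's copy of the inactive base stays in sync.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a 64-bit PV vCPU's GS bases; names are hypothetical. */
struct pv_gs {
    uint64_t active;        /* GS base currently in use */
    uint64_t inactive;      /* hardware's swapped-out base (SHADOW_GS_BASE) */
    uint64_t xen_inactive;  /* Xen's copy of the inactive base */
};

/* SWAPGS: privileged, so it faults (returns false) outside CPL0. */
static bool do_swapgs(struct pv_gs *g, unsigned int cpl)
{
    uint64_t tmp;

    if ( cpl != 0 )
        return false;
    tmp = g->active;
    g->active = g->inactive;
    g->inactive = tmp;
    return true;
}

/* LKGS: likewise CPL0-only; writes the zero-extended descriptor base into
 * the inactive base and leaves the active base alone. */
static bool do_lkgs(struct pv_gs *g, unsigned int cpl, uint32_t desc_base)
{
    if ( cpl != 0 )
        return false;
    g->inactive = desc_base;
    return true;
}

/* WRGSBASE (when enabled) works at CPL3 but only changes the active base. */
static void do_wrgsbase(struct pv_gs *g, uint64_t base)
{
    g->active = base;
}

/* Hypercall-style update (cf. SEGBASE_GS_USER_SEL): Xen performs it and
 * updates its own copy at the same time.  Descriptor lookup is elided. */
static void xen_set_user_gs_sel(struct pv_gs *g, uint32_t desc_base)
{
    g->inactive = desc_base;
    g->xen_inactive = desc_base;
}

/* The property that lets Xen skip reading the inactive base back. */
static bool xen_copy_accurate(const struct pv_gs *g)
{
    return g->xen_inactive == g->inactive;
}
```

Only the CPL0 path of do_lkgs() (i.e. emulating LKGS as a privileged instruction for PV) would break xen_copy_accurate(). That is why not extending emul_priv_op() keeps the existing comment true.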

>> I've also dropped the test harness read_segment() change. It would
>> generally be correct to have, but it isn't needed any more, as neither
>> the SWAPGS nor the LKGS handling uses the hook.
> 
> Dropping read_segment() makes your patch depend on Teddy's, now that
> test_x86_emulator is blocking in CI.

I'm not dropping read_segment() from there. I've dropped a change to
that function that v9 had. That depends on your change (which has gone
in), but not Teddy's. Or else I may not understand what you mean.

>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -2899,8 +2899,37 @@ x86_emulate(
>>                  break;
>>              }
>>              break;
>> -        default:
>> -            generate_exception_if(true, X86_EXC_UD);
>> +
>> +        case 6: /* lkgs */
>> +            generate_exception_if((modrm_reg & 1) || vex.pfx != vex_f2,
>> +                                  X86_EXC_UD);
>> +            generate_exception_if(!mode_64bit() || !mode_ring0(), X86_EXC_UD);
>> +            vcpu_must_have(lkgs);
>> +            fail_if(!ops->read_msr || !ops->write_segment || !ops->write_msr);
>> +            if ( (rc = ops->read_msr(MSR_SHADOW_GS_BASE, &msr_val,
>> +                                     ctxt)) != X86EMUL_OKAY ||
>> +                 (rc = ops->read_msr(MSR_GS_BASE, &sreg.base,
>> +                                         ctxt)) != X86EMUL_OKAY )
>> +                goto done;
>> +            dst.orig_val = sreg.base; /* Preserve full GS Base. */
> 
> "Preserve current GS Base."
> 
>> +            if ( (rc = protmode_load_seg(x86_seg_gs, src.val, false, &sreg,
>> +                                         ctxt, ops)) != X86EMUL_OKAY )
>> +                goto done;
>> +            /* Write (32-bit) base into SHADOW_GS. */
> 
> "Write new base into SHADOW_GS.  Zero extended from GDT/LDT."
> 
>> +            if ( (rc = ops->write_msr(MSR_SHADOW_GS_BASE, sreg.base,
>> +                                      ctxt, false)) != X86EMUL_OKAY ||
>> +                 (sreg.base = dst.orig_val, /* Reinstate full GS Base. */
> 
> "Reinstate original GS base."

I can make these adjustments, sure, yet I think my forms were clear enough.
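The sequence in the hunk above can be sketched as a standalone model. It is a simplification under stated assumptions, not the emulator's code. The types and the lookup callback are hypothetical. Descriptor checks and the MSR hooks are collapsed into plain fields. The `(modrm_reg & 1)` test assumes, as the hunk suggests, that the case label is reached for both /6 and /7. The ordering mirrors the patch: preserve the current GS base, load the selector, divert the new (32-bit, zero-extended) base into SHADOW_GS, and reinstate the original base before committing GS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register state; not the emulator's types. */
struct cpu {
    bool mode_64bit;
    unsigned int cpl;
    bool has_lkgs;            /* CPUID feature visible to the guest */
    uint16_t gs_sel;
    uint64_t gs_base;         /* MSR_GS_BASE */
    uint64_t shadow_gs_base;  /* MSR_SHADOW_GS_BASE */
};

enum lkgs_rc { LKGS_OK, LKGS_UD, LKGS_FAULT };

/* Hypothetical GDT/LDT lookup: false if the selector can't be loaded. */
typedef bool (*lookup_fn)(uint16_t sel, uint32_t *base);

/* Example lookup for the usage checks: only selector 0x2b is valid. */
static bool demo_lookup(uint16_t sel, uint32_t *base)
{
    if ( sel != 0x2b )
        return false;
    *base = 0xdead0000u;
    return true;
}

static enum lkgs_rc emulate_lkgs(struct cpu *c, unsigned int modrm_reg,
                                 bool pfx_f2, uint16_t sel, lookup_fn lookup)
{
    struct { uint16_t sel; uint64_t base; } sreg;
    uint64_t orig_base;
    uint32_t desc_base;

    /* Only /6 with an F2 prefix is LKGS; /7 or another prefix is #UD. */
    if ( (modrm_reg & 1) || !pfx_f2 )
        return LKGS_UD;
    if ( !c->mode_64bit || c->cpl != 0 || !c->has_lkgs )
        return LKGS_UD;

    orig_base = c->gs_base;               /* preserve current GS base */

    if ( !lookup(sel, &desc_base) )       /* nothing committed on failure */
        return LKGS_FAULT;
    sreg.sel = sel;
    sreg.base = desc_base;                /* zero-extended from GDT/LDT */

    c->shadow_gs_base = sreg.base;        /* new base into SHADOW_GS */
    sreg.base = orig_base;                /* reinstate original GS base */
    c->gs_sel = sreg.sel;                 /* commit GS */
    c->gs_base = sreg.base;
    return LKGS_OK;
}
```

After a successful `emulate_lkgs(&c, 6, true, 0x2b, demo_lookup)`, GS holds selector 0x2b with its old base unchanged, while SHADOW_GS_BASE holds 0xdead0000.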

> This patch needs one more hunk:
> 
> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -765,14 +765,25 @@ static void __init calculate_hvm_max_policy(void)
>       */
>      __set_bit(X86_FEATURE_NO_LMSL, fs);
>  
> -    /*
> -     * On AMD, PV guests are entirely unable to use SYSENTER as Xen runs in
> -     * long mode (and init_amd() has cleared it out of host capabilities), but
> -     * HVM guests are able if running in protected mode.
> -     */
> -    if ( (boot_cpu_data.vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) &&
> -         raw_cpu_policy.basic.sep )
> -        __set_bit(X86_FEATURE_SEP, fs);
> +    if ( boot_cpu_data.vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
> +    {
> +        /*
> +         * On AMD, PV guests are unable to use SYSENTER as Xen runs in long
> +         * mode (and init_amd() has cleared it out of host capabilities), but
> +         * HVM guests are able if running in protected mode.
> +         */
> +        if ( raw_cpu_policy.basic.sep )
> +            __set_bit(X86_FEATURE_SEP, fs);
> +
> +        /*
> +         * NullSelectorClearsBase is really a "hardware doesn't have this bug
> +         * any more" bit.  All FRED-capable hardware has NSCB properties, so
> +         * disallow a configuration which suggests/causes behaviour the OS isn't
> +         * expecting.
> +         */
> +        if ( !test_bit(X86_FEATURE_NSCB, fs) )
> +            __clear_bit(X86_FEATURE_LKGS, fs);
> +    }
>  
>      /*
>       * VIRT_SSBD is exposed in the default policy as a result of
> 
> 
> because otherwise a CPU policy could hide NSCB, and LKGS would behave
> correctly when executed natively but malfunction in the emulator.
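The effect of the proposed hunk, reduced to a single featureset word, might look like the sketch below. The bit positions are hypothetical, not Xen's X86_FEATURE_* values. The vendor values are modelled as bit flags, as the hunk's `vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)` test implies. Other vendors are left alone, since NSCB is an AMD-defined bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for this sketch; not Xen's X86_FEATURE_*. */
enum { FEAT_NSCB, FEAT_LKGS };

/* Vendor values as bit flags, matching the mask test in the hunk. */
enum { VENDOR_INTEL = 1u << 0, VENDOR_AMD = 1u << 1, VENDOR_HYGON = 1u << 2 };

static bool has_feat(uint32_t fs, unsigned int f)
{
    return fs & (1u << f);
}

/* On AMD/Hygon, only offer LKGS when NSCB is also offered. */
static uint32_t restrict_lkgs(unsigned int vendor, uint32_t fs)
{
    if ( (vendor & (VENDOR_AMD | VENDOR_HYGON)) && !has_feat(fs, FEAT_NSCB) )
        fs &= ~(1u << FEAT_LKGS);
    return fs;
}
```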

A policy cannot validly hide NSCB, as the flag - whichever way it is set -
describes how the underlying hardware works. We'd need to intercept and
emulate all selector loads to allow flag and hardware behavior to be out
of sync. I.e. what you say for LKGS would be true for all selector loads.
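A simplified model shows what NSCB describes and why it matters for LKGS. On hardware with NullSelectorClearsBase, loading a null selector into GS zeroes the base. On older AMD parts, the previous base is left in place. If an LKGS-style sequence's selector-load step followed the older rule, it would divert the previous GS base into SHADOW_GS_BASE rather than 0. Per Andrew's note, FRED-capable hardware behaves like the `nscb == true` path. This is a sketch of the two behaviours only, not of protmode_load_seg().

```c
#include <stdbool.h>
#include <stdint.h>

struct seg {
    uint16_t sel;
    uint64_t base;
};

/* Null selector load into GS in 64-bit mode (simplified). */
static void load_null_sel(struct seg *s, bool nscb)
{
    s->sel = 0;
    if ( nscb )
        s->base = 0;    /* NullSelectorClearsBase: base zeroed */
    /* otherwise the previous base is left in place */
}

/* Value an LKGS-style sequence would move into SHADOW_GS_BASE for a null
 * selector, if its selector-load step followed the given rule. */
static uint64_t null_lkgs_shadow(uint64_t cur_base, bool nscb)
{
    struct seg gs = { .sel = 0x2b, .base = cur_base };

    load_null_sel(&gs, nscb);
    return gs.base;
}
```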

> This hunk is in lieu of having vendor-dependent deep-deps calculations,
> although it would need duplicating in userspace too.
> 
> Because this is only a link between an AMD-only feature and a common
> feature, I think I can express it by only having a per-vendor
> deep_features bitmap and keeping a shared deep_deps matrix.
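A rough sketch of the per-vendor deep_features idea follows, under several assumptions. It uses a single 32-bit featureset word and a shared deep_deps matrix that is already transitively closed. The dependency contents are made up for illustration; FRED depending on LKGS is an assumption of this sketch. Only the set of "root" features considered differs per vendor. The matrix itself is common.

```c
#include <stdint.h>

/* Hypothetical feature and vendor indices for this sketch only. */
enum { F_NSCB, F_LKGS, F_FRED, NR_F };
enum { V_INTEL, V_AMD, NR_V };

/* Shared matrix: deep_deps[f] = every feature (transitively) depending on f. */
static const uint32_t deep_deps[NR_F] = {
    [F_NSCB] = (1u << F_LKGS) | (1u << F_FRED),
    [F_LKGS] = 1u << F_FRED,
    [F_FRED] = 0,
};

/* Per-vendor roots: the NSCB -> LKGS link only applies on AMD. */
static const uint32_t deep_features[NR_V] = {
    [V_INTEL] = 1u << F_LKGS,
    [V_AMD]   = (1u << F_NSCB) | (1u << F_LKGS),
};

static uint32_t apply_deep_deps(unsigned int vendor, uint32_t fs)
{
    /* Roots for this vendor which are absent from the featureset. */
    uint32_t disabled = deep_features[vendor] & ~fs;

    for ( unsigned int f = 0; f < NR_F; f++ )
        if ( disabled & (1u << f) )
            fs &= ~deep_deps[f];

    return fs;
}
```

With `fs = LKGS | FRED`, the Intel result is unchanged, while the AMD result is 0 because the NSCB root is absent.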
> 
> Perhaps I should prototype that instead, but it would become another
> dependency for this patch.

Please do, albeit as per above I don't think it's truly a prereq to the
one here.

Jan



 

