[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions


  • To: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
  • From: Jürgen Groß <jgross@xxxxxxxx>
  • Date: Tue, 30 Sep 2025 12:43:24 +0200
  • Autocrypt: addr=jgross@xxxxxxxx; keydata= xsBNBFOMcBYBCACgGjqjoGvbEouQZw/ToiBg9W98AlM2QHV+iNHsEs7kxWhKMjrioyspZKOB ycWxw3ie3j9uvg9EOB3aN4xiTv4qbnGiTr3oJhkB1gsb6ToJQZ8uxGq2kaV2KL9650I1SJve dYm8Of8Zd621lSmoKOwlNClALZNew72NjJLEzTalU1OdT7/i1TXkH09XSSI8mEQ/ouNcMvIJ NwQpd369y9bfIhWUiVXEK7MlRgUG6MvIj6Y3Am/BBLUVbDa4+gmzDC9ezlZkTZG2t14zWPvx XP3FAp2pkW0xqG7/377qptDmrk42GlSKN4z76ELnLxussxc7I2hx18NUcbP8+uty4bMxABEB AAHNH0p1ZXJnZW4gR3Jvc3MgPGpncm9zc0BzdXNlLmNvbT7CwHkEEwECACMFAlOMcK8CGwMH CwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIXgAAKCRCw3p3WKL8TL8eZB/9G0juS/kDY9LhEXseh mE9U+iA1VsLhgDqVbsOtZ/S14LRFHczNd/Lqkn7souCSoyWsBs3/wO+OjPvxf7m+Ef+sMtr0 G5lCWEWa9wa0IXx5HRPW/ScL+e4AVUbL7rurYMfwCzco+7TfjhMEOkC+va5gzi1KrErgNRHH kg3PhlnRY0Udyqx++UYkAsN4TQuEhNN32MvN0Np3WlBJOgKcuXpIElmMM5f1BBzJSKBkW0Jc Wy3h2Wy912vHKpPV/Xv7ZwVJ27v7KcuZcErtptDevAljxJtE7aJG6WiBzm+v9EswyWxwMCIO RoVBYuiocc51872tRGywc03xaQydB+9R7BHPzsBNBFOMcBYBCADLMfoA44MwGOB9YT1V4KCy vAfd7E0BTfaAurbG+Olacciz3yd09QOmejFZC6AnoykydyvTFLAWYcSCdISMr88COmmCbJzn sHAogjexXiif6ANUUlHpjxlHCCcELmZUzomNDnEOTxZFeWMTFF9Rf2k2F0Tl4E5kmsNGgtSa aMO0rNZoOEiD/7UfPP3dfh8JCQ1VtUUsQtT1sxos8Eb/HmriJhnaTZ7Hp3jtgTVkV0ybpgFg w6WMaRkrBh17mV0z2ajjmabB7SJxcouSkR0hcpNl4oM74d2/VqoW4BxxxOD1FcNCObCELfIS auZx+XT6s+CE7Qi/c44ibBMR7hyjdzWbABEBAAHCwF8EGAECAAkFAlOMcBYCGwwACgkQsN6d 1ii/Ey9D+Af/WFr3q+bg/8v5tCknCtn92d5lyYTBNt7xgWzDZX8G6/pngzKyWfedArllp0Pn fgIXtMNV+3t8Li1Tg843EXkP7+2+CQ98MB8XvvPLYAfW8nNDV85TyVgWlldNcgdv7nn1Sq8g HwB2BHdIAkYce3hEoDQXt/mKlgEGsLpzJcnLKimtPXQQy9TxUaLBe9PInPd+Ohix0XOlY+Uk QFEx50Ki3rSDl2Zt2tnkNYKUCvTJq7jvOlaPd6d/W0tZqpyy7KVay+K4aMobDsodB3dvEAs6 ScCnh03dDAFgIq5nsB11j3KPKdVoPlfucX2c7kGNH+LUMbzqV6beIENfNexkOfxHfw==
  • Cc: linux-kernel@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, virtualization@xxxxxxxxxxxxxxx, xin@xxxxxxxxx, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Ajay Kaher <ajay.kaher@xxxxxxxxxxxx>, Alexey Makhalov <alexey.makhalov@xxxxxxxxxxxx>, Broadcom internal kernel review list <bcm-kernel-feedback-list@xxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 30 Sep 2025 10:43:36 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 30.09.25 12:04, Peter Zijlstra wrote:
On Tue, Sep 30, 2025 at 11:02:52AM +0200, Jürgen Groß wrote:
On 30.09.25 10:38, Peter Zijlstra wrote:
On Tue, Sep 30, 2025 at 09:03:55AM +0200, Juergen Gross wrote:

+static __always_inline u64 read_msr(u32 msr)
+{
+       if (cpu_feature_enabled(X86_FEATURE_XENPV))
+               return xen_read_msr(msr);
+
+       return native_rdmsrq(msr);
+}
+
+static __always_inline int read_msr_safe(u32 msr, u64 *p)
+{
+       if (cpu_feature_enabled(X86_FEATURE_XENPV))
+               return xen_read_msr_safe(msr, p);
+
+       return native_read_msr_safe(msr, p);
+}
+
+static __always_inline void write_msr(u32 msr, u64 val)
+{
+       if (cpu_feature_enabled(X86_FEATURE_XENPV))
+               xen_write_msr(msr, val);
+       else
+               native_wrmsrq(msr, val);
+}
+
+static __always_inline int write_msr_safe(u32 msr, u64 val)
+{
+       if (cpu_feature_enabled(X86_FEATURE_XENPV))
+               return xen_write_msr_safe(msr, val);
+
+       return native_write_msr_safe(msr, val);
+}
+
+static __always_inline u64 rdpmc(int counter)
+{
+       if (cpu_feature_enabled(X86_FEATURE_XENPV))
+               return xen_read_pmc(counter);
+
+       return native_read_pmc(counter);
+}

Egads, didn't we just construct giant ALTERNATIVE()s for the native_
things? Why wrap that in a cpu_feature_enabled() instead of just adding
one more case to the ALTERNATIVE() ?

The problem I encountered with using pv_ops was to implement the *_safe()
variants. There is no simple way to do that using ALTERNATIVE_<n>(), as
in the Xen PV case the call will remain, and I didn't find a way to
specify a sane interface between the call-site and the called Xen function
to return the error indicator. Remember that at the call site the main
interface is the one of the RDMSR/WRMSR instructions. They lack an error
indicator.

Would've been useful Changelog material that I suppose.

In Xin's series there was a patch written initially by you to solve such
a problem by adding the _ASM_EXTABLE_FUNC_REWIND() exception table method.
I think this is a dead end, as it will break when using a shadow stack.

No memories, let me go search. I found this:

   
https://patchwork.ozlabs.org/project/linux-ide/patch/20250331082251.3171276-12-xin@xxxxxxxxx/

That's the other Peter :-)

Oh, my bad, sorry. :-)

Anyway, with shadowstack you should be able to frob SSP along with SP in
the exception context. IIRC the SSP 'return' value is on the SS itself,
so a WRSS to that field can easily make the whole CALL go away.

Yeah, but being able to avoid all of that dance wouldn't be too bad either.

Additionally I found a rather ugly hack only to avoid re-iterating most of
the bare metal ALTERNATIVE() for the paravirt case. It is possible, but the
bare metal case is gaining one additional ALTERNATIVE level, resulting in
patching the original instruction with an identical copy first.

OTOH the above generates atrocious crap code :/

Yes.

You get that _static_cpu_has() crud, which is basically a really fat
jump_label (because it needs to include the runtime test) and then the
code for both your xen thing and the alternative.

Seeing both variants would make it easier to decide, I guess.


/me ponders things a bit..

Remember that at the call site the main interface is the one of the
RDMSR/WRMSR instructions. They lack an error indicator.

This, that isn't true.

Note how ex_handler_msr() takes a reg argument and how that sets that
reg to -EIO. See how the current native_read_msr_safe() uses that:

   _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])

(also note how using _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_*_SAFE) like you
do, will result in reg being 0 or ax. Scribbling your 0 return value)

It very explicitly uses @err as error return value. So your call would
return eax:edx and take ecx to be the msr, but there is nothing stopping
us from then using say ebx for error return, like:

        int err = 0;

        asm_inline(
                "1:\n"
                ALTERNATIVE("ds rdmsr",
                            "call xen_rdmsr", XENPV)
                "2:\n"

                _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %%ebx)

                : "a" (ax), "d" (dx), "+b" (err)
                : "c" (msr));

        return err;

Hmm?

Oh, indeed.

Let me try that and we can choose the less evil. :-)


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.