
Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

>>> On 20.01.16 at 21:10, <andrew.cooper3@xxxxxxxxxx> wrote:
> First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
> SMEP/SMAP choices, because they run in ring 1.
> SMAP in particular is problematic because older Linux guests do fall
> foul of it; they don't understand what a SMAP pagefault is, and enter an
> infinite loop of pagefaults.  SMEP is also problematic because it breaks
> any guest wishing to use a shared address space between kernel and
> user.  (I had some fun getting the test framework to function until I
> twigged what was happening).
> Both of these are regressions; older guests relying on existing
> behaviour cease to function on newer hardware/Xen despite identical
> settings.

And for both of them there simply should be a way for the guest to
state whether it's compatible (which should be the case for anything
we can't deal with completely transparently to guests).

> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
> builds (so there is a guest-observable difference between running on a
> debug and a non-debug Xen), and the comment beside it even identifies
> that it breaks BSD guests.  PTE bits 62:59 are used by hardware if CR4.PKE
> is set.  Currently this means that we are not able to support Protection
> Keys for PV guests (although this restriction technically only applies to
> debug builds of the hypervisor).
> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
> is used to notice when a 64bit PV guest attempts to override the fixup
> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
> _PAGE_USER in both cases as the PV kernel runs in ring 3.  The only thing
> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL mapping, at which point a
> warning is logged and the kernel's request is overridden.
> Neither of the used PTE bits exists in the Xen public ABI.  Neither of
> them serves a purpose other than as a debugging aid.
> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
> "all bits available for guest use".

And a kernel using any of the conflicting bits would then become
unusable on a hypervisor with that debug option enabled? I'd
rather see us document the state things are in...
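The PTE bit arrangement discussed above can be modelled in a few lines of C. This is a hypothetical sketch, not Xen's code: the macro and function names are invented, the positions of _PAGE_GUEST_KERNEL (52), _PAGE_GNTTAB (62) and the protection key field (62:59) are taken from the thread, and pv64_adjust_l1e() follows the prose description of the 64-bit PV fixup rather than the hypervisor source. It shows why bit 62 collides with CR4.PKE and how reserving bits only under CONFIG_PV_PTE_DEBUG changes what a guest may use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_BIT(n)           (UINT64_C(1) << (n))

/* Architectural x86 PTE flags. */
#define PTE_PRESENT          PTE_BIT(0)
#define PTE_USER             PTE_BIT(2)
#define PTE_GLOBAL           PTE_BIT(8)

/* Protection key field, bits 62:59, consumed by hardware when CR4.PKE=1. */
#define PTE_PKEY_SHIFT       59
#define PTE_PKEY_MASK        (UINT64_C(0xF) << PTE_PKEY_SHIFT)

/* Xen-internal software bits from the thread (names are hypothetical). */
#define XEN_PTE_GUEST_KERNEL PTE_BIT(52)
#define XEN_PTE_GNTTAB       PTE_BIT(62)

/* Bits a guest must not use: empty unless the debug option is built in. */
#ifdef CONFIG_PV_PTE_DEBUG
#define XEN_PTE_RESERVED     (XEN_PTE_GUEST_KERNEL | XEN_PTE_GNTTAB)
#else
#define XEN_PTE_RESERVED     UINT64_C(0)
#endif

/* The grant-table debug bit lies inside the protection key field. */
static_assert((XEN_PTE_GNTTAB & PTE_PKEY_MASK) != 0,
              "bit 62 is expected to overlap the PKEY field");

/* Hardware's view of the 4-bit protection key carried by a PTE. */
static unsigned pte_pkey(uint64_t pte)
{
    return (unsigned)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}

/* Does this guest PTE avoid every bit Xen keeps for itself? */
static bool guest_pte_bits_ok(uint64_t pte)
{
    return (pte & XEN_PTE_RESERVED) == 0;
}

/*
 * Hypothetical model of the 64-bit PV L1 fixup described in the thread.
 * Returns the adjusted PTE; *warned is set when the guest asked for a
 * global kernel mapping.
 */
static uint64_t pv64_adjust_l1e(uint64_t pte, bool *warned)
{
    *warned = false;
    if (!(pte & PTE_PRESENT))
        return pte;                     /* not-present entries untouched */

    if ((pte & (XEN_PTE_GUEST_KERNEL | PTE_GLOBAL)) ==
        (XEN_PTE_GUEST_KERNEL | PTE_GLOBAL)) {
        *warned = true;
        fprintf(stderr, "PV guest requested global kernel mapping %#llx\n",
                (unsigned long long)pte);
    }

    if (!(pte & PTE_USER) || (pte & XEN_PTE_GUEST_KERNEL)) {
        /* Supervisor mapping: the PV kernel runs in ring 3, so it needs
         * _PAGE_USER; tag it as a kernel page and never make it global. */
        pte |= PTE_USER | XEN_PTE_GUEST_KERNEL;
        pte &= ~PTE_GLOBAL;
    } else {
        /* User mapping: Xen makes it global. */
        pte |= PTE_GLOBAL;
    }
    return pte;
}
```

Built with -DCONFIG_PV_PTE_DEBUG, guest_pte_bits_ok() rejects any PTE carrying bit 52 or bit 62, which is the incompatibility raised above for kernels that use those bits; without the option, every bit is available to the guest.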

> The other question is what we do when it comes to %cr4 and PV guests.
> The current SMAP issue is a blocker for XenServer, and I have some nasty
> logic to fix things up behind the guest's back.  I have only just discovered the
> SMEP issue, but it is still a regression (again, nothing states that a
> PV guest must have a split address space; segmentation is a perfectly
> valid option in 32bit guests).  PK support shouldn't be difficult for us
> to implement for PV guests.
> I am leaning towards allowing a toolstack to permit a PV guest to be
> able to play with a few more CR4 bits.  We can't give a guest kernel
> complete carte blanche, because of the security implications.  However,
> we do already context switch CR4 for PV guests, so a few extra bits on
> a "nominated safe" domain is no extra hassle.

Sounds reasonable.
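One way to picture the "nominated safe" delegation: a per-domain mask, set by the toolstack and clamped to a fixed safe set, selects which CR4 bits take the guest's value on context switch, while every other bit keeps Xen's own setting. This is a hypothetical sketch: the struct, the function names and the choice of SMEP, SMAP and PKE (architectural CR4 bits 20, 21 and 22) as the delegable set are assumptions, not an existing Xen interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_SMEP  (UINT64_C(1) << 20)
#define X86_CR4_SMAP  (UINT64_C(1) << 21)
#define X86_CR4_PKE   (UINT64_C(1) << 22)

/* Hypothetical upper bound on what a toolstack may delegate to a PV guest. */
#define PV_CR4_DELEGABLE (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)

/* Hypothetical per-domain state. */
struct pv_cr4_state {
    uint64_t allowed;  /* bits the toolstack nominated as guest-controlled */
    uint64_t guest;    /* guest's chosen values, kept within 'allowed' */
};

/* Toolstack: nominate bits, dropping anything outside the safe set. */
static void pv_cr4_set_allowed(struct pv_cr4_state *d, uint64_t mask)
{
    assert(d != NULL);
    d->allowed = mask & PV_CR4_DELEGABLE;
    d->guest &= d->allowed;
}

/* Guest write: only delegated bits take effect; false if others were set. */
static bool pv_cr4_guest_write(struct pv_cr4_state *d, uint64_t val)
{
    assert(d != NULL);
    d->guest = val & d->allowed;
    return (val & ~d->allowed) == 0;
}

/* Context switch: Xen's own CR4, except for the delegated bits. */
static uint64_t pv_cr4_for_switch(uint64_t xen_cr4,
                                  const struct pv_cr4_state *d)
{
    return (xen_cr4 & ~d->allowed) | (d->guest & d->allowed);
}
```

With SMAP delegated and the guest never touching it, the guest runs with SMAP off even though Xen itself uses SMAP, which is what an older guest that does not understand SMAP faults would need. Bits outside the safe set, such as CR4.PAE, can never be delegated whatever the toolstack asks for.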


Xen-devel mailing list


