[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default



On Fri, Jan 06, 2023 at 03:30:01AM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 05, 2023 at 09:00:03PM -0500, Demi Marie Obenour wrote:
> > On Thu, Jan 05, 2023 at 08:28:26PM +0000, Andrew Cooper wrote:
> > > On 22/12/2022 10:31 pm, Demi Marie Obenour wrote:
> > > > diff --git a/docs/misc/xen-command-line.pandoc 
> > > > b/docs/misc/xen-command-line.pandoc
> > > > index 
> > > > 424b12cfb27d6ade2ec63eacb8afe5df82465451..0230a7bc17cbd4362a42ea64cea695f31f5e0f86
> > > >  100644
> > > > --- a/docs/misc/xen-command-line.pandoc
> > > > +++ b/docs/misc/xen-command-line.pandoc
> > > > @@ -1417,6 +1417,17 @@ detection of systems known to misbehave upon 
> > > > accesses to that port.
> > > >  ### idle_latency_factor (x86)
> > > >  > `= <integer>`
> > > >  
> > > > +### invalid-cacheability (x86)
> > > > +> `= allow | deny | trap`
> > > > +
> > > > +> Default: `deny` in release builds, otherwise `trap`
> > > > +
> > > > +Specify what happens when a PV guest tries to use one of the reserved 
> > > > entries in
> > > > +the PAT.  `deny` causes the attempt to be rejected with -EINVAL, 
> > > > `allow` allows
> > > > +the attempt, and `trap` causes a general protection fault to be raised.
> > > > +Currently, the reserved entries are marked as uncacheable in Xen's 
> > > > PAT, but this
> > > > +will change if new memory types are added, so guests must not rely on 
> > > > it.
> > > > +
> > > 
> > > This wants to be scoped under "pv", and not a top-level option.  Current
> > > parsing is at the top of xen/arch/x86/pv/domain.c
> > > 
> > > How about `pv={no-}oob-pat`, and parse it into the opt_pv_oob_pat boolean 
> > > ?
> > 
> > Works for me, though I will use ‘invalid’ instead of ‘oob’ as valid PAT
> > entries might not be contiguous.
> 
> If we're talking about alternative PAT settings, I'm not sure if they
> can be called "invalid". With the default Xen's choice of PAT, top two
> entries are documented as reserved (in xen.h) and only that makes them
> forbidden to use. But with alternative settings, it's only behavior of
> Linux parsing that prefers using lower options. When breaking contract
> set in xen.h, "reserved" entries stop being reserved too.

That makes sense.

> So, _if_ that option would be applicable alternative PAT choice, it's
> only useful for debugging Linux specifically (assuming Linux won't
> change its approach to choosing entries - which I think it's allowed to
> do).

Point taken.

> > > There really is no need to distinguish between deny and trap.  IMO,
> > > injecting #GP unilaterally is fine (to a guest expecting the hypercall
> > > to succeed, -EINVAL vs #GP makes no difference, but #GP is far more
> > > useful to a human trying to debug issues here), but I could be talked
> > > into putting that behind a CONFIG_DEBUG if other feel strongly.
> > 
> > Marek, thoughts?
> 
> With Xen's default PAT, #GP may be useful indeed, but it must come with
> a message why it was injected.

In xl dmesg?

> > > > @@ -1343,7 +1374,34 @@ static int promote_l1_table(struct page_info 
> > > > *page)
> > > >          }
> > > >          else
> > > >          {
> > > > -            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
> > > > +            l1_pgentry_t l1e = pl1e[i];
> > > > +
> > > > +            if ( invalid_cacheability != INVALID_CACHEABILITY_ALLOW )
> > > > +            {
> > > > +                switch ( l1e.l1 & PAGE_CACHE_ATTRS )
> > > > +                {
> > > > +                case _PAGE_WB:
> > > > +                case _PAGE_UC:
> > > > +                case _PAGE_UCM:
> > > > +                case _PAGE_WC:
> > > > +                case _PAGE_WT:
> > > > +                case _PAGE_WP:
> > > > +                    break;
> > > > +                default:
> > > > +                    /*
> > > > +                     * If we get here, a PV guest tried to use one of 
> > > > the
> > > > +                     * reserved values in Xen's PAT.  This indicates a 
> > > > bug
> > > > +                     * in the guest.  If requested by the user, inject 
> > > > #GP
> > > > +                     * to cause the guest to log a stack trace.
> > > > +                     */
> > > > +                    if ( invalid_cacheability == 
> > > > INVALID_CACHEABILITY_TRAP )
> > > > +                        pv_inject_hw_exception(TRAP_gp_fault, 0);
> > > > +                    ret = -EINVAL;
> > > > +                    goto fail;
> > > > +                }
> > > > +            }
> > > 
> > > This will catch cases where the guest writes a full pagetable, then
> > > promotes it, but won't catch cases where the guest modifies an
> > > individual entry with mmu_update/etc.
> > > 
> > > The logic needs to be in get_page_from_l1e() to get applied uniformly to
> > > all PTE modifications.
> > 
> > I will move it there, and also update Qubes OS’s patchset.
> > 
> > > Right now, the l1_disallow_mask() check near the start hides the "can
> > > you use a nonzero cacheattr" check.  If I ever get around to cleaning up
> > > my post-XSA-402 series, I have a load of modifications to this.
> > 
> > I came up with some major cleanups too.
> > 
> > > But this could be something like this:
> > > 
> > > if ( opt_pv_oob_pat && (l1f & PAGE_CACHE_ATTRS) > _PAGE_WP )
> > > {
> > >     // #GP
> > >     return -EINVAL;
> > > }
> > > 
> > > fairly early on.
> > > 
> > > It occurs to me that this check is only applicable when we're using the
> > > Xen ABI APT, and seeing as we've talked about possibly making patch 5 a
> > > Kconfig option, that may want accounting here.  (Perhaps simply making
> > > opt_pb_oob_pat be false in a !XEN_PAT build.)
> > 
> > It’s actually applicable even with other PATs.  While Marek and I were
> > tracking down an Intel iGPU cache coherency problem, Marek used it to
> > verify that PAT entries that we thought were not being used were in fact
> > unused.  This allowed proving that the behavior of the GPU was impacted
> > by changes to PAT entries the hardware should not even be looking at,
> > and therefore that the hardware itself must be buggy.
> 
> In fact, I did that via WARN() on the Linux side, to _not_ have guest
> killed in this case, to potentially collect more info.

Whoops!  I must have misunderstood what you meant by "trap".

> As said above,
> with alternative PAT settings, the contract which entries are "valid"
> isn't there anymore, so punishing guest for using them isn't
> appropriate. It could still be useful feature for debugging Linux (and
> it feels, we'll need this feature for some time...). So, with !XEN_PAT
> it should be at least disabled by default, or maybe even hidden behind
> CONFIG_DEBUG.

Okay, makes sense.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.