
Re: [Xen-devel] [PATCH v4 1/2] x86/mm: factor out the code for shattering an l3 PTE



Hi,

On Fri, 2019-12-13 at 14:19 +0000, Andrew Cooper wrote:
> On 12/12/2019 12:46, Hongyan Xia wrote:
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 7d4dd80a85..8def4fb8d8 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
> >                           flush_area_local((const void *)v, f) : \
> >                           flush_area_all((const void *)v, f))
> >  
> > +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
> > +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
> > +{
> > +    unsigned int i;
> > +    l3_pgentry_t ol3e = *pl3e;
> > +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> > +    l2_pgentry_t *l2t = alloc_xen_pagetable();
> > +
> > +    if ( !l2t )
> > +        return false;
> > +
> > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > +    {
> > +        l2e_write(l2t + i, l2e);
> > +        l2e = l2e_from_intpte(
> > +                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
> > +    }
> > +
> > +    if ( locking )
> > +        spin_lock(&map_pgdir_lock);
> > +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> > +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> > +    {
> > +        l3e_write_atomic(pl3e,
> > +            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
> > +        l2t = NULL;
> > +    }
> > +    if ( locking )
> > +        spin_unlock(&map_pgdir_lock);
> > +
> > +    if ( virt )
> > +    {
> > +        unsigned int flush_flags =
> > +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> > +
> > +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> > +            flush_flags |= FLUSH_TLB_GLOBAL;
> 
> Another problematic use of ol3e which is racy on conflict.  You need to
> strictly use the content of *pl3e from within the locked region.
> 

This patch is just refactoring, but if the original code is racy, that
is also worth fixing.

In fact, over the last couple of days, the more I read the code in
map_pages_to_xen, the more I worry about its race conditions and
correctness. The lock mostly protects writes, leaving many reads
outside the locked region that can observe stale values. One example I
found is (after refactoring):

             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
                 ...
                 /* Pass virt to indicate we need to flush. */
                 if ( !shatter_l2e(pl2e, virt, locking) )
                     return -ENOMEM;
             }

             pl1e  = l2e_to_l1e(*pl2e) + l1_table_offset(virt);

It shatters an l2 superpage before accessing a pl1e, but is there any
guard between the shattering and the read of pl1e? If another call
comes in between the two and merges the entry back into a superpage,
the pl1e then points into superpage memory instead of an L1 page
table! (Please check my logic.) Similar races between PTE
modifications and flushes exist elsewhere.

There could be more examples like this. Of course, removing the code
for merging would avoid a lot of the problems, although Julien
explained to me that merging can be useful during boot. If removing it
is not an option, would it be a big problem to extend the lock, e.g.,
over the whole function? map_pages_to_xen is mostly used by vmap after
boot, and a coarser lock would simplify the function and its logic
significantly. vmap already takes other global locks before calling
map_pages_to_xen anyway.

Hongyan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
