
Re: [PATCH for-4.14] x86/hap: use get_gfn_type in hap_update_paging_modes



On Wed, Jun 17, 2020 at 3:59 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 16.06.2020 21:31, Tamas K Lengyel wrote:
> > While forking VMs running a small RTOS system (Zephyr), a Xen crash has
> > been observed due to a mm-lock order violation while copying the HVM CPU
> > context from the parent. This issue has been identified to be due to
> > hap_update_paging_modes getting a lock on the gfn using get_gfn. This
> > call also creates a shared entry in the fork's memory map for the cr3
> > gfn. The function later calls hap_update_cr3 while holding the
> > paging_lock, which results in the lock-order violation in
> > vmx_load_pdptrs when it tries to unshare the above entry.
> >
> > This issue has not affected VMs running other OSes, as a call to
> > vmx_load_pdptrs is benign if PAE is not enabled or if EFER_LMA is set,
> > and it returns before trying to unshare and map the page.
> >
> > Using get_gfn_type to get a lock on the gfn avoids this problem, as we
> > can populate the fork's gfn with a copied page instead of a shared
> > entry if needed, thus avoiding the lock-order violation while holding
> > the paging_lock.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@xxxxxxxxx>
> > ---
> > The bug seems to have been present since commit 4cb6c4f4941, only discovered
> > now due to the heavy use of mem_sharing with VM forks.
>
> I'm unconvinced that commit is the origin of the problem. Before that,
> get_gfn_unshare() was used, which aiui had/has similar locking
> properties. In fact I'm having trouble seeing how this works at all,
> i.e. with or without mem-sharing: p2m_get_page_from_gfn() at the very
> least uses p2m_read_lock(), which also doesn't nest inside the paging
> lock. What am I overlooking?

I haven't investigated past what git blame showed; that's why I said
"seems to have been present since". It's entirely possible it was broken
before as well due to the lock ordering. I'm not sure about
p2m_get_page_from_gfn; I haven't run into an issue there (yet).

>
> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -748,12 +748,19 @@ static void hap_update_paging_modes(struct vcpu *v)
> >      struct domain *d = v->domain;
> >      unsigned long cr3_gfn = v->arch.hvm.guest_cr[3] >> PAGE_SHIFT;
> >      p2m_type_t t;
> > +    p2m_query_t q = P2M_ALLOC;
> >
> > -    /* We hold onto the cr3 as it may be modified later, and
> > -     * we need to respect lock ordering. No need for
> > -     * checks here as they are performed by vmx_load_pdptrs
> > -     * (the potential user of the cr3) */
> > -    (void)get_gfn(d, cr3_gfn, &t);
> > +    /*
> > +     * We hold onto the cr3 as it may be modified later, and
> > +     * we need to respect lock ordering. Unshare here if we have to as to avoid
> > +     * a lock-order violation later while we are holding the paging_lock.
> > +     * Further checks are performed by vmx_load_pdptrs (the potential user of
> > +     * the cr3).
> > +     */
>
> Nit: Please re-flow such that the first line isn't very noticeably
> shorter than the later ones.

Sure.

>
> > +    if ( hvm_pae_enabled(v) && !hvm_long_mode_active(v) )
> > +        q |= P2M_UNSHARE;
> > +
> > +    (void)get_gfn_type(d, cr3_gfn, &t, q);
>
> Imo this then wants to be accompanied by dropping the unsharing
> attempt from vmx_load_pdptrs(). By using get_gfn_query_unlocked()
> there (assuming all code paths hold the paging lock) the lock
> order issue could be addressed in full then. (If taking this
> route, the comment ahead of get_gfn_query_unlocked()'s declaration
> will want adjusting, to drop mentioning shadow mode explicitly as
> leveraging already holding the paging lock.)

There is at least one other path to vmx_load_pdptrs, from
arch_set_info_hvm_guest, which doesn't hold the paging lock.

> If there are code paths of both kinds, which approach to use in
> vmx_load_pdptrs() may need to be chosen based on what
> paging_locked_by_me() returns. Or perhaps an unlocked query is
> fine in either case?

Perhaps adjusting vmx_load_pdptrs to choose the unlocked query would be
fine. But at that point, what is the reason for having the lock
ordering at all? Why not just have a single recursive lock and avoid
issues like this altogether?

Tamas
