[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 for-4.14] x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE



On Thu, Jun 18, 2020 at 06:21:42AM -0600, Tamas K Lengyel wrote:
> On Thu, Jun 18, 2020 at 3:42 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >
> > On Thu, Jun 18, 2020 at 08:30:08AM +0200, Jan Beulich wrote:
> > > On 17.06.2020 18:19, Tamas K Lengyel wrote:
> > > > While forking VMs running a small RTOS system (Zephyr) a Xen crash has 
> > > > been
> > > > observed due to a mm-lock order violation while copying the HVM CPU 
> > > > context
> > > > from the parent. This issue has been identified to be due to
> > > > hap_update_paging_modes first getting a lock on the gfn using get_gfn. 
> > > > This
> > > > call also creates a shared entry in the fork's memory map for the cr3 
> > > > gfn. The
> > > > function later calls hap_update_cr3 while holding the paging_lock, which
> > > > results in the lock-order violation in vmx_load_pdptrs when it tries to 
> > > > unshare
> > > > the above entry when it grabs the page with the P2M_UNSHARE flag set.
> > > >
> > > > Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE 
> > > > was
> > > > unnecessary to start with. Using P2M_ALLOC is the appropriate flag to 
> > > > ensure
> > > > the p2m is properly populated and to avoid the lock-order violation we
> > > > observed.
> > >
> > > Using P2M_ALLOC is not going to address the original problem though
> > > afaict: You may hit the mem_sharing_fork_page() path that way, and
> > > via nominate_page() => __grab_shared_page() => mem_sharing_page_lock()
> > > you'd run into a lock order violation again.
> >
> > Well, I guess Tamas avoids this because of the get_gfn call in
> > hap_update_paging_modes will have already populated the entry, so it's
> > never going to hit the p2m_is_hole check in __get_gfn_type_access.
> >
> > > The change is an improvement, so I'd be fine with it going in this
> > > way, but only as long as the description mentions that there's still
> > > an open issue here (which may be non-trivial to address). Or perhaps
> > > combining with your v1 change is the way to go (for now or even
> > > permanently)?
> >
> > If vmx_load_pdptrs only requires P2M_ALLOC then this is already
> > covered by the call to get_gfn performed in hap_update_paging_modes,
> > so I don't think there's much point in merging with v1, as forcing
> > hap_update_paging_modes to unshare the entry won't affect
> > vmx_load_pdptrs anymore.
> >
> > I'm however worried about other code paths that can call into
> > vmx_load_pdptrs with mm locks taken, and I agree it would be safer to
> > assert that all the higher layers make sure the cr3 loaded is
> > correctly populated for a query without P2M_ALLOC to succeed.
> 
> Using P2M_ALLOC is always safe if 1) the entry is already populated
> like in this case but also in 2) in case the gfn is a hole and gets
> forked. In mem_sharing the paging lock order is only applicable when
> an already present entry is getting converted to a shared type or a
> shared typed is getting unshared. It does not apply when a hole is
> being plugged.

But a hole being plugged can also imply that a page is being set
shareable by nominate_page, which will take the mem sharing page lock?

That would be the path: get_gfn_type_access -> __get_gfn_type_access
-> (hole found in p2m) -> mem_sharing_fork_page -> nominate_page (with
page not being shareable already).

It's likely I'm missing some bits, this is all quite complex.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.