[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications



> At 10:03 -0800 on 14 Nov (1321264988), Andres Lagar-Cavilla wrote:
>> Hi there,
>> > At 16:42 -0500 on 08 Nov (1320770545), Andres Lagar-Cavilla wrote:
>> >>  xen/arch/x86/mm/mm-locks.h |  13 +++++++------
>> >>  xen/arch/x86/mm/p2m.c      |  18 +++++++++++++++++-
>> >>  xen/include/asm-x86/p2m.h  |  39
>> >> ++++++++++++++++++++++++---------------
>> >>  3 files changed, 48 insertions(+), 22 deletions(-)
>> >>
>> >>
>> >> We achieve this by locking/unlocking the global p2m_lock in
>> get/put_gfn.
>> >> This brings about a few consequences for the p2m_lock:
>> >>
>> >>  - not ordered anymore in mm-locks.h: unshare does get_gfn ->
>> shr_lock,
>> >>    there are code paths that do paging_lock -> get_gfn. All of these
>> >>    would cause mm-locks.h to panic.
>> >
>> > In that case there's a potential deadlock in the sharing code, and
>> > turning off the safety catches is not an acceptable response to that.
>> :)
>> >
>> > ISTR you had a plan to get rid of the shr_lock entirely, or am
>> > I misremembering?
>> Sharing is actually fine, I can reorder those safely until I get rid of
>> the shr_lock. Except for sharing audits, which basically lock the whole
>> hypervisor, and _no one is using at all_.
>>
>> I have a more fundamental problem with the paging lock. sh_update_cr3
>> can
>> be called from a variety of situations. It will walk the four top level
>> PAE mappings, acquiring the p2m entry for each, with the paging lock
>> held.
>> This is an inversion & deadlock, if I try to synchronize p2m lookups
>> with
>> the p2m lock.
>
> Is sh_update_cr3() really called with p2m locks/refs held?  Since it
> doesn't take a frame number as an argument it might be possible to
> shuffle things around at the callers.

Ok, I've refined this a bit. There are several instances of code that
takes the paging_lock and later needs to perform a p2m lookup. Apart from
the previously mentioned example here are two others:

hap_update_paging_modes -> ... -> vmx_load_pdptrs -> get_gfn(guest_cr3)

sh_x86_emulate_* -> ... -> validate_gl?e -> get_gfn_query()

None of these functions are called with p2m locks/refs held, and likely
they shouldn't. However, they will result in a deadlock panic situation,
if get_gfn takes a lock.

Here is one solution to consider: do not lock the p2m on lookups in shadow
mode. Shadow mode does not support paging out and sharing of pages, which
are primary reasons why we want synchronized p2m lookups. The hap cases
where there is an inversion of the p2m_lock -> paging_lock order are
reasonably simple to handle.

The other option is to invert the order and place paging_lock -> p2m_lock,
but that will raise another set of potential inversions. I think this is a
no go.

Finally, one could audit all these calls and have them perform futurology,
get_gfn the gfn they will end up perhaps looking up, before taking the
paging_lock. I also think this is a no go.

All very unsavory.
Andres

>
>> Any suggestions here? Other than disabling ordering of the p2m lock?
>
> Disabling the ordering won't do, so we need to find something else.
>
> Tim.
>
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.