[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] get_gfn_query() locking



On Oct 30, 2012, at 5:36 AM, Tim Deegan <tim@xxxxxxx> wrote:

At 09:23 +0000 on 30 Oct (1351588996), Jan Beulich wrote:
while looking at how "X86/vMCE: handle broken page occurred
before migration" uses the p2m code, I wondered whether the
comment

* N.B. get_gfn_query() is the _only_ one guaranteed not to take the
* p2m lock; none of the others can be called with the p2m or paging
* lock held. */

isn't stale

Yes, it seems to be stale.  I think that even with the originally
proposed fine-grained p2m locking it would have been possible for this
to take the p2m lock. :(

Stale comment indeed.

- afaict with get_gfn_type_access() passing "1" for the
"locked" parameter of __get_gfn_type_access(), there's no way
to avoid the locking without using __get_gfn_type_access()
directly.

get_gfn_query_unlocked() is the way to do lookups without the lock, but
of course if you do that then you've no guarantee that the mfn it
returns won't get freed under your feet.

get_gfn_query_unlocked() is really only meant for debugging, domain is toast, etc. As Tim says, no guarantee that you are getting a live mfn and/or p2m type.

Having said that, we have a big XOR for domains that use IOMMU -- no paging or sharing. From an mm p.o.v. they are fairly vanilla domains and under most circumstances one could perform unlocked queries. Still there are dangerous situations in which a p2m entry might change. I do not think we have an interlock in place for PoD, and if the domain were to run a balloon driver you'd have to trouble.

I say this because a similar interlock may apply for vMCE? Is there an expectation for a domain with vMCE turned on to be "land-locked" memory-wise?

And then again, with the p2m lock being recursive these
days, I don't think there's any harm calling the other methods
here with that lock held.

Is the patch you refer to http://www.gossamer-threads.com/lists/xen/devel/261025 and the hunk in question the following?
+ get_gfn_query(d, pfn, &pt); 
+ p2m_change_type(d, pfn, pt, p2m_ram_broken); 
+ put_gfn(d, pfn); 

There really is no way to get rid of that p2m lock-protected critical section if the domain allows for paging etc. You might want to introduce a syntactically cleaner unconditional p2m_change_type variant that doesn't cmpxchg with the previous type -- that is effectively what goes on here. Should be a tiny amount of refactoring and the code will be cleaner, no need for query or put.


True, but it wouldn't be safe to call it with the paging lock held.
OTOH since we're not seeing any crashes from the lock-ordering
constraints maybe we don't do that.

Andres, what do you think?  Can we just drop/amend that comment?


If you refer to removing the ordering constraints, my opinion is to NACK that. I think once you remove the ordering constraints it's only a matter of time before deadlock-inducing code creeps in. These mm locks are loads of fun to get right (that kind of fun). The constraints ensure that any developer new to the mm code will get straightened out immediately.

Note that the ordering constraints allow you to, beyond recursively taking the p2m lock, to take p2m -> paging -> p2m locks. But they does *not* allow you to do get_page_from_gfn -> paging lock -> p2m lock.

Obviously we need more documentation in the .h files. I'll get to that next week. Jan, ping me if I haven't submitted something to the 4.2.1 tree by the close.

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.