
Re: [Xen-devel] [PATCH 3/3] x86/mm-locks: apply a bias to lock levels for current domain

On Wed, Dec 19, 2018 at 11:40:14AM +0000, George Dunlap wrote:
> On 12/18/18 4:05 PM, Roger Pau Monne wrote:
> > The paging_log_dirty_op function takes mm locks from a subject domain
> > and then attempts to perform copy-to operations against the caller
> > domain in order to copy the result of the hypercall into the
> > caller-provided buffer.
> > 
> > This works fine when the caller is a non-paging domain, but triggers a
> > lock order panic when the caller is a paging domain, because at the
> > point where the copy is performed the subject domain's paging lock is
> > held, and the copy requires taking the caller's p2m lock, which has a
> > lower level.
> > 
> > Fix this limitation by adding a bias to the level of the caller
> > domain's mm locks, so that the lowest caller domain mm lock level is
> > always greater than the highest subject domain lock level. This allows
> > locking the subject domain mm locks and then locking the caller domain
> > mm locks, while keeping the same lock ordering and the changes mostly
> > confined to mm-locks.h.
> > 
> > Note that so far only this flow (locking a subject domain's locks and
> > then the caller domain's ones) has been identified, but not all
> > code paths have been inspected. Hence this solution attempts to be a
> > non-intrusive fix for the problem at hand, without discarding further
> > changes in the future if other valid code paths are found that require
> > more complex lock level ordering.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> As a quick fix I think this general approach is OK; the thing I don't
> like is that it's symmetric.  We don't *expect* to ever have a situation
> where A grabs one of its own MM locks and then one of B's, *and* B then
> grabs one of its own locks and then A's; but it could happen.

I have not identified such a scenario ATM, but I guess we cannot rule
out future features needing such interlocking. In any case, this is
something that would have to be solved when we come across such a
scenario.
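
FWIW, the idea in the patch boils down to something like the below
(a simplified sketch, not the literal patch code; MM_LOCK_LEVEL_BIAS
and _lock_level are illustrative names):

    /* Bias: strictly greater than the highest unbiased mm lock level. */
    #define MM_LOCK_LEVEL_BIAS 64

    /* Raise the level of mm locks belonging to the caller domain. */
    static inline int _lock_level(const struct domain *d, int l)
    {
        /*
         * Subject domain locks keep their base level; the caller's own
         * locks are shifted above them, so "subject locks first, caller
         * locks second" always satisfies the increasing-level check.
         */
        return l + ((d && d == current->domain) ? MM_LOCK_LEVEL_BIAS : 0);
    }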

> Since we've generally identified dom0 which may be grabbing locks of a
> PVH stubdom, which may be grabbing locks of a normal domU, would it be
> possible / make sense instead to give a 2x bonus for dom0, and a 1x
> bonus for "is_priv_for" domains?

Jan pointed out such a case, but I'm not sure I can see how this is
supposed to happen even given the scenario above. I have to admit,
however, that I'm not that familiar with the mm code, so it's likely
I'm missing something.

Hypercalls AFAIK have a single target (or subject) domain, so even if
there's a stubdomain relation I'm not sure I see why that would
require this kind of locking: a hypercall operates against a single
subject domain, and the hypervisor itself doesn't even know about
stubdomain relations.
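
If we did go that route, I guess the bias selection would look
something like the below (hypothetical sketch; _lock_bias is a made-up
helper, and using d->target to detect the stubdomain relation is an
assumption on my part):

    /*
     * Hypothetical tiered bias: dom0 locks order after stubdomain
     * locks, which order after plain domU locks.
     */
    static inline int _lock_bias(const struct domain *d)
    {
        if ( is_hardware_domain(d) )
            return 2 * MM_LOCK_LEVEL_BIAS;  /* dom0 */
        if ( d->target )
            return MM_LOCK_LEVEL_BIAS;      /* stubdomain ("is_priv_for") */
        return 0;                           /* plain domU */
    }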

> That way we enforce that all target locks are acquired before all
> stubdomain locks, and all stubdomain locks are acquired before all dom0
> locks: we have a total ordering of locks (not counting between "peer" VMs).
> I suspect, however, that at some point we'll discover some path where
> the order will need to be reversed; at which point we'll need to figure
> out something else.

That's indeed possible; I've tried to note it in the last paragraph of
my commit message.

> I also think it would be good to start recording specific instances of
> ordering requirements, so that we don't have to go track them down.
> (It's not immediately obvious to me, for instance, why the paging lock
> is so far down the list.)  Could you add, somewhere in the comments in
> this section, something like this:
> paging_log_dirty_op grabs tgt paging_lock, then does copy_to_user which
> grabs dom0's P2M lock.
> The other thing we could do is generate the lock level as (OWNER_TYPE <<
> 11) | (LOCK_TYPE << 5) | (domain_id), where OWNER_TYPE is 2 for dom0, 1
> for stub domains, 0 for normal domains, and LOCK_TYPE is the current
> lock level; and then fail if the new lock level >= current lock level.
> That would flag up any potential inter-domain deadlocks as well.

I'm not sure it will catch all inter-domain locks. For example Xen
would still allow taking the same lock type from a domain with a
higher domain_id than the currently locked one if both have the same
OWNER_TYPE.
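
To make that concrete (illustrative macro; P2M stands in for the p2m
lock's LOCK_TYPE value):

    #define LOCK_LEVEL(owner_type, lock_type, domid) \
        (((owner_type) << 11) | ((lock_type) << 5) | (domid))

    /* Two peer domUs (OWNER_TYPE == 0) taking the same lock type: */
    LOCK_LEVEL(0, P2M, 3); /* == (P2M << 5) | 3 */
    LOCK_LEVEL(0, P2M, 5); /* == (P2M << 5) | 5 */

    /*
     * Ordering between the two is decided purely by domain_id, so
     * whether a given nesting passes the check depends on which IDs
     * happen to be involved at runtime, not on any structural
     * relationship between the domains.
     */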

Thanks, Roger.
