
Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests



On Fri, Dec 14, 2018 at 04:52:00AM -0700, Jan Beulich wrote:
> >>> On 14.12.18 at 12:45, <roger.pau@xxxxxxxxxx> wrote:
> > On Fri, Dec 14, 2018 at 03:45:21AM -0700, Jan Beulich wrote:
> >> >>> On 14.12.18 at 11:03, <roger.pau@xxxxxxxxxx> wrote:
> >> > I expect the interdomain locking as a result of using a paging caller
> >> > domain is going to be restricted to the p2m lock of the caller domain,
> >> > as a result of the usage of copy to/from helpers.
> >> > 
> >> > Maybe the less intrusive change would be to just allow locking the
> >> > caller p2m lock (that only lock) regardless of the subject domain lock
> >> > level?
> >> 
> >> With caller != subject, and with the lock level then being higher
> >> than all "normal" ones, this might be an option. But from the
> >> very beginning we should keep the transitive aspect here in
> >> mind: If Dom0 controls a PVH domain which controls a HVM one,
> >> the Dom0 p2m lock may also need allowing to nest inside the PVH
> >> DomU's one, so it'll be a total of two extra new lock levels even
> >> if we restrict this to the p2m locks.
> > 
> > I'm not sure I follow here, so far we have spoken about a subject and
> > a caller domain (subject being the target of the operation). In the
> > above scenario I see a relation between Dom0 and the PVH domain, and
> > between the PVH domain and the HVM one, but not a relation that
> > encompasses the three domains in terms of mm locking.
> > 
> > Dom0 (caller) mm lock could be nested inside the PVH DomU (subject) mm
> > locks, and the PVH DomU (caller) locks could be nested inside the HVM
> > domain (subject) locks, but I don't see an operation where Dom0 mm
> > lock could be nested inside of a PVH DomU lock that's already nested
> > inside of the HVM DomU lock.
> 
> Well, if we're _sure_ this can't happen, then of course we also
> don't need to deal with it.

So I've got a patch that adds a positive offset to the lock priority when
the lock owner is the current domain. This allows taking another domain's
(subject) mm locks and then taking the current domain's (caller) mm locks.

There's one slight issue with the page-sharing lock: it will always be
treated as belonging to a subject domain, because there's no clear owner of
that lock. AFAICT some page-sharing routines even run from the idle thread.

I'm attaching the patch below. Note it keeps the same lock order as
before, but allows taking the caller's locks _only_ after having taken
some of the subject's mm locks. Note this doesn't allow interleaved mm
locking between the subject and the caller.
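
To make that rule concrete, here's a small standalone model of the check
(not part of the patch; effective_level()/order_check() are made-up names,
only MM_LOCK_ORDER_MAX and the MM_LOCK_ORDER_* values match mm-locks.h):

#include <stdio.h>

/* Same value as in the patch below: as large as any order level in use. */
#define MM_LOCK_ORDER_MAX 64

/*
 * Effective order level of a lock: locks owned by the caller domain are
 * shifted past every subject-domain level, so caller locks may be taken
 * while subject locks are held, but never the other way around.
 */
static int effective_level(int owner_domid, int caller_domid, int level)
{
    return level + (owner_domid == caller_domid ? MM_LOCK_ORDER_MAX : 0);
}

/* Same rule as __check_lock_level(): taking a lock whose effective level
 * is below the level currently held is an ordering violation. */
static const char *order_check(int held, int next)
{
    return next >= held ? "allowed" : "violation";
}

int main(void)
{
    int caller = 0, subject = 1;

    /* Subject's paging lock (64) held, then caller's p2m lock (16 + 64). */
    printf("caller p2m after subject paging: %s\n",
           order_check(effective_level(subject, caller, 64),
                       effective_level(caller, caller, 16)));

    /* Caller's p2m lock (16 + 64) held, then subject's pod lock (48). */
    printf("subject pod after caller p2m:    %s\n",
           order_check(effective_level(caller, caller, 16),
                       effective_level(subject, caller, 48)));

    return 0;
}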

The nice part about this patch is that the changes are mostly confined to
mm-locks.h; there's only one extra change, to xen/arch/x86/mm/p2m-pod.c.

I am however not sure how to prove there are no valid paths that could
require a different lock ordering. Am I expected to check all possible
code paths for mm lock ordering?

I think this solution introduces the minimum amount of modifications
needed to fix the current issue, i.e. a paging caller being able to use
the copy to/from helpers after taking the subject's mm locks.

Thanks, Roger.

---8<---
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 95295b62d2..a47beaf893 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -46,29 +46,36 @@ static inline int mm_locked_by_me(mm_lock_t *l)
     return (l->lock.recurse_cpu == current->processor);
 }
 
+#define MM_LOCK_ORDER_MAX                    64
+
 /*
  * If you see this crash, the numbers printed are order levels defined
  * in this file.
  */
-#define __check_lock_level(l)                           \
-do {                                                    \
-    if ( unlikely(__get_lock_level() > (l)) )           \
-    {                                                   \
-        printk("mm locking order violation: %i > %i\n", \
-               __get_lock_level(), (l));                \
-        BUG();                                          \
-    }                                                   \
+#define __check_lock_level(d, l)                                \
+do {                                                            \
+    if ( unlikely(__get_lock_level() >                          \
+         (l) + ((d) == current->domain->domain_id ?             \
+                MM_LOCK_ORDER_MAX : 0)) )                       \
+    {                                                           \
+        printk("mm locking order violation: %i > %i\n",         \
+               __get_lock_level(),                              \
+               (l) + ((d) == current->domain->domain_id ?       \
+                      MM_LOCK_ORDER_MAX : 0));                  \
+        BUG();                                                  \
+    }                                                           \
 } while(0)
 
-#define __set_lock_level(l)         \
-do {                                \
-    __get_lock_level() = (l);       \
+#define __set_lock_level(l)                                     \
+do {                                                            \
+    __get_lock_level() = (l);                                   \
 } while(0)
 
-static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec)
+static inline void _mm_lock(domid_t d, mm_lock_t *l, const char *func,
+                            int level, int rec)
 {
-    if ( !((mm_locked_by_me(l)) && rec) ) 
-        __check_lock_level(level);
+    if ( !((mm_locked_by_me(l)) && rec) )
+        __check_lock_level(d, level);
     spin_lock_recursive(&l->lock);
     if ( l->lock.recurse_cnt == 1 )
     {
@@ -77,16 +84,18 @@ static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec)
     }
     else if ( (unlikely(!rec)) )
         panic("mm lock already held by %s\n", l->locker_function);
-    __set_lock_level(level);
+    __set_lock_level(level + (d == current->domain->domain_id ?
+                              MM_LOCK_ORDER_MAX : 0));
 }
 
-static inline void _mm_enforce_order_lock_pre(int level)
+static inline void _mm_enforce_order_lock_pre(domid_t d, int level)
 {
-    __check_lock_level(level);
+    __check_lock_level(d, level);
 }
 
-static inline void _mm_enforce_order_lock_post(int level, int *unlock_level,
-                                                unsigned short *recurse_count)
+static inline void _mm_enforce_order_lock_post(domid_t d, int level,
+                                               int *unlock_level,
+                                               unsigned short *recurse_count)
 {
     if ( recurse_count )
     {
@@ -97,7 +106,8 @@ static inline void _mm_enforce_order_lock_post(int level, int *unlock_level,
     } else {
         *unlock_level = __get_lock_level();
     }
-    __set_lock_level(level);
+    __set_lock_level(level + (d == current->domain->domain_id ?
+                              MM_LOCK_ORDER_MAX : 0));
 }
 
 
@@ -114,16 +124,18 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
     return (l->locker == get_processor_id());
 }
 
-static inline void _mm_write_lock(mm_rwlock_t *l, const char *func, int level)
+static inline void _mm_write_lock(domid_t d, mm_rwlock_t *l, const char *func,
+                                  int level)
 {
     if ( !mm_write_locked_by_me(l) )
     {
-        __check_lock_level(level);
+        __check_lock_level(d, level);
         percpu_write_lock(p2m_percpu_rwlock, &l->lock);
         l->locker = get_processor_id();
         l->locker_function = func;
         l->unlock_level = __get_lock_level();
-        __set_lock_level(level);
+        __set_lock_level(level + (d == current->domain->domain_id ?
+                                  MM_LOCK_ORDER_MAX : 0));
     }
     l->recurse_count++;
 }
@@ -138,9 +150,9 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
     percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
 }
 
-static inline void _mm_read_lock(mm_rwlock_t *l, int level)
+static inline void _mm_read_lock(domid_t d, mm_rwlock_t *l, int level)
 {
-    __check_lock_level(level);
+    __check_lock_level(d, level);
     percpu_read_lock(p2m_percpu_rwlock, &l->lock);
     /* There's nowhere to store the per-CPU unlock level so we can't
      * set the lock level. */
@@ -153,28 +165,28 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
 
 /* This wrapper uses the line number to express the locking order below */
 #define declare_mm_lock(name)                                                 \
-    static inline void mm_lock_##name(mm_lock_t *l, const char *func, int rec)\
-    { _mm_lock(l, func, MM_LOCK_ORDER_##name, rec); }
+    static inline void mm_lock_##name(domid_t d, mm_lock_t *l, const char *func, int rec)\
+    { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
 #define declare_mm_rwlock(name)                                               \
-    static inline void mm_write_lock_##name(mm_rwlock_t *l, const char *func) \
-    { _mm_write_lock(l, func, MM_LOCK_ORDER_##name); }                                   \
-    static inline void mm_read_lock_##name(mm_rwlock_t *l)                    \
-    { _mm_read_lock(l, MM_LOCK_ORDER_##name); }
+    static inline void mm_write_lock_##name(domid_t d, mm_rwlock_t *l, const char *func) \
+    { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); }                                \
+    static inline void mm_read_lock_##name(domid_t d, mm_rwlock_t *l)                    \
+    { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
 /* These capture the name of the calling function */
-#define mm_lock(name, l) mm_lock_##name(l, __func__, 0)
-#define mm_lock_recursive(name, l) mm_lock_##name(l, __func__, 1)
-#define mm_write_lock(name, l) mm_write_lock_##name(l, __func__)
-#define mm_read_lock(name, l) mm_read_lock_##name(l)
+#define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
+#define mm_lock_recursive(name, d, l) mm_lock_##name(d, l, __func__, 1)
+#define mm_write_lock(name, d, l) mm_write_lock_##name(d, l, __func__)
+#define mm_read_lock(name, d, l) mm_read_lock_##name(d, l)
 
 /* This wrapper is intended for "external" locks which do not use
  * the mm_lock_t types. Such locks inside the mm code are also subject
  * to ordering constraints. */
 #define declare_mm_order_constraint(name)                                   \
-    static inline void mm_enforce_order_lock_pre_##name(void)               \
-    { _mm_enforce_order_lock_pre(MM_LOCK_ORDER_##name); }                                \
+    static inline void mm_enforce_order_lock_pre_##name(domid_t d)                       \
+    { _mm_enforce_order_lock_pre(d, MM_LOCK_ORDER_##name); }                             \
     static inline void mm_enforce_order_lock_post_##name(                   \
-                        int *unlock_level, unsigned short *recurse_count)   \
-    { _mm_enforce_order_lock_post(MM_LOCK_ORDER_##name, unlock_level, recurse_count); } \
+                        domid_t d, int *unlock_level, unsigned short *recurse_count)   \
+    { _mm_enforce_order_lock_post(d, MM_LOCK_ORDER_##name, unlock_level, recurse_count); } \
 
 static inline void mm_unlock(mm_lock_t *l)
 {
@@ -186,8 +198,8 @@ static inline void mm_unlock(mm_lock_t *l)
     spin_unlock_recursive(&l->lock);
 }
 
-static inline void mm_enforce_order_unlock(int unlock_level, 
-                                            unsigned short *recurse_count)
+static inline void mm_enforce_order_unlock(int unlock_level,
+                                           unsigned short *recurse_count)
 {
     if ( recurse_count )
     {
@@ -218,7 +230,7 @@ static inline void mm_enforce_order_unlock(int unlock_level,
 
 #define MM_LOCK_ORDER_nestedp2m               8
 declare_mm_lock(nestedp2m)
-#define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
+#define nestedp2m_lock(d)   mm_lock(nestedp2m, (d)->domain_id, &(d)->arch.nested_p2m_lock)
 #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)
 
 /* P2M lock (per-non-alt-p2m-table)
@@ -257,9 +269,9 @@ declare_mm_rwlock(p2m);
 
 #define MM_LOCK_ORDER_per_page_sharing       24
 declare_mm_order_constraint(per_page_sharing)
-#define page_sharing_mm_pre_lock()   mm_enforce_order_lock_pre_per_page_sharing()
+#define page_sharing_mm_pre_lock()   mm_enforce_order_lock_pre_per_page_sharing(DOMID_INVALID)
 #define page_sharing_mm_post_lock(l, r) \
-        mm_enforce_order_lock_post_per_page_sharing((l), (r))
+        mm_enforce_order_lock_post_per_page_sharing(DOMID_INVALID, (l), (r))
 #define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r))
 
 /* Alternate P2M list lock (per-domain)
@@ -272,7 +284,8 @@ declare_mm_order_constraint(per_page_sharing)
 
 #define MM_LOCK_ORDER_altp2mlist             32
 declare_mm_lock(altp2mlist)
-#define altp2m_list_lock(d)   mm_lock(altp2mlist, &(d)->arch.altp2m_list_lock)
+#define altp2m_list_lock(d)   mm_lock(altp2mlist, (d)->domain_id, \
+                                      &(d)->arch.altp2m_list_lock)
 #define altp2m_list_unlock(d) mm_unlock(&(d)->arch.altp2m_list_lock)
 
 /* P2M lock (per-altp2m-table)
@@ -290,9 +303,9 @@ declare_mm_rwlock(altp2m);
 #define p2m_lock(p)                             \
     do {                                        \
         if ( p2m_is_altp2m(p) )                 \
-            mm_write_lock(altp2m, &(p)->lock);  \
+            mm_write_lock(altp2m, (p)->domain->domain_id, &(p)->lock);  \
         else                                    \
-            mm_write_lock(p2m, &(p)->lock);     \
+            mm_write_lock(p2m, (p)->domain->domain_id, &(p)->lock);     \
         (p)->defer_flush++;                     \
     } while (0)
 #define p2m_unlock(p)                           \
@@ -304,7 +317,8 @@ declare_mm_rwlock(altp2m);
     } while (0)
 #define gfn_lock(p,g,o)       p2m_lock(p)
 #define gfn_unlock(p,g,o)     p2m_unlock(p)
-#define p2m_read_lock(p)      mm_read_lock(p2m, &(p)->lock)
+#define p2m_read_lock(p)      mm_read_lock(p2m, (p)->domain->domain_id,\
+                                           &(p)->lock)
 #define p2m_read_unlock(p)    mm_read_unlock(&(p)->lock)
 #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
 #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
@@ -316,7 +330,8 @@ declare_mm_rwlock(altp2m);
 
 #define MM_LOCK_ORDER_pod                    48
 declare_mm_lock(pod)
-#define pod_lock(p)           mm_lock(pod, &(p)->pod.lock)
+#define pod_lock(p)           mm_lock(pod, (p)->domain->domain_id, \
+                                      &(p)->pod.lock)
 #define pod_unlock(p)         mm_unlock(&(p)->pod.lock)
 #define pod_locked_by_me(p)   mm_locked_by_me(&(p)->pod.lock)
 
@@ -329,9 +344,9 @@ declare_mm_lock(pod)
 
 #define MM_LOCK_ORDER_page_alloc             56
 declare_mm_order_constraint(page_alloc)
-#define page_alloc_mm_pre_lock()   mm_enforce_order_lock_pre_page_alloc()
-#define page_alloc_mm_post_lock(l) mm_enforce_order_lock_post_page_alloc(&(l), NULL)
-#define page_alloc_mm_unlock(l)    mm_enforce_order_unlock((l), NULL)
+#define page_alloc_mm_pre_lock(d)   mm_enforce_order_lock_pre_page_alloc((d)->domain_id)
+#define page_alloc_mm_post_lock(d, l) mm_enforce_order_lock_post_page_alloc((d)->domain_id, &(l), NULL)
+#define page_alloc_mm_unlock(d, l)    mm_enforce_order_unlock((l), NULL)
 
 /* Paging lock (per-domain)
  *
@@ -350,9 +365,10 @@ declare_mm_order_constraint(page_alloc)
 
 #define MM_LOCK_ORDER_paging                 64
 declare_mm_lock(paging)
-#define paging_lock(d)         mm_lock(paging, &(d)->arch.paging.lock)
-#define paging_lock_recursive(d) \
-                    mm_lock_recursive(paging, &(d)->arch.paging.lock)
+#define paging_lock(d)         mm_lock(paging, (d)->domain_id,\
+                                       &(d)->arch.paging.lock)
+#define paging_lock_recursive(d) mm_lock_recursive(paging, (d)->domain_id, \
+                                                   &(d)->arch.paging.lock)
 #define paging_unlock(d)       mm_unlock(&(d)->arch.paging.lock)
 #define paging_locked_by_me(d) mm_locked_by_me(&(d)->arch.paging.lock)
 
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 4c56cb58c6..7cac396187 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -34,14 +34,16 @@
 /* Enforce lock ordering when grabbing the "external" page_alloc lock */
 static inline void lock_page_alloc(struct p2m_domain *p2m)
 {
-    page_alloc_mm_pre_lock();
+    page_alloc_mm_pre_lock(p2m->domain);
     spin_lock(&(p2m->domain->page_alloc_lock));
-    page_alloc_mm_post_lock(p2m->domain->arch.page_alloc_unlock_level);
+    page_alloc_mm_post_lock(p2m->domain,
+                            p2m->domain->arch.page_alloc_unlock_level);
 }
 
 static inline void unlock_page_alloc(struct p2m_domain *p2m)
 {
-    page_alloc_mm_unlock(p2m->domain->arch.page_alloc_unlock_level);
+    page_alloc_mm_unlock(p2m->domain,
+                         p2m->domain->arch.page_alloc_unlock_level);
     spin_unlock(&(p2m->domain->page_alloc_lock));
 }
 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
