[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-changelog] [xen staging-4.9] x86/mm: Always retain a general ref on partial
commit d3c4b609fafdf440ea159c13f2633693d320f39b Author: George Dunlap <george.dunlap@xxxxxxxxxx> AuthorDate: Mon Nov 4 15:06:12 2019 +0100 Commit: Jan Beulich <jbeulich@xxxxxxxx> CommitDate: Mon Nov 4 15:06:12 2019 +0100 x86/mm: Always retain a general ref on partial In order to allow recursive pagetable promotions and demotions to be interrupted, Xen must keep track of the state of the sub-pages promoted or demoted. This is stored in two elements in the page struct: nr_entries_validated and partial_flags. The rule is that entries [0, nr_entries_validated) should always be validated and hold a general reference count. If partial_flags is zero, then [nr_entries_validated] is not validated and no reference count is held. If PTF_partial_set is set, then [nr_entries_validated] is partially validated. At the moment, a distinction is made between promotion and demotion with regard to whether the entry itself "holds" a general reference count: when entry promotion is interrupted (i.e., returns -ERESTART), the entry is not considered to hold a reference; when entry demotion is interrupted, the entry is still considered to hold a general reference. PTF_partial_general_ref is used to distinguish between these cases. If clear, it's a partial promotion => no general reference count held by the entry; if set, it's partial demotion, so a general reference count held. Because promotions and demotions can be interleaved, this value is passed to get_page_and_type_from_mfn and put_page_from_l*e, to be able to properly handle reference counts. Unfortunately, because a refcount is not held, it is possible to engineer a situation where PFT_partial_set is set but the page in question has been assigned to another domain. A sketch is provided in the appendix. Fix this by having the parent page table entry hold a general reference count whenever PFT_partial_set is set. (For clarity of change, keep two separate flags. These will be collapsed in a subsequent changeset.) This has two basic implications. On the put_page_from_lNe() side, this mean that the (partial_set && !partial_ref) case can never happen, and no longer needs to be special-cased. Secondly, because both flags are set together, there's no need to carry over existing bits from partial_pte. (NB there is still another issue with calling _put_page_type() on a page which had PGT_partial set; that will be handled in a subsequent patch.) On the get_page_and_type_from_mfn() side, we need to distinguish between callers which hold a reference on partial (i.e., alloc_lN_table()), and those which do not (new_cr3, PIN_LN_TABLE, and so on): pass a flag if the type should be retained on interruption. NB that since l1 promotion can't be preempted, that get_page_from_l2e can't return -ERESTART. This is part of XSA-299. Reported-by: George Dunlap <george.dunlap@xxxxxxxxxx> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> ----- * Appendix: Engineering PTF_partial_set while a page belongs to a foreign domain Suppose A is a page which can be promoted to an l3, and B is a page which can be promoted to an l2, and A[x] points to B. B has PGC_allocated set but no other general references. V1: PIN_L3 A. A is validated, B is validated. A.type_count = 1 | PGT_validated | PGT_pinned B.type_count = 1 | PGT_validated B.count = 2 | PGC_allocated (A[x] holds a general ref) V1: UNPIN A. A begins de-validation. Arrange to be interrupted when i < x V1->old_guest_table = A V1->old_guest_table_ref_held = false A.type_count = 1 | PGT_partial A.nr_validated_entries = i < x B.type_count = 0 B.count = 1 | PGC_allocated V2: MOD_L4_ENTRY to point some l4e to A. Picks up re-validation of A. Arrange to be interrupted halfway through B's validation B.type_count = 1 | PGT_partial B.count = 2 | PGC_allocated (PGT_partial holds a general ref) A.type_count = 1 | PGT_partial A.nr_validated_entries = x A.partial_pte = PTF_partial_set V3: MOD_L3_ENTRY to point some other l3e (not in A) to B. Validates B. B.type_count = 1 | PGT_validated B.count = 2 | PGC_allocated ("other l3e" holds a general ref) V3: MOD_L3_ENTRY to clear l3e pointing to B. Devalidates B. B.type_count = 0 B.count = 1 | PGC_allocated V3: decrease_reservation(B) Clears PGC_allocated B.count = 0 => B is freed B gets assigned to a different domain V1: Restarts UNPIN of A put_old_guest_table(A) ... free_l3_table(A) Now since A.partial_flags has PTF_partial_set, free_l3_table() will call put_page_from_l3e() on A[x], which points to B, while B is owned by another domain. If A[x] held a general refcount for B on partial validation, as it does for partial de-validation, then B would still have a reference count of 1 after PGC_allocated was freed; so B wouldn't be freed until after put_page_from_l3e() had happend on A[x]. master commit: 18b0ab697830a46ce3dacaf9210799322cb3732c master date: 2019-10-31 16:14:36 +0100 --- xen/arch/x86/mm.c | 87 +++++++++++++++++++++++++++++------------------- xen/include/asm-x86/mm.h | 15 +++++---- 2 files changed, 61 insertions(+), 41 deletions(-) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index fce15f30ca..0ac8c4592d 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -777,10 +777,11 @@ static int __get_page_type(struct page_info *page, unsigned long type, * page->pte[page->nr_validated_entries]. See the comment in mm.h for * more information. */ -#define PTF_partial_set (1 << 0) -#define PTF_partial_general_ref (1 << 1) -#define PTF_preemptible (1 << 2) -#define PTF_defer (1 << 3) +#define PTF_partial_set (1 << 0) +#define PTF_partial_general_ref (1 << 1) +#define PTF_preemptible (1 << 2) +#define PTF_defer (1 << 3) +#define PTF_retain_ref_on_restart (1 << 4) static int get_page_and_type_from_pagenr(unsigned long page_nr, unsigned long type, @@ -790,7 +791,11 @@ static int get_page_and_type_from_pagenr(unsigned long page_nr, struct page_info *page = mfn_to_page(page_nr); int rc; bool preemptible = flags & PTF_preemptible, - partial_ref = flags & PTF_partial_general_ref; + partial_ref = flags & PTF_partial_general_ref, + partial_set = flags & PTF_partial_set, + retain_ref = flags & PTF_retain_ref_on_restart; + + ASSERT(partial_ref == partial_set); if ( likely(!partial_ref) && unlikely(!get_page_from_pagenr(page_nr, d)) ) @@ -803,13 +808,15 @@ static int get_page_and_type_from_pagenr(unsigned long page_nr, * - page is fully validated (rc == 0) * - page is not validated (rc < 0) but: * - We came in with a reference (partial_ref) + * - page is partially validated (rc == -ERESTART), and the + * caller has asked the ref to be retained in that case * - page is partially validated but there's been an error * (page == current->arch.old_guest_table) * * The partial_ref-on-error clause is worth an explanation. There * are two scenarios where partial_ref might be true coming in: - * - mfn has been partially demoted as type `type`; i.e. has - * PGT_partial set + * - mfn has been partially promoted / demoted as type `type`; + * i.e. has PGT_partial set * - mfn has been partially demoted as L(type+1) (i.e., a linear * page; e.g. we're being called from get_page_from_l2e with * type == PGT_l1_table, but the mfn is PGT_l2_table) @@ -832,7 +839,8 @@ static int get_page_and_type_from_pagenr(unsigned long page_nr, */ if ( likely(!rc) || partial_ref ) /* nothing */; - else if ( page == current->arch.old_guest_table ) + else if ( page == current->arch.old_guest_table || + (retain_ref && rc == -ERESTART) ) ASSERT(preemptible); else put_page(page); @@ -1535,8 +1543,8 @@ static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn, if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) == PTF_partial_set ) { - ASSERT(!(flags & PTF_defer)); - rc = _put_page_type(pg, PTF_preemptible, ptpg); + /* partial_set should always imply partial_ref */ + BUG(); } else if ( flags & PTF_defer ) { @@ -1581,8 +1589,8 @@ static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) == PTF_partial_set ) { - ASSERT(!(flags & PTF_defer)); - return _put_page_type(pg, PTF_preemptible, mfn_to_page(pfn)); + /* partial_set should always imply partial_ref */ + BUG(); } if ( flags & PTF_defer ) @@ -1612,8 +1620,8 @@ static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) == PTF_partial_set ) { - ASSERT(!(flags & PTF_defer)); - return _put_page_type(pg, PTF_preemptible, mfn_to_page(pfn)); + /* partial_set should always imply partial_ref */ + BUG(); } if ( flags & PTF_defer ) @@ -1740,13 +1748,22 @@ static int alloc_l2_table(struct page_info *page, unsigned long type) (rc = get_page_from_l2e(pl2e[i], pfn, d, partial_flags)) > 0 ) continue; - if ( rc == -ERESTART ) - { - page->nr_validated_ptes = i; - /* Set 'set', retain 'general ref' */ - page->partial_flags = partial_flags | PTF_partial_set; - } - else if ( rc == -EINTR && i ) + /* + * It shouldn't be possible for get_page_from_l2e to return + * -ERESTART, since we never call this with PTF_preemptible. + * (alloc_l1_table may return -EINTR on an L1TF-vulnerable + * entry.) + * + * NB that while on a "clean" promotion, we can never get + * PGT_partial. It is possible to arrange for an l2e to + * contain a partially-devalidated l2; but in that case, both + * of the following functions will fail anyway (the first + * because the page in question is not an l1; the second + * because the page is not fully validated). + */ + ASSERT(rc != -ERESTART); + + if ( rc == -EINTR && i ) { page->nr_validated_ptes = i; page->partial_flags = 0; @@ -1755,6 +1772,7 @@ static int alloc_l2_table(struct page_info *page, unsigned long type) else if ( rc < 0 && rc != -EINTR ) { gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i); + ASSERT(current->arch.old_guest_table == NULL); if ( i ) { page->nr_validated_ptes = i; @@ -1818,17 +1836,21 @@ static int alloc_l3_table(struct page_info *page) PGT_l2_page_table | PGT_pae_xen_l2, d, - partial_flags | PTF_preemptible); + partial_flags | + PTF_preemptible | + PTF_retain_ref_on_restart); } else if ( !is_guest_l3_slot(i) || - (rc = get_page_from_l3e(pl3e[i], pfn, d, partial_flags)) > 0 ) + (rc = get_page_from_l3e(pl3e[i], pfn, d, + partial_flags | + PTF_retain_ref_on_restart)) > 0 ) continue; if ( rc == -ERESTART ) { page->nr_validated_ptes = i; /* Set 'set', leave 'general ref' set if this entry was set */ - page->partial_flags = partial_flags | PTF_partial_set; + page->partial_flags = PTF_partial_set | PTF_partial_general_ref; } else if ( rc == -EINTR && i ) { @@ -1923,14 +1945,15 @@ static int alloc_l4_table(struct page_info *page) i++, partial_flags = 0 ) { if ( !is_guest_l4_slot(d, i) || - (rc = get_page_from_l4e(pl4e[i], pfn, d, partial_flags)) > 0 ) + (rc = get_page_from_l4e(pl4e[i], pfn, d, + partial_flags | PTF_retain_ref_on_restart)) > 0 ) continue; if ( rc == -ERESTART ) { page->nr_validated_ptes = i; /* Set 'set', leave 'general ref' set if this entry was set */ - page->partial_flags = partial_flags | PTF_partial_set; + page->partial_flags = PTF_partial_set | PTF_partial_general_ref; } else if ( rc < 0 ) { @@ -2029,9 +2052,7 @@ static int free_l2_table(struct page_info *page) else if ( rc == -ERESTART ) { page->nr_validated_ptes = i; - page->partial_flags = (partial_flags & PTF_partial_set) ? - partial_flags : - (PTF_partial_set | PTF_partial_general_ref); + page->partial_flags = PTF_partial_set | PTF_partial_general_ref; } else if ( rc == -EINTR && i < L2_PAGETABLE_ENTRIES - 1 ) { @@ -2082,9 +2103,7 @@ static int free_l3_table(struct page_info *page) if ( rc == -ERESTART ) { page->nr_validated_ptes = i; - page->partial_flags = (partial_flags & PTF_partial_set) ? - partial_flags : - (PTF_partial_set | PTF_partial_general_ref); + page->partial_flags = PTF_partial_set | PTF_partial_general_ref; } else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 ) { @@ -2115,9 +2134,7 @@ static int free_l4_table(struct page_info *page) if ( rc == -ERESTART ) { page->nr_validated_ptes = i; - page->partial_flags = (partial_flags & PTF_partial_set) ? - partial_flags : - (PTF_partial_set | PTF_partial_general_ref); + page->partial_flags = PTF_partial_set | PTF_partial_general_ref; } else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 ) { diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h index 0bf5b60ba8..6ab642eb18 100644 --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -150,22 +150,25 @@ struct page_info * page. * * This happens: - * - During de-validation, if de-validation of the page was + * - During validation or de-validation, if the operation was * interrupted * - During validation, if an invalid entry is encountered and * validation is preemptible * - During validation, if PTF_partial_general_ref was set on - * this entry to begin with (perhaps because we're picking - * up from a partial de-validation). + * this entry to begin with (perhaps because it picked up a + * previous operation) * - * When resuming validation, if PTF_partial_general_ref is clear, - * then a general reference must be re-acquired; if it is set, no - * reference should be acquired. + * When resuming validation, if PTF_partial_general_ref is + * clear, then a general reference must be re-acquired; if it + * is set, no reference should be acquired. * * When resuming de-validation, if PTF_partial_general_ref is * clear, no reference should be dropped; if it is set, a * reference should be dropped. * + * NB at the moment, PTF_partial_set should be set if and only if + * PTF_partial_general_ref is set. + * * NB that PTF_partial_set and PTF_partial_general_ref are * defined in mm.c, the only place where they are used. * -- generated by git-patchbot for /home/xen/git/xen.git#staging-4.9 _______________________________________________ Xen-changelog mailing list Xen-changelog@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/xen-changelog
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |