[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 1/2] x86/mem_sharing: reorder when pages are unlocked and released
> On Apr 15, 2019, at 4:31 PM, Tamas K Lengyel <tamas@xxxxxxxxxxxxx> wrote: > > On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@xxxxxxxxxx> > wrote: >> >> On 4/12/19 8:47 PM, Tamas K Lengyel wrote: >>> Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" >>> introduced >>> grabbing extra references for pages that drop references tied to >>> PGC_allocated. >>> However, the way these extra references were grabbed were incorrect, >>> resulting >>> in both share_pages and unshare_pages failing. >> >> Why? Is cd not the owner of the page? > > It's not, dom_cow is. > >>> @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, >>> shr_handle_t sh, >>> /* Free the client page */ >>> if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) >>> put_page(cpage); >>> - put_page(cpage); >>> + >>> + ASSERT(put_count); >>> + while ( put_count-- ) >>> + put_page_and_type(cpage); >> >> I don't think this is in the spirit of the fix in 0502e0adae2; the idea >> there seems to have been to make sure that cpage belongs to the right >> domain, and that the ownership doesn't change. If such a race / mistake >> were possible before that change, such a race / mistake would be >> possible after this change, wouldn't it? > > I don't have an answer to your question. AFAIU this is just to ensure > that the page isn't dropped before test_and_clear_bit is used on the > page. Right, but why would that be a bad thing? I think it’s to avoid races like this: P1: Gets pointer to page A, owned by domain M P2: Gets pointer to page A, owned by domain M P2: test_and_clear_bit(PGC_allocated) succeeds P2: put_page() -> page freed -> Page re-allocated to domain N P1: test_and_clear_bit(PGC_allocated) succeeds P1: put_page() -> page freed ## Now P1 incorrectly “frees” a page owned by domain N. If both P1 and P2 call get_page(A, M) first, and drop that reference afterwards, this race can’t happen. Jan, you’re the author of that patch — is that not the case? Could a similar race to the above occur in this code? It looks like it could have before 0502e0adae2 for sure, since all references and locks were dropped. It’s not clear to me whether mem_sharing_page_lock() will protect against ownership changes. *If* that assertion you added is always true, that there is always at least one reference in the rmap that will be held; *and* those references are always “held” on behalf of dom_cow, then I think your version will probably be safe (as the reference in the rmap will prevent it from changing ownership until the final put_page_and_type()). But if #2 ever false for any reason (even due to a bug in Xen), then this will open up a potential security issue (since ASSERT()s don’t trigger in production code). A couple of options: * Make it a BUG_ON(); that would change a potential privilege escalation into a DoS * Only clear PGC_allocated if put_count > 0. This changes a potential privilege escalation into a slow memory leak * If put_count == 0, grab an extra reference and drop it afterwards (with an ASSERT_UNREACHABLE() to help catch errors early). This avoids any XSA, but makes the code a bit uglier. * Leave the code as it is; just change the get_page() to grabbing dom_cow rather than cd. This avoids any XSA, but adds an extra get / put page which will be pure overhead in the common case. Thoughts? -George _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |