[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v8 08/16] xen/domain: Add vmtrace_size domain creation parameter
On 01/02/2021 13:18, Jan Beulich wrote: > On 30.01.2021 03:58, Andrew Cooper wrote: >> +static int vmtrace_alloc_buffer(struct vcpu *v) >> +{ >> + struct domain *d = v->domain; >> + struct page_info *pg; >> + unsigned int i; >> + >> + if ( !d->vmtrace_size ) >> + return 0; >> + >> + pg = alloc_domheap_pages(d, get_order_from_bytes(d->vmtrace_size), >> + MEMF_no_refcount); >> + if ( !pg ) >> + return -ENOMEM; >> + >> + /* >> + * Getting the reference counting correct here is hard. >> + * >> + * All pages are now on the domlist. They, or subranges within, will be > "domlist" is too imprecise, as there's no list with this name. It's > extra_page_list in this case (see also below). > >> + * freed when their reference count drops to zero, which may any time >> + * between now and the domain teardown path. >> + */ >> + >> + for ( i = 0; i < (d->vmtrace_size >> PAGE_SHIFT); i++ ) >> + if ( unlikely(!get_page_and_type(&pg[i], d, PGT_writable_page)) ) >> + goto refcnt_err; >> + >> + /* >> + * We must only let vmtrace_free_buffer() take any action in the success >> + * case when we've taken all the refs it intends to drop. >> + */ >> + v->vmtrace.pg = pg; >> + >> + return 0; >> + >> + refcnt_err: >> + /* >> + * In the failure case, we must drop all the acquired typerefs thus far, >> + * skip vmtrace_free_buffer(), and leave domain_relinquish_resources() >> to >> + * drop the alloc refs on any remaining pages - some pages could already >> + * have been freed behind our backs. >> + */ >> + while ( i-- ) >> + put_page_and_type(&pg[i]); >> + >> + return -ENODATA; >> +} > As said in reply on the other thread, PGC_extra pages don't get > freed automatically. I too initially thought they would, but > (re-)learned otherwise when trying to repro your claims on that > other thread. For all pages you've managed to get the writable > ref, freeing is easily done by prefixing the loop body above by > put_page_alloc_ref(). For all other pages best you can do (I > think; see the debugging patches I had sent on that other > thread) is to try get_page() - if it succeeds, calling > put_page_alloc_ref() is allowed. Otherwise we can only leak the > respective page (unless going to further extents with trying to > recover from the "impossible"), or assume the failure here was > because it did get freed already. Right - I'm going to insist on breaking apart orthogonal issues. This refcounting issue isn't introduced by this series - this series uses an established pattern, in which we've found a corner case. The corner case is theoretical, not practical - it is not possible for a malicious PV domain to take 2^43 refs on any of the pages in this allocation. Doing so would require an hours-long SMI, or equivalent, and even then all malicious activity would be paused after 1s for the time calibration rendezvous which would livelock the system until the watchdog kicked in. I will drop the comments, because in light of this discovery, they're not correct. We should fix the corner case, but that should be a separate patch. Whatever we do needs to start by writing down the the refcounting rules first because its totally clear that noone understands them, and then a change to all examples of this pattern adjusting (if necessary). ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |