
Re: [PATCH v8 08/16] xen/domain: Add vmtrace_size domain creation parameter


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 1 Feb 2021 14:22:56 +0000
  • Cc: Michał Leszczyński <michal.leszczynski@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 01 Feb 2021 14:23:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 01/02/2021 13:18, Jan Beulich wrote:
> On 30.01.2021 03:58, Andrew Cooper wrote:
>> +static int vmtrace_alloc_buffer(struct vcpu *v)
>> +{
>> +    struct domain *d = v->domain;
>> +    struct page_info *pg;
>> +    unsigned int i;
>> +
>> +    if ( !d->vmtrace_size )
>> +        return 0;
>> +
>> +    pg = alloc_domheap_pages(d, get_order_from_bytes(d->vmtrace_size),
>> +                             MEMF_no_refcount);
>> +    if ( !pg )
>> +        return -ENOMEM;
>> +
>> +    /*
>> +     * Getting the reference counting correct here is hard.
>> +     *
>> +     * All pages are now on the domlist.  They, or subranges within, will be
> "domlist" is too imprecise, as there's no list with this name. It's
> extra_page_list in this case (see also below).
>
>> +     * freed when their reference count drops to zero, which may be any time
>> +     * between now and the domain teardown path.
>> +     */
>> +
>> +    for ( i = 0; i < (d->vmtrace_size >> PAGE_SHIFT); i++ )
>> +        if ( unlikely(!get_page_and_type(&pg[i], d, PGT_writable_page)) )
>> +            goto refcnt_err;
>> +
>> +    /*
>> +     * We must only let vmtrace_free_buffer() take any action in the success
>> +     * case when we've taken all the refs it intends to drop.
>> +     */
>> +    v->vmtrace.pg = pg;
>> +
>> +    return 0;
>> +
>> + refcnt_err:
>> +    /*
>> +     * In the failure case, we must drop all the acquired typerefs thus far,
>> +     * skip vmtrace_free_buffer(), and leave domain_relinquish_resources() to
>> +     * drop the alloc refs on any remaining pages - some pages could already
>> +     * have been freed behind our backs.
>> +     */
>> +    while ( i-- )
>> +        put_page_and_type(&pg[i]);
>> +
>> +    return -ENODATA;
>> +}
> As said in reply on the other thread, PGC_extra pages don't get
> freed automatically. I too initially thought they would, but
> (re-)learned otherwise when trying to repro your claims on that
> other thread. For all pages you've managed to get the writable
> ref, freeing is easily done by prefixing the loop body above by
> put_page_alloc_ref(). For all other pages, the best you can do (I
> think; see the debugging patches I had sent on that other
> thread) is to try get_page() - if it succeeds, calling
> put_page_alloc_ref() is allowed. Otherwise we can only leak the
> respective page (unless going to further lengths to try to
> recover from the "impossible"), or assume the failure here was
> because it did get freed already.

Right - I'm going to insist on breaking apart orthogonal issues.

This refcounting issue isn't introduced by this series - this series
uses an established pattern, in which we've found a corner case.

The corner case is theoretical, not practical - it is not possible for a
malicious PV domain to take 2^43 refs on any of the pages in this
allocation.  Doing so would require an hours-long SMI, or equivalent,
and even then all malicious activity would be paused after 1s for the
time calibration rendezvous, which would livelock the system until the
watchdog kicked in.
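
(As a back-of-the-envelope check, with numbers of my own choosing rather
than anything measured: 2^43 is roughly 8.8e12 reference acquisitions, so
even at a generous 10^9 ref-taking operations per second that is about
8800 seconds, i.e. well over two hours of uninterrupted work - hence
"hours-long".)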


I will drop the comments because, in light of this discovery, they're
not correct.

We should fix the corner case, but that should be a separate patch.
Whatever we do needs to start by writing down the refcounting rules
first, because it's totally clear that no one understands them, and then
adjusting all examples of this pattern (if necessary).
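
For concreteness, here is a rough sketch of the error path along the
lines Jan describes.  This is illustrative only, not part of this
series; the helper name is made up, and the usual get_page() /
put_page_alloc_ref() semantics from xen/mm.h are assumed:

    /*
     * Hypothetical helper (sketch only): undo a partially-successful set
     * of typeref acquisitions.  'fail' is the index at which
     * get_page_and_type() failed; 'nr' is the total number of pages in
     * the allocation.
     */
    static void vmtrace_undo_refs(struct domain *d, struct page_info *pg,
                                  unsigned int fail, unsigned int nr)
    {
        unsigned int i;

        /*
         * Pages below the failure point: we hold the writable typeref,
         * so drop the allocation ref first, then the typeref.
         */
        for ( i = 0; i < fail; i++ )
        {
            put_page_alloc_ref(&pg[i]);
            put_page_and_type(&pg[i]);
        }

        /*
         * Pages at or after the failure point: no typeref is held.  Try
         * a plain reference; if that succeeds, dropping the allocation
         * ref is allowed.  If it fails, the page has presumably been
         * freed behind our back already (or, worst case, is leaked).
         */
        for ( i = fail; i < nr; i++ )
            if ( get_page(&pg[i], d) )
            {
                put_page_alloc_ref(&pg[i]);
                put_page(&pg[i]);
            }
    }

A caller at the refcnt_err label would then do something like
vmtrace_undo_refs(d, pg, i, d->vmtrace_size >> PAGE_SHIFT) instead of
the current put_page_and_type() loop.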

~Andrew
