[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/PV: assert page state in mark_pv_pt_pages_rdonly()


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 17 Aug 2021 10:54:32 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=2LfvdRzTfmYiHVI6p1QEQSUPiPwNPLwto82B1ifRIjk=; b=T9efgc7pIKujChK/mV/t/XPBHMJRK3iRVN/TIvKV8mSp+g6FzBVSfhy91GLm2crNqM37fFDFAIl9Wn/2owEu6SbtMdjxCJC4UvmAqNmIdbN6jYuU+UsC8c529oHLEKk9jCMaTtiTXUWMxZMP+T8aDRmnbAFpEQlkHiNGY+OLWVtrRKe5d0zVj5aQ9PrZpWmDou8G59d4NMJ2CxsU7vkgQLxCdC0C4CcK1TSxB/30RJQKhBee8JJ3/gWkgtsZ8kvT0MBxzXM9z01/DXWdpvaudO8Ll+UyoUs9Sh4YRZvaGf/abzBo+3Bmc7y64dqaMjqwJsusFgBKelrBcd5S+eAFaA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=HF78aJywQync96xqZfNNR3pCDm1tnOg1wNcCIIZkAx6AxY3gMGVaWHf0zwofx5fsRapeXjUfAvo1EmwTG8A55i+xHP1yGtMYVxrEu54ay08nauyOu8SL5vClw42uhszygmIS66DbE/H7G/4smE3LBF7F3mDjzM9xRv6H4tpBflyYdaIRGOcPSk2tOPKWhoOmPkt3PvP/VQQNKdxrH1oN8O+UT3a/h+PJMQwfyT7wO33Kj02IDSXcSh5xO4FAwS8EE5b+7K/4KkdTpWflq4N+00FK9eP55ps6OBKWZnec41gYfVm7m4nxVVhXeB3xTWjUAh9EPIyspEWL10IBrFMxVw==
  • Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=suse.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 17 Aug 2021 08:54:59 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 16.08.2021 21:25, Andrew Cooper wrote:
> On 16/08/2021 16:29, Jan Beulich wrote:
>> About every time I look at dom0_construct_pv()'s "calculation" of
>> nr_pt_pages I question (myself) whether the result is precise or merely
>> an upper bound. I think it is meant to be precise, but I think we would
>> be better off having some checking in place. Hence add ASSERT()s to
>> verify that
>> - all pages have a valid L1...Ln (currently L4) page table type and
>> - no other bits are set, in particular the type refcount is still zero.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> There are (at least) two factors supporting my uncertainty about the
>> "calculation" being precise: The loop starting from 2 (which clearly is
>> too small for a possible result)
> 
> 2 was the correct absolute minimum for 2-level guests.

Which has been history for how many years? The minimum for the
current implementation is 4 afaict, and ...

> XTF kernels don't exceed the 2M boundary (at least, not currently), so
> they can be mapped with only 3 or 4 pagetables, except:
> 
> * 3-level guests are created with 4 L2's for no obvious reason.  This is
> nothing to do with legacy PAE paging, nor with how a typical Linux/BSD
> kernel works.  The requirement to make 3-level guests work (and even
> then, only under 32bit Xen) is to create a PGT_pae_xen_l2 if not already
> covered by the other mappings.  Any non-toy kernel discards these
> pagetables in favour of its own idea of pagetables.

... could be 3 for 32-bit Dom0.

> * v_end is rounded up to 4MB.
> 
> Most XTF guests will operate entirely happily in a few hundred kb of
> space, and the same will be true of other microservices.  The rounding
> up of memory might be helpful for the traditional big VMs case, but it
> isn't correct or useful for other usecases.
> 
>> and an apparently wrong comment stating
>> that not only v_end but also v_start would be superpage aligned
> 
> Which comment?  The only one I see about 4M has nothing to do with
> superpages.

The one immediately ahead of the related variable declarations:

    /*
     * This fully describes the memory layout of the initial domain. All
     * *_start address are page-aligned, except v_start (and v_end) which are
     * superpage-aligned.
     */

I see nothing forcing v_start to be superpage-aligned, while I
do suspect that the "calculation" of the number of page tables
will be wrong when it isn't.

>>  (in fact
>> v_end is 4MiB aligned, which is the superpage size only on long
>> abandoned [by us] non-PAE x86-32).
> 
> Tangentially, that code needs some serious work to use ROUNDUP/DOWN
> macros for clarity.

Agreed.

>> --- a/xen/arch/x86/pv/dom0_build.c
>> +++ b/xen/arch/x86/pv/dom0_build.c
>> @@ -59,6 +59,10 @@ static __init void mark_pv_pt_pages_rdon
>>          l1e_remove_flags(*pl1e, _PAGE_RW);
>>          page = mfn_to_page(l1e_get_mfn(*pl1e));
>>  
>> +        ASSERT(page->u.inuse.type_info & PGT_type_mask);
>> +        ASSERT((page->u.inuse.type_info & PGT_type_mask) <= 
>> PGT_root_page_table);
> 
> This is an obfuscated
> 
> ASSERT((page->u.inuse.type_info & PGT_type_mask) >= PGT_l1_page_table &&
>        (page->u.inuse.type_info & PGT_type_mask) <= PGT_root_page_table);

I can certainly switch to this yet longer piece of code, and ...

> and
> 
>> +        ASSERT(!(page->u.inuse.type_info & ~PGT_type_mask));
> 
> this has no context.
> 
> At a bare minimum, you need a comment stating what properties we're
> looking for, so anyone suffering an assertion failure has some clue as
> to what may have gone wrong.

... I can certainly transform the respective parts of the
description into a code comment.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.