Xen project Mailing List

Re: [PATCH] x86/PV: assert page state in mark_pv_pt_pages_rdonly()

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Tue, 17 Aug 2021 10:54:32 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=2LfvdRzTfmYiHVI6p1QEQSUPiPwNPLwto82B1ifRIjk=; b=T9efgc7pIKujChK/mV/t/XPBHMJRK3iRVN/TIvKV8mSp+g6FzBVSfhy91GLm2crNqM37fFDFAIl9Wn/2owEu6SbtMdjxCJC4UvmAqNmIdbN6jYuU+UsC8c529oHLEKk9jCMaTtiTXUWMxZMP+T8aDRmnbAFpEQlkHiNGY+OLWVtrRKe5d0zVj5aQ9PrZpWmDou8G59d4NMJ2CxsU7vkgQLxCdC0C4CcK1TSxB/30RJQKhBee8JJ3/gWkgtsZ8kvT0MBxzXM9z01/DXWdpvaudO8Ll+UyoUs9Sh4YRZvaGf/abzBo+3Bmc7y64dqaMjqwJsusFgBKelrBcd5S+eAFaA==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=HF78aJywQync96xqZfNNR3pCDm1tnOg1wNcCIIZkAx6AxY3gMGVaWHf0zwofx5fsRapeXjUfAvo1EmwTG8A55i+xHP1yGtMYVxrEu54ay08nauyOu8SL5vClw42uhszygmIS66DbE/H7G/4smE3LBF7F3mDjzM9xRv6H4tpBflyYdaIRGOcPSk2tOPKWhoOmPkt3PvP/VQQNKdxrH1oN8O+UT3a/h+PJMQwfyT7wO33Kj02IDSXcSh5xO4FAwS8EE5b+7K/4KkdTpWflq4N+00FK9eP55ps6OBKWZnec41gYfVm7m4nxVVhXeB3xTWjUAh9EPIyspEWL10IBrFMxVw==

Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=suse.com;

Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 17 Aug 2021 08:54:59 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 16.08.2021 21:25, Andrew Cooper wrote: > On 16/08/2021 16:29, Jan Beulich wrote: >> About every time I look at dom0_construct_pv()'s "calculation" of >> nr_pt_pages I question (myself) whether the result is precise or merely >> an upper bound. I think it is meant to be precise, but I think we would >> be better off having some checking in place. Hence add ASSERT()s to >> verify that >> - all pages have a valid L1...Ln (currently L4) page table type and >> - no other bits are set, in particular the type refcount is still zero. >> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >> --- >> There are (at least) two factors supporting my uncertainty about the >> "calculation" being precise: The loop starting from 2 (which clearly is >> too small for a possible result) > > 2 was the correct absolute minimum for 2-level guests. Which has been history for how many years? The minimum for the current implementation is 4 afaict, and ... > XTF kernels don't exceed the 2M boundary (at least, not currently), so > they can be mapped with only 3 or 4 pagetables, except: > > * 3-level guests are created with 4 L2's for no obvious reason. This is > nothing to do with legacy PAE paging, nor with how a typical Linux/BSD > kernel works. The requirement to make 3-level guests work (and even > then, only under 32bit Xen) is to create a PGT_pae_xen_l2 if not already > covered by the other mappings. Any non-toy kernel discards these > pagetables in favour of its own idea of pagetables. ... could be 3 for 32-bit Dom0. > * v_end is rounded up to 4MB. > > Most XTF guests will operate entirely happily in a few hundred kb of > space, and the same will be true of other microservices. The rounding > up of memory might be helpful for the traditional big VMs case, but it > isn't correct or useful for other usecases. > >> and an apparently wrong comment stating >> that not only v_end but also v_start would be superpage aligned > > Which comment? The only one I see about 4M has nothing to do with > superpages. The one immediately ahead of the related variable declarations: /* * This fully describes the memory layout of the initial domain. All * *_start address are page-aligned, except v_start (and v_end) which are * superpage-aligned. */ I see nothing forcing v_start to be superpage-aligned, while I do suspect that the "calculation" of the number of page tables will be wrong when it isn't. >> (in fact >> v_end is 4MiB aligned, which is the superpage size only on long >> abandoned [by us] non-PAE x86-32). > > Tangentially, that code needs some serious work to use ROUNDUP/DOWN > macros for clarity. Agreed. >> --- a/xen/arch/x86/pv/dom0_build.c >> +++ b/xen/arch/x86/pv/dom0_build.c >> @@ -59,6 +59,10 @@ static __init void mark_pv_pt_pages_rdon >> l1e_remove_flags(*pl1e, _PAGE_RW); >> page = mfn_to_page(l1e_get_mfn(*pl1e)); >> >> + ASSERT(page->u.inuse.type_info & PGT_type_mask); >> + ASSERT((page->u.inuse.type_info & PGT_type_mask) <= >> PGT_root_page_table); > > This is an obfuscated > > ASSERT((page->u.inuse.type_info & PGT_type_mask) >= PGT_l1_page_table && > (page->u.inuse.type_info & PGT_type_mask) <= PGT_root_page_table); I can certainly switch to this yet longer piece of code, and ... > and > >> + ASSERT(!(page->u.inuse.type_info & ~PGT_type_mask)); > > this has no context. > > At a bare minimum, you need a comment stating what properties we're > looking for, so anyone suffering an assertion failure has some clue as > to what may have gone wrong. ... I can certainly transform the respective parts of the description into a code comment. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.