Xen project Mailing List

Re: [PATCH] mm/pdx: Add comments throughout the codebase for pdx

To: Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>

Date: Thu, 22 Jun 2023 11:15:17 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=CMl68EtgETa3xL4v+g74W+ZbCX557QpXLqKZGx9XphY=; b=MhxRzEsc6Tb19NR6JqKTKpYcvwXWCLdNwoyO/mV2dYDQjVy/n37hsES9w1JASpTwcafa8Zhp4KSJcdkDag4IGx9wwg08ANlGaEn7G6pDO4AeA5ug2ZYEv6+WV0J149S8PKJEkaWhZy1XOEn5UJ7KpluFXY+fulOs4gaeCek/P7h8cFo8ZWWLqDl94leZ4eZdZDnoYXPteHJ35ZV84F4rVLMyEg5gTBgHbk9+fzlis+irhroUAu29xhCzoCYWiUxmlcrG1GBpKUtHDEdgBuNgGVWV2PoZ77p4mYUt1VqtjA8N2avcAsm4Zh8MKO7Lxj7FLR0c4uv7pvwnaddCs2rGiQ==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=HmPQ7/JcanUPzVm2+BZNtSZYcCq2zrj+eYpy6hiGlPJcSHkDwztva4Z9IJW4ya66CNGNFQxE9zJIVC+ijuuJD2RmIV2CsTYPXirXvJvlA37KpdYHaGqMZAx+CKUzMzS/t3Rg2aWZet0y6YRVh5LCCMJ1t9W+q719QLPR7y42g+6AoxOrk8td+MOvMVUVGZJeE76h2f781XMBoJtXkxpTmrNsOpxrRRMSbB2KsGhIPDzrXwxXZhD+FhUiLkj8yMef/b94ddIGyaB17ytlpucGglmflB08t5zj23tAmQc9kE3ylYxAnIFE8ijftqYX4xnaxhlQD7CyamoB7Dmy8bJplQ==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 22 Jun 2023 09:15:46 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.06.2023 16:25, Alejandro Vallejo wrote: > On Mon, Jun 19, 2023 at 05:30:20PM +0200, Jan Beulich wrote: >>> @@ -57,9 +99,25 @@ uint64_t __init pdx_init_mask(uint64_t base_addr) >>> (uint64_t)1 << (MAX_ORDER + PAGE_SHIFT)) - 1); >>> } >>> >>> -u64 __init pdx_region_mask(u64 base, u64 len) >>> +uint64_t __init pdx_region_mask(uint64_t base, uint64_t len) >>> { >>> - return fill_mask(base ^ (base + len - 1)); >>> + uint64_t last = base + len - 1; >>> + /* >>> + * The only bit that matters in base^last is the MSB. There are 2 >>> cases. >>> + * >>> + * case msb(base) < msb(last): >>> + * then fill_mask(base^last) == fill_mask(last). This is non >>> + * compressible. >>> + * case msb(base) == msb(last): >>> + * This means that there _may_ be a sequence of compressible zeroes >>> + * for all addresses between `base` and `last` iff `base` has >>> enough >>> + * trailing zeroes. That is, it's compressible when >>> + * fill_mask(base^last) < fill_mask(last) >>> + * >>> + * The resulting mask is effectively the moving bits between `base` and >>> + * `last` >>> + */ >>> + return fill_mask(base ^ last); >>> } >> >> I don't see a need for you to actually change the code here. You can >> as well introduce "last" as shorthand just for the comment. > I thought as you did initially and wrote it as such. In the end it felt > wrong to have an explanation in terms of a token not present in the code. > Furthermore, explaining what the shorthand is in the comment takes more > space than introducing `last` in the code itself. > > ``` > uint64_t last = base + len - 1; > /* > * The only bit that matters in base^last is the MSB. There are 2 cases. > ``` > vs > ``` > /* > * Let `last = base + len -1` out of convenience. > * The only bit that matters in base^last is the MSB. There are 2 cases. > ``` > > TL;DR: I didn't factor out `last` due to aesthetics (I'd rather not touch > the code in this patch, in fact), but it seems warranted in order to reduce > the impedance mismatch between this big-ish comment and the call it > describes. I'll post v2 without that adjustment in case I managed to > convince you. Otherwise feel free to adjust it on commit. > >> What I dislike in your way of putting it is the use of fill_mask(last) when >> such a call never occurs in code. Starting from the first sentence, >> can't you explain things just in terms of said MSB > I see. I can refer to the MSBs instead. Works equally well. > > e.g: > fill_mask(base^last) == fill_mask(last) > | > V > msb(fill_mask(base^last)) == msb(last) > >> (where the two cases are "set" and "clear")? > I'm not sure I understand what you mean here. Hmm, yes, I think I was getting confused here. >>> --- a/xen/include/xen/mm.h >>> +++ b/xen/include/xen/mm.h >>> @@ -31,6 +31,22 @@ >>> * (i.e. all devices assigned to) a guest share a single DMA address >>> space >>> * and, by default, Xen will ensure dfn == pfn. >>> * >>> + * pdx: Page InDeX >>> + * Indices into the frame table holding the per-page's book-keeping >>> + * metadata. A compression scheme is used and there's a non-identity >>> + * mapping between valid(mfn) <-> valid(pdx) See the comments in pdx.c >>> + * for an in-depth explanation of that mapping. >> >> The mapping may very well be (and on x86 typically is) an identity >> one. IOW you want to describe not only the compression case, but also >> the "no compression possible" one. > Point taken. I'll rephrase it slightly as "possibly non-identity" and > explicitly state the "no compression is possible" case. > >> >> PDXes also aren't just indexes to the frame table, but also to the >> direct mapping. > I had something to that effect earlier on, but I removed it because it > doesn't seem to be the case on ARM. There's a directmap_base_pdx global > that states the first pdx to be mapped on the directmap. Which would merely make it a biased index. I very much hope they eliminate holes (and not just unused leading space) from the directmap as well. >>> + * ## PDX compression >>> + * >>> + * This is a technique to avoid wasting memory on machines known to have >>> + * split their machine address space in two big discontinuous and highly >>> + * disjoint chunks. >> >> Why two? There can be any number, and in fact on the system I originally >> had data from for reference (when first writing this code) iirc there >> were 8 nodes, each with a chunk of memory far away from the other chunks. >> The compression scheme used merely requires that some "inner" bits are >> unused (always zero) in all of those ranges. > Well, our implementation only supports two and I didn't see any obvious > hints about intending to increasing that number. Where are you taking that "supports two" from? When I first wrote this code, it was tested against a system with 8 (maybe it was 4, but certainly more than 2) discontiguous regions (not counting the hole below 4G). > I see where you're coming > from, though. I can make it more general so it's not outdated if the > pfn_to_pdx()/pdx_to_pfn() pair ever increases in scope to do several holes. > > Out of curiosity (and for posterity's sake), what was/is that system? I'm not sure I'm permitted to mention that. I'm pretty sure I carefully avoided mentioning the partner of ours back at the time. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.