[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] mm/pdx: Add comments throughout the codebase for pdx


  • To: Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 10 Jul 2023 09:43:34 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=PQt1i772a/8IewftbY+2LpHC5OCoBLspVSVdblgVqlE=; b=j+4vsBVME41J0zhPAGsj6/W66JsDvVvbxTL4LCz2+W6sRP1+uxgw9HDzYIQr0HPIzJqmcZ+e82PnF37lObueSj9t6gSvtIV4S7gu58YV0AUxgk1osxpz0V/NJzz1T0pKYHbNWDmuZOJQJIByXqYVcNSM0SVeq43wWtfVcWS5sbZAbQ8edE5yIpy3FikYil0EHugaMAd7WNRMskwkTFpnFk99JAwzXSL+6+BL9P9J1/LzUT93J+gt2RtBfLovYwOsNK8KTz3wxKLSqq4E5LmhqewKOtMBH76win+x/DJhkGLg2Wx1nAW2DaUkX7PaB51MUg7RAkHzzWlrVsaa8e0qbw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=LMIucmI7C/V8wOWwfDf33NFBbrVN1O4RoMBeTLegEk8/mPpZeg/GOWeVc3DmHzIDwnFwtyZhbqivRCcxp0ZqNedCN+lj8/O2JuVLRGWnbV8g+tPQrKhpk5BrTPNPyD5ZuehmCQHitdmG4Wg+7BW8IPXESMyXIdoEguYiaeFOGhYOubL+8ZbBu2cqkBZllOtcF7+k58NG0XVZCeE3n5XG9INMhVgFmXLk0TrNfUgF4C/WS0rHj7A5eQdQRRbFFTsrr03ZoqHRecoMFNyTBhi6nMMMjqpO9E3YgiOLCklPY3A0bDiXW9YT2rkLYI6AEwMQD5/dKsJUMhqZXXkCgbdjmg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 10 Jul 2023 07:43:48 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.07.2023 17:55, Alejandro Vallejo wrote:
> On Thu, Jul 06, 2023 at 11:50:58AM +0200, Jan Beulich wrote:
>> On 22.06.2023 16:02, Alejandro Vallejo wrote:
>>> @@ -57,9 +100,25 @@ uint64_t __init pdx_init_mask(uint64_t base_addr)
>>>                           (uint64_t)1 << (MAX_ORDER + PAGE_SHIFT)) - 1);
>>>  }
>>>  
>>> -u64 __init pdx_region_mask(u64 base, u64 len)
>>> +uint64_t __init pdx_region_mask(uint64_t base, uint64_t len)
>>>  {
>>> -    return fill_mask(base ^ (base + len - 1));
>>> +    uint64_t last = base + len - 1;
>>> +    /*
>>> +     * The only bit that matters in base^last is the MSB. There are 2 
>>> cases.
>>> +     *
>>> +     * case msb(base) < msb(last):
>>> +     *     then msb(fill_mask(base^last)) == msb(last). This is non
>>> +     *     compressible.
>>> +     * case msb(base) == msb(last):
>>> +     *     This means that there _may_ be a sequence of compressible zeroes
>>> +     *     for all addresses between `base` and `last` iff `base` has 
>>> enough
>>> +     *     trailing zeroes. That is, it's compressible when
>>
>> Why trailing zeros? [100000f000,10ffffffff] has compressible bits
>> 32-35, but the low bits of base don't matter at all.

This is ...

>>> + * ## PDX compression
>>> + *
>>> + * This is a technique to avoid wasting memory on machines known to have
>>> + * split their machine address space in several big discontinuous and 
>>> highly
>>> + * disjoint chunks.
>>> + *
>>> + * In its uncompressed form the frame table must have book-keeping metadata
>>> + * structures for every page between [0, max_mfn) (whether they are backed
>>> + * by RAM or not), and a similar condition exists for the direct map. We
>>> + * know some systems, however, that have some sparsity in their address
>>> + * space, leading to a lot of wastage in the form of unused frame table
>>> + * entries.
>>> + *
>>> + * This is where compression becomes useful. The idea is to note that if
>>> + * you have several big chunks of memory sufficiently far apart you can
>>> + * ignore the middle part of the address because it will always contain
>>> + * zeroes as long as the base address is sufficiently well aligned and the
>>> + * length of the region is much smaller than the base address.
>>
>> As per above alignment of the base address doesn't really matter.
> Where above?

... what "above" here meant.

> As far as I understand you need enough alignment to cover the
> hole or you won't have zeroes to compress. Point in case:
> 
>   * region1: [0x0000000000000000 -
>               0x00000000FFFFFFFF]
> 
>   * region2: [0x0001FFFFFFFFF000 -
>               0x00020000FFFFFFFF]
> 
> I can agree this configuration is beyond dumb and statistically unlikely to
> exist in the wild, but it should (IMO) still be covered by that comment.

Right, but this isn't relevant here - in such a case no compression
can occur, yes, but not (just) because of missing alignment. See the
example I gave above (in the earlier reply) for where alignment
clearly doesn't matter for compression to be possible.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.