
Re: [PATCH v5 2/6] xen/x86: move generically usable NUMA code from x86 to common


  • To: Wei Chen <Wei.Chen@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 29 Sep 2022 14:14:20 +0200
  • Cc: nd@xxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 29 Sep 2022 12:14:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29.09.2022 09:43, Wei Chen wrote:
> On 2022/9/27 16:19, Jan Beulich wrote:
>> On 20.09.2022 11:12, Wei Chen wrote:
>>> +        nodes_used++;
>>> +        if ( epdx > memtop )
>>> +            memtop = epdx;
>>> +    }
>>> +
>>> +    if ( nodes_used <= 1 )
>>> +        i = BITS_PER_LONG - 1;
>>
>> Is this actually going to be correct for all architectures? Aiui
>> Arm64 has only up to 48 physical address bits, but what about an
>> architecture allowing the use of all 64 bits? I think at the very
>> least we want BUILD_BUG_ON(PADDR_BITS >= BITS_PER_LONG) here.
>>
> 
> OK, I will add the above BUILD_BUG_ON(). I also have a question: why
> can't we use PADDR_BITS here directly?

Well, if you used PADDR_BITS, then you would use it without subtracting
1, and you'd be in trouble again when PADDR_BITS == BITS_PER_LONG. What
may be possible to do instead of BUILD_BUG_ON() is

    if ( nodes_used <= 1 )
        i = min(PADDR_BITS, BITS_PER_LONG - 1);
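
Purely as an illustration (not the actual patch), the clamped fallback
would sit in context like this, with bitfield and memtop as in the
quoted code (depending on the constants' types, Xen's type-checking
min() may need to be min_t() here):

    if ( nodes_used <= 1 )
        /*
         * Shifting by BITS_PER_LONG or more is undefined behaviour in
         * C, so the shift count must stay below BITS_PER_LONG even on
         * a port where PADDR_BITS == BITS_PER_LONG.
         */
        i = min(PADDR_BITS, BITS_PER_LONG - 1);
    else
        i = find_first_bit(&bitfield, sizeof(unsigned long) * 8);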

>>> +    else
>>> +        i = find_first_bit(&bitfield, sizeof(unsigned long) * 8);
>>> +
>>> +    memnodemapsize = (memtop >> i) + 1;
>>
>> Again perhaps the subject of a separate patch: Isn't there an off-by-1
>> mistake here? memtop is the maximum of all epdx-es, which are
>> calculated to be the first PDX following the region. Hence I'd expect
>>
>>      memnodemapsize = ((memtop - 1) >> i) + 1;
>>
>> here. I guess I'll make patches for both issues, which you may then
>> need to re-base over.
>>
> 
> Thanks, I will wait for your patches.

Already sent out yesterday.
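
To make the off-by-one concrete with made-up numbers: suppose the
highest region ends at PDX 0xFF, so its epdx (the first PDX following
the region) and hence memtop are 0x100, and suppose the computed shift
i is 8. Then

    old: memnodemapsize = (0x100 >> 8) + 1       = 2
    new: memnodemapsize = ((0x100 - 1) >> 8) + 1 = 1

i.e. the old expression allocates a map entry for a chunk which lies
entirely beyond the last populated PDX.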

>>> +static void cf_check dump_numa(unsigned char key)
>>> +{
>>> +    s_time_t now = NOW();
>>> +    unsigned int i, j, n;
>>> +    struct domain *d;
>>> +    const struct page_info *page;
>>> +    unsigned int page_num_node[MAX_NUMNODES];
>>> +    const struct vnuma_info *vnuma;
>>> +
>>> +    printk("'%c' pressed -> dumping numa info (now = %"PRI_stime")\n", key,
>>> +           now);
>>> +
>>> +    for_each_online_node ( i )
>>> +    {
>>> +        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
>>> +
>>> +        printk("NODE%u start->%lu size->%lu free->%lu\n",
>>> +               i, node_start_pfn(i), node_spanned_pages(i),
>>> +               avail_node_heap_pages(i));
>>> +        /* Sanity check phys_to_nid() */
>>> +        if ( phys_to_nid(pa) != i )
>>> +            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
>>> +                   pa, phys_to_nid(pa), i);
>>> +    }
>>> +
>>> +    j = cpumask_first(&cpu_online_map);
>>> +    n = 0;
>>> +    for_each_online_cpu ( i )
>>> +    {
>>> +        if ( i != j + n || cpu_to_node[j] != cpu_to_node[i] )
>>> +        {
>>> +            if ( n > 1 )
>>> +                printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, 
>>> cpu_to_node[j]);
>>> +            else
>>> +                printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
>>> +            j = i;
>>> +            n = 1;
>>> +        }
>>> +        else
>>> +            ++n;
>>> +    }
>>> +    if ( n > 1 )
>>> +        printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
>>> +    else
>>> +        printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
>>> +
>>> +    rcu_read_lock(&domlist_read_lock);
>>> +
>>> +    printk("Memory location of each domain:\n");
>>> +    for_each_domain ( d )
>>> +    {
>>> +        process_pending_softirqs();
>>> +
>>> +        printk("Domain %u (total: %u):\n", d->domain_id, 
>>> domain_tot_pages(d));
>>> +
>>> +        for_each_online_node ( i )
>>> +            page_num_node[i] = 0;
>>
>> I'd be inclined to suggest to use memset() here, but I won't insist
>> on you doing this "on the fly". Along with this would likely go the
>> request to limit the scope of page_num_node[] (and then perhaps also
>> vnuma and page).
>>
> 
> memset() for page_num_node makes sense; I will do it before
> for_each_domain ( d ).

That won't be right - array elements need clearing on every iteration.
Plus ...

> About limiting the scope, did you mean we should move:
> 
> "const struct page_info *page;
> unsigned int page_num_node[MAX_NUMNODES];
> const struct vnuma_info *vnuma;"
> 
> to the block of for_each_domain ( d )?

... this limiting of scope (yes to your question) would also conflict
with the movement you suggest. It is in fact (among other things)
exactly such a mistaken movement that the narrower scope is intended
to prevent.
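
For clarity, here is a minimal sketch of the shape being asked for
(illustrative only, not the final code):

    for_each_domain ( d )
    {
        const struct page_info *page;
        const struct vnuma_info *vnuma;
        unsigned int page_num_node[MAX_NUMNODES];

        process_pending_softirqs();

        /*
         * Each iteration accumulates its own per-node counts, so the
         * array has to be cleared afresh for every domain.
         */
        memset(page_num_node, 0, sizeof(page_num_node));

        ...
    }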

Jan
