
Re: [PATCH] x86: make "dom0_nodes=" work with credit2


  • To: Dario Faggioli <dfaggioli@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 27 Jul 2022 08:19:44 +0200
  • Cc: "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, "ohering@xxxxxxx" <ohering@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 27 Jul 2022 06:19:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29.04.2022 16:09, Dario Faggioli wrote:
> On Fri, 2022-04-29 at 14:16 +0200, Jan Beulich wrote:
>> On 29.04.2022 12:52, Dario Faggioli wrote:
>>>> Is that mainly to have a way to record preferences even when all
>>>> preferred CPUs are offline, to be able to go back to the
>>>> preferences once CPUs come back online?
>>>>
>>> That's another example/use case, yes. We want to record the user's
>>> preference, whatever the status of the system (and of other aspects
>>> of the configuration) is.
>>>
>>> But I'm not really sure I've answered... Have I?
>>
>> You did. 
>>
> Ok, great! :-)
> 
>>>
>>> If dom0_nodes is in "strict" mode, we want to control hard affinity
>>> only. So we set soft to the default, which is "all". During
>>> operations, since hard is a subset of "all", soft-affinity will be
>>> just ignored.
>>
>> Right - until such point that all (original) Dom0 CPUs have gone
>> offline. Hence my 2nd question.
>>
>>> So I'm using "all" because soft-affinity is just "all", unless
>>> someone sets it differently.
>>
>> How would "someone set it differently"? Aiui you can't control both
>> affinities at the same time.
>>
> Yeah, the argument here is basically the one that I put below, and that
> you say you understand. I guess I could have put it a bit more upfront,
> sorry about that. :-)
> 
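(As a concrete illustration of the mapping discussed above, here is a
minimal sketch. It is not the actual patch: it assumes the
sched_set_affinity(unit, hard, soft) helper and the cpumask_all
constant as in the current tree, and apply_dom0_node_affinity() is a
made-up name.)

/*
 * "strict"  -> hard affinity restricted to the requested nodes' CPUs,
 *              soft affinity left at its default of "all".
 * "relaxed" -> hard affinity left at "all", soft affinity set to the
 *              requested nodes' CPUs.
 * Since hard is always a subset of "all", a soft affinity of "all"
 * never biases placement, i.e. it is effectively ignored.
 */
static void apply_dom0_node_affinity(struct sched_unit *unit,
                                     const cpumask_t *node_cpus,
                                     bool relaxed)
{
    if ( relaxed )
        sched_set_affinity(unit, &cpumask_all, node_cpus);
    else
        sched_set_affinity(unit, node_cpus, &cpumask_all);
}
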
>>>>
>>>> Hmm, you leave this alone. Wouldn't it be better to further
>>>> generalize things, in case domain affinity was set already? I was
>>>> referring to the mask calculated by sched_select_initial_cpu() also
>>>> in this regard. And when I did suggest to re-use the result, I did
>>>> mean this literally.
>>>>
>>> Technically, I think we can do that. Although, it's probably
>>> cumbersome to do without adding at least one cpumask on the stack,
>>> or reshuffling the locking between sched_select_initial_cpu() and
>>> sched_init_vcpu(), in a way that I (personally) don't find
>>> particularly pretty.
>>
>> Locking? sched_select_initial_cpu() calculates into a per-CPU
>> variable, which I sincerely hope cannot be corrupted by another CPU.
>>
> No, not by another CPU, hopefully.
> 
> And this is probably fine, during boot, when there should be no (other)
> scheduling activity. However, during normal operation, a vCPU being
> scheduled on CPU X, or in general having X in v->processor, could be
> using the scratch cpumask of X already. So, if we use it without
> locking, we'd risk using the wrong mask.
> 
> Therefore, we require the scheduler lock to be held, for playing with
> the scratch cpumasks:
> 
> /*
>  * Scratch space, for avoiding having too many cpumask_t on the stack.
>  * Within each scheduler, when using the scratch mask of one pCPU:
>  * - the pCPU must belong to the scheduler,
>  * - the caller must own the per-pCPU scheduler lock (a.k.a. runqueue
>  *   lock).
>  */
> DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
> #define cpumask_scratch        (&this_cpu(cpumask_scratch))
> #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))
> 
> And sched_init_vcpu() (and hence sched_select_initial_cpu()) can be
> called during normal operation.
> 
> In fact, sched_select_initial_cpu() does pcpu_schedule_lock_irqsave()
> before it starts using it.
> 
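(For reference, a condensed sketch of the usage pattern being described.
It is illustrative only, not the actual sched_select_initial_cpu(): the
lock/unlock helper signatures are quoted from memory and
pick_cpu_example() is a made-up name.)

/* Pick some online CPU from 'hard', using the local pCPU's scratch mask. */
static unsigned int pick_cpu_example(const cpumask_t *hard)
{
    unsigned int cpu = smp_processor_id(), ret;
    cpumask_t *mask = cpumask_scratch_cpu(cpu);
    unsigned long flags;
    spinlock_t *lock;

    /*
     * Per the rule quoted above, the scratch mask of 'cpu' may only be
     * used while holding that pCPU's scheduler (runqueue) lock.
     */
    lock = pcpu_schedule_lock_irqsave(cpu, &flags);

    cpumask_and(mask, hard, &cpu_online_map);
    ret = cpumask_empty(mask) ? cpumask_first(&cpu_online_map)
                              : cpumask_first(mask);

    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);

    return ret;
}
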
>>
>>> And again, soft and hard affinity should be set to what the user
>>> wants and asks for. And if, for instance, he/she passes
>>> dom0_nodes="1,strict", soft-affinity should just be all. If, e.g.,
>>> we set both hard and soft affinity to the CPUs of node 1, and hard
>>> affinity is later manually changed to "all", soft affinity will
>>> remain set to node 1, even though it was never asked to be that
>>> way, and the user will need to change it explicitly as well. (Of
>>> course, it's not particularly clever to boot with
>>> dom0_nodes="1,strict" and then change dom0's vCPUs' hard affinity
>>> to node 0... but the user is free to do that.)
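
(The practical difference can be summed up in a short sketch. This is
not Xen's actual affinity-balancing code, which iterates over balance
steps; effective_masks() is a made-up helper. With soft = "all" the
preference step never restricts anything, whereas a leftover
soft = node 1 keeps steering vCPUs towards node 1 even after hard
affinity has been widened.)

static void effective_masks(const cpumask_t *hard, const cpumask_t *soft,
                            cpumask_t *allowed, cpumask_t *preferred)
{
    /* Hard affinity (restricted to online CPUs) is never violated. */
    cpumask_and(allowed, hard, &cpu_online_map);

    /* Soft affinity only expresses a preference within 'allowed'... */
    cpumask_and(preferred, allowed, soft);

    /* ... and is ignored altogether if that preference can't be met. */
    if ( cpumask_empty(preferred) )
        cpumask_copy(preferred, allowed);
}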
>>
>> I can certainly accept this as justification for using "all" further up.
>>
> Good then.

I notice this issue is still open. May I ask what the plans are here?

Jan



 

