Re: [PATCH] x86: make "dom0_nodes=" work with credit2
On 08.04.2022 13:20, Dario Faggioli wrote:
> On Thu, 2022-04-07 at 15:27 +0200, Jan Beulich wrote:
>> ---
>> The Fixes: tag isn't very precise - it's rather the commit exposing
>> the issue by default. I haven't been able to identify the actual
>> commit which did introduce the problem; it may well be that it has
>> always been there since the introduction of credit2.
>>
>> Credit2 moving the vCPU-s off of their initially assigned ones right
>> away of course renders sched_select_initial_cpu()'s use of
>> cpumask_cycle() pretty useless.
>>
> Mmm... you mean that Credit2 is moving the vCPUs off their assigned
> ones _right_now_, or that it will, with this patch?

Right now. On a 4-node (6 cores each) system with "dom0_nodes=0" I've
observed the 6 vCPU-s to reliably be put on CPUs 13, 14, etc. The patch
is actually undoing that questionable movement. Since I have a large
amount of other patches in place (none of which I would assume to have
such an effect), I couldn't entirely rule out interference from those -
Olaf has meanwhile confirmed that the change helps for him as well.

> If you mean the former, I'm not sure it is. In fact, when
> sched_select_initial_cpu() is called for dom0, dom0's node affinity is
> just all nodes, isn't it? So, then we can pick any CPU in the cpupool,
> and we use cycle to try to spread the vCPUs kind of evenly.

The CPU mask used in the function is 0x3f for the example given above.
I did not check which of the constituent parts of the calculation have
this effect. But the result is that all vCPUs will be put on CPU 0
first, as cpumask_cycle(..., 13) for a mask of 0x3f produces 0 (see the
toy example further down).

> If you mean after this patch, well, sure, but that's normal. Again,
> when we pick the initial CPU, we still don't know that the vCPUs have
> an affinity. Or, actually, we know that in sched_setup_dom0_vcpus(),
> but there's no direct way to tell it to sched_init_vcpu() (and hence
> sched_select_initial_cpu()).
>
> That's because, by default, affinity is just "all cpus", when we
> create the vCPUs, and we change that later, if they have one already
> (due to it being present in the config file, or in the dom0_nodes
> parameter).

But that's what I'm talking about a little further down, where you
reply that you don't think using the narrower set would hide the issue.

> Something that *maybe* we can try, since we're handling dom0 vCPUs
> specially anyway, is to directly set dom0's node affinity to the nodes
> of the CPUs we have in dom0_cpus, before calling vcpu_create() (in
> sched_setup_dom0_vcpus(), of course).
>
> This should make sched_select_initial_cpu() pick one of the "correct"
> CPUs in the first place. But I don't know if it's worth it, nor
> whether we'll still need this patch anyway (I have to check more
> thoroughly).

As per above - sched_select_initial_cpu() behaves as I would expect.
It's credit2 which subsequently overrides that decision.

>> But I guess that's still useful for other schedulers.
>> I wonder though whether sched_init_vcpu() shouldn't use the CPU map
>> calculated by sched_select_initial_cpu() for its call to
>> sched_set_affinity() in the non-pinned case, rather than setting
>> "all".
>>
> If we do that, and there's no affinity configured for the guest, or no
> "dom0_nodes=", when will we reset the affinity to all, which is what
> it should be in such a case?

Why "reset"? When no restrictions are in place, afaict
sched_select_initial_cpu() will calculate a mask of "all".
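
To make the cpumask_cycle() point above concrete, here is a toy,
standalone sketch (my own model, not Xen code: the cpumask is a plain
unsigned long and cycle_next_cpu() only mimics the wrap-around
behaviour, not the real cpumask_cycle() interface):

/* Toy model only - not Xen's cpumask_t / cpumask_cycle() implementation. */
#include <stdio.h>

#define NBITS (sizeof(unsigned long) * 8)

/* Next set bit after 'n', wrapping around; returns NBITS if mask is empty. */
static unsigned int cycle_next_cpu(unsigned int n, unsigned long mask)
{
    unsigned int i;

    for ( i = 1; i <= NBITS; i++ )
    {
        unsigned int bit = (n + i) % NBITS;

        if ( mask & (1UL << bit) )
            return bit;
    }

    return NBITS;
}

int main(void)
{
    unsigned long dom0_cpus = 0x3f; /* "dom0_nodes=0": CPUs 0-5 of node 0 */

    /* Starting from 13 (outside the mask), the cycle wraps to CPU 0. */
    printf("next CPU = %u\n", cycle_next_cpu(13, dom0_cpus));

    return 0;
}

Compiled and run, this prints "next CPU = 0", matching the "all vCPUs
go to CPU 0 first" observation above.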
> Also, if I'm right in my reasoning above, when we come from
> sched_setup_dom0_vcpus(), the mask calculated by
> sched_select_initial_cpu() is basically cpupool0's cpus_valid, so this
> wouldn't really change anything for the problem we're trying to solve
> here.
>
>> (I guess doing so might mask the issue at hand, but I think the
>> change here would still be applicable at least from an abstract pov.)
>>
> I don't think it would mask it, but I do think that, yes, the change
> you're making would still be applicable.
>
> And, about it, one thing...
>
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -3403,9 +3403,15 @@ void __init sched_setup_dom0_vcpus(struc
>>      {
>>          for_each_sched_unit ( d, unit )
>>          {
>> +            spinlock_t *lock = unit_schedule_lock_irq(unit);
>> +
>>              if ( !opt_dom0_vcpus_pin && !dom0_affinity_relaxed )
>>                  sched_set_affinity(unit, &dom0_cpus, NULL);
>>              sched_set_affinity(unit, NULL, &dom0_cpus);
>> +
>> +            sched_unit_migrate_start(unit);
>> +            unit_schedule_unlock_irq(lock, unit);
>> +            sched_unit_migrate_finish(unit);
>>          }
>>      }
>>
> ... The result of this (also considering the call to
> domain_update_node_affinity()) ends up looking very similar to what
> we'd get if, instead of calling sched_set_affinity(), we call
> vcpu_set_affinity().

Funny you should say this - this is what I had done first. It works,
but it is less efficient than the approach above, first and foremost
when there are multiple vCPU-s per unit.

> I'm therefore wondering if we should try to just do that... But I'm
> not sure, mostly because that would mean calling
> domain_update_node_affinity() for all dom0's vCPUs, which is clearly
> less efficient than just calling it once at the end.

Indeed, that's the other reason why I changed to the approach above
(a toy illustration follows further down).

> So I'm thinking that we can indeed do it like this, and add a comment.

A comment to what effect?
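
Coming back to the efficiency aspect for a moment - a similarly hedged
toy sketch (purely illustrative call counting with made-up numbers and
stand-in functions, not Xen code): with several vCPU-s per unit, a
per-vCPU helper that recomputes the domain's node affinity on every
call does strictly more of the expensive updates than setting affinity
per unit and recomputing once at the end, which is what the hunk above
amounts to.

/* Illustrative only: counts how often the (assumed expensive) node-affinity
 * recomputation would run under the two approaches discussed above. */
#include <stdio.h>

#define NR_DOM0_VCPUS  6
#define VCPUS_PER_UNIT 2 /* e.g. core scheduling on an SMT-2 system */

static unsigned int updates;

static void model_update_node_affinity(void) { updates++; } /* costly step */
static void model_set_unit_affinity(unsigned int unit) { (void)unit; } /* cheap */

int main(void)
{
    unsigned int i;

    /* Per-vCPU route: the affinity helper recomputes node affinity each time. */
    updates = 0;
    for ( i = 0; i < NR_DOM0_VCPUS; i++ )
    {
        model_set_unit_affinity(i / VCPUS_PER_UNIT);
        model_update_node_affinity();
    }
    printf("per-vCPU route: %u node-affinity updates\n", updates);

    /* Per-unit route: set affinity per unit, recompute node affinity once. */
    updates = 0;
    for ( i = 0; i < NR_DOM0_VCPUS / VCPUS_PER_UNIT; i++ )
        model_set_unit_affinity(i);
    model_update_node_affinity();
    printf("per-unit route: %u node-affinity update(s)\n", updates);

    return 0;
}

With these made-up numbers the first loop reports 6 updates and the
second a single one; on top of that, the per-unit route touches each
unit's affinity only once rather than once per vCPU.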

Jan