Re: [Xen-devel] [PATCH 1/2] xen: credit2: avoid vCPUs to ever reach lower credits than idle
> On Mar 12, 2020, at 1:44 PM, Dario Faggioli <dfaggioli@xxxxxxxx> wrote:
>
> There have been report of stalls of guest vCPUs, when Credit2 was used.
> It seemed like these vCPUs were not getting scheduled for very long
> time, even under light load conditions (e.g., during dom0 boot).
>
> Investigations led to the discovery that --although rarely-- it can
> happen that a vCPU manages to run for very long timeslices. In Credit2,
> this means that, when runtime accounting happens, the vCPU will lose a
> large quantity of credits. This in turn may lead to the vCPU having less
> credits than the idle vCPUs (-2^30). At this point, the scheduler will
> pick the idle vCPU, instead of the ready to run vCPU, for a few
> "epochs", which often times is enough for the guest kernel to think the
> vCPU is not responding and crashing.
>
> An example of this situation is shown here. In fact, we can see d0v1
> sitting in the runqueue while all the CPUs are idle, as it has
> -1254238270 credits, which is smaller than -2^30 = −1073741824:
>
> (XEN) Runqueue 0:
> (XEN) ncpus = 28
> (XEN) cpus = 0-27
> (XEN) max_weight = 256
> (XEN) pick_bias = 22
> (XEN) instload = 1
> (XEN) aveload = 293391 (~111%)
> (XEN) idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
> (XEN) tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
> (XEN) fully idle cores:
> 00,00000000,00000000,00000000,00000000,00000000,0fffffff
> [...]
> (XEN) Runqueue 0:
> (XEN) CPU[00] runq=0, sibling=00,..., core=00,...
> (XEN) CPU[01] runq=0, sibling=00,..., core=00,...
> [...]
> (XEN) CPU[26] runq=0, sibling=00,..., core=00,...
> (XEN) CPU[27] runq=0, sibling=00,..., core=00,...
> (XEN) RUNQ:
> (XEN) 0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144
> (~100%)
>
> We certainly don't want, under any circumstance, this to happen.
> Therefore, let's use INT_MIN for the credits of the idle vCPU, in
> Credit2, to be sure that no vCPU can get below that value.
>
> NOTE: investigations have been done about _how_ it is possible for a
> vCPU to execute for so long that its credits becomes so low. While still
> not completely clear, there are evidence that:
> - it only happens very rarely
> - it appears to be both machine and workload specific
> - it does not look to be a Credit2 (e.g., as it happens when running
>   with Credit1 as well) issue, or a scheduler issue
>
> This patch makes Credit2 more robust to events like this, whatever
> the cause is, and should hence be backported (as far as possible).
>
> Signed-off-by: Dario Faggioli <dfaggioli@xxxxxxxx>
> Reported-by: Glen <glenbarney@xxxxxxxxx>
> Reported-by: Tomas Mozes <hydrapolic@xxxxxxxxx>

Nit: The reported-by’s should be before the SoB (i.e., tags roughly in time order).

I think this is a good change to make the algorithm more robust, so:

Acked-by: George Dunlap <george.dunlap@xxxxxxxxxx>

But it seems like allowing a guest to rack up -2^63 credits is still a bad thing, and it would be nice to have some other backstop / reset mechanism. But I guess to have an effective mechanism of that sort we’d want to understand how it happened in the first place.
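For readers following the arithmetic in the quoted description, here is a minimal C sketch of the comparison at stake. It is not the Xen code: the macro and function names are hypothetical, and the "strictly greater credit wins" rule is a simplifying assumption about how a ready vCPU is compared against the idle vCPU. It only illustrates why a credit of -1254238270 loses to an idle credit of -2^30 but not to INT_MIN.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the identifiers used in Xen. */
#define IDLE_CREDIT_OLD (-(1 << 30)) /* -2^30 = -1073741824, idle credit before the patch */
#define IDLE_CREDIT_NEW INT_MIN      /* idle credit proposed in the patch description */

/*
 * Simplified pick rule (an assumption, not the real scheduler logic):
 * a ready vCPU is chosen over the idle vCPU only if its credit is
 * strictly greater than the idle vCPU's credit.
 */
static bool ready_vcpu_beats_idle(int vcpu_credit, int idle_credit)
{
    return vcpu_credit > idle_credit;
}
```

With the value from the dump above, ready_vcpu_beats_idle(-1254238270, IDLE_CREDIT_OLD) is false, so the idle vCPU would be picked, while ready_vcpu_beats_idle(-1254238270, IDLE_CREDIT_NEW) is true. Under this simplified rule, any credit above INT_MIN beats idle once the idle credit is INT_MIN.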