Re: [PATCH v2 1/2] xen: credit2: avoid vCPUs to ever reach lower credits than idle
> On Mar 19, 2020, at 12:11 AM, Dario Faggioli <dfaggioli@xxxxxxxx> wrote:
>
> There have been reports of stalls of guest vCPUs, when Credit2 was used.
> It seemed like these vCPUs were not getting scheduled for very long
> time, even under light load conditions (e.g., during dom0 boot).
>
> Investigations led to the discovery that --although rarely-- it can
> happen that a vCPU manages to run for very long timeslices. In Credit2,
> this means that, when runtime accounting happens, the vCPU will lose a
> large quantity of credits. This in turn may lead to the vCPU having fewer
> credits than the idle vCPUs (-2^30). At this point, the scheduler will
> pick the idle vCPU, instead of the ready to run vCPU, for a few
> "epochs", which oftentimes is enough for the guest kernel to think the
> vCPU is not responding and crashing.
>
> An example of this situation is shown here. In fact, we can see d0v1
> sitting in the runqueue while all the CPUs are idle, as it has
> -1254238270 credits, which is smaller than -2^30 = -1073741824:
>
> (XEN) Runqueue 0:
> (XEN) ncpus = 28
> (XEN) cpus = 0-27
> (XEN) max_weight = 256
> (XEN) pick_bias = 22
> (XEN) instload = 1
> (XEN) aveload = 293391 (~111%)
> (XEN) idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
> (XEN) tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
> (XEN) fully idle cores:
> 00,00000000,00000000,00000000,00000000,00000000,0fffffff
> [...]
> (XEN) Runqueue 0:
> (XEN) CPU[00] runq=0, sibling=00,..., core=00,...
> (XEN) CPU[01] runq=0, sibling=00,..., core=00,...
> [...]
> (XEN) CPU[26] runq=0, sibling=00,..., core=00,...
> (XEN) CPU[27] runq=0, sibling=00,..., core=00,...
> (XEN) RUNQ:
> (XEN) 0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144
> (~100%)
>
> We certainly don't want, under any circumstance, this to happen.
> Let's, therefore, define a minimum amount of credits a vCPU can have.
> During accounting, we make sure that, for however long the vCPU has
> run, it will never get to have less than such minimum amount of
> credits. Then, we set the credits of the idle vCPU to an even
> smaller value.
>
> NOTE: investigations have been done about _how_ it is possible for a
> vCPU to execute for so much time that its credits become so low. While
> still not completely clear, there is evidence that:
> - it only happens very rarely,
> - it appears to be both machine and workload specific,
> - it does not look to be a Credit2 (e.g., as it happens when
> running with Credit1 as well) issue, or a scheduler issue.
>
> This patch makes Credit2 more robust to events like this, whatever
> the cause is, and should hence be backported (as far as possible).
>
> Reported-by: Glen <glenbarney@xxxxxxxxx>
> Reported-by: Tomas Mozes <hydrapolic@xxxxxxxxx>
> Signed-off-by: Dario Faggioli <dfaggioli@xxxxxxxx>

Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxx>
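
For readers following the thread, the clamping described in the quoted commit message can be sketched roughly as below. This is not the patch itself: the SKETCH_* constants, their values, and the function names are hypothetical, chosen only to show a credit floor for real vCPUs with the idle credit set below that floor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from the patch. */
#define SKETCH_CREDIT_INIT  10500000               /* assumed initial credit budget */
#define SKETCH_CREDIT_MIN   (-SKETCH_CREDIT_INIT)  /* assumed floor for a real vCPU */
#define SKETCH_CREDIT_IDLE  INT32_MIN              /* assumed: idle sits below the floor */

/*
 * Burn credit for `runtime` units of execution (assumed non-negative),
 * never letting the result drop below SKETCH_CREDIT_MIN, however long
 * the vCPU has run.
 */
static int32_t sketch_burn_credit(int32_t credit, int64_t runtime)
{
    assert(runtime >= 0);

    /* How much can be burned before hitting the floor (may be <= 0). */
    int64_t headroom = (int64_t)credit - SKETCH_CREDIT_MIN;

    if (runtime >= headroom)
        return SKETCH_CREDIT_MIN;

    return (int32_t)((int64_t)credit - runtime);
}

/* A runnable vCPU is preferred over idle when it has more credit. */
static int sketch_beats_idle(int32_t credit)
{
    return credit > SKETCH_CREDIT_IDLE;
}
```

With the floor in place, a runnable vCPU always compares higher than the idle vCPU regardless of how large the accounted runtime was, which is the property the commit message asks for.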