
Ping: [Bugfix PATCH for-4.15] xen: credit2: fix per-entity load tracking when continuing running



On 19.03.2021 13:14, Dario Faggioli wrote:
> If we schedule, and the current vCPU continues to run, its statistical
> load is not properly updated, resulting in something like this, even
> though all 8 vCPUs are 100% busy:
> 
> (XEN) Runqueue 0:
> (XEN) [...]
> (XEN)   aveload            = 2097152 (~800%)
> (XEN) [...]
> (XEN)   Domain: 0 w 256 c 0 v 8
> (XEN)     1: [0.0] flags=2 cpu=4 credit=9996885 [w=256] load=35 (~0%)
> (XEN)     2: [0.1] flags=2 cpu=2 credit=9993725 [w=256] load=796 (~0%)
> (XEN)     3: [0.2] flags=2 cpu=1 credit=9995885 [w=256] load=883 (~0%)
> (XEN)     4: [0.3] flags=2 cpu=5 credit=9998833 [w=256] load=487 (~0%)
> (XEN)     5: [0.4] flags=2 cpu=6 credit=9998942 [w=256] load=1595 (~0%)
> (XEN)     6: [0.5] flags=2 cpu=0 credit=9994669 [w=256] load=22 (~0%)
> (XEN)     7: [0.6] flags=2 cpu=7 credit=9997706 [w=256] load=0 (~0%)
> (XEN)     8: [0.7] flags=2 cpu=3 credit=9992440 [w=256] load=0 (~0%)
> 
> As we can see, the average load of the runqueue as a whole is, instead,
> computed properly.
> 
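> To make the failure mode clearer, here is a minimal, self-contained
> sketch of an exponentially-decaying per-entity load average that is
> updated at scheduling events. It is an illustration only, not the
> actual credit2.c code (the names, constants and decay formula below
> are all made up): it just shows why, if the update is skipped while
> the same vCPU keeps running, the stored value still reflects the
> vCPU's old, near-idle history and the dump reports ~0% even though
> the vCPU is 100% busy.
> 
>     /* Illustrative sketch only -- not the actual Credit2 code. */
>     #include <stdint.h>
>     #include <stdio.h>
> 
>     #define LOAD_SCALE   (1u << 18)  /* fixed point: 1 << 18 == 100% */
>     #define DECAY_SHIFT  4           /* weight of new sample vs. history */
> 
>     struct entity_load {
>         uint64_t last_update;  /* time of the last update_load() call */
>         uint32_t avgload;      /* decayed average, in LOAD_SCALE units */
>         int      running;      /* has it been running since last_update? */
>     };
> 
>     /* Fold the time elapsed since last_update into the average. */
>     static void update_load(struct entity_load *e, uint64_t now)
>     {
>         uint64_t delta = now - e->last_update;
>         uint32_t sample = e->running ? LOAD_SCALE : 0;
> 
>         /* One decay step per tick; real code would scale by delta. */
>         for ( ; delta; delta-- )
>             e->avgload = e->avgload - (e->avgload >> DECAY_SHIFT)
>                          + (sample >> DECAY_SHIFT);
> 
>         e->last_update = now;
>     }
> 
>     int main(void)
>     {
>         struct entity_load v = { .last_update = 0, .avgload = 0,
>                                  .running = 1 };
> 
>         /*
>          * The vCPU starts running at t=0 with a near-idle history
>          * (avgload == 0) and keeps running until t=100.
>          *
>          * Buggy behaviour: update_load() is never called while the
>          * vCPU keeps running, so a dump at t=100 still shows ~0%.
>          */
>         printf("without updates: avgload = %u (~%u%%)\n",
>                (unsigned)v.avgload,
>                (unsigned)(v.avgload * 100 / LOAD_SCALE));
> 
>         /* Fixed behaviour: update at every scheduling decision. */
>         for ( uint64_t now = 1; now <= 100; now++ )
>             update_load(&v, now);
> 
>         printf("with updates:    avgload = %u (~%u%%)\n",
>                (unsigned)v.avgload,
>                (unsigned)(v.avgload * 100 / LOAD_SCALE));
> 
>         return 0;
>     }
> 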
> In theory, this issue could affect Credit2's load balancing logic. In
> practice, however, the problem only manifests (at least with these
> characteristics) when there is just one runqueue active in the
> cpupool, which also means there is no need to do any load balancing.
> 
> Hence its real impact is pretty much limited to wrong per-vCPU load
> percentages in the output of the 'r' debug-key.
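> 
> (For reference, that is the output of the 'r' debug-key, which can be
> triggered at run time with "xl debug-keys r" and then read back via
> "xl dmesg".)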
> 
> With this patch, the load is updated and displayed correctly:
> 
> (XEN) Runqueue 0:
> (XEN) [...]
> (XEN)   aveload            = 2097152 (~800%)
> (XEN) [...]
> (XEN) Domain info:
> (XEN)   Domain: 0 w 256 c 0 v 8
> (XEN)     1: [0.0] flags=2 cpu=4 credit=9995584 [w=256] load=262144 (~100%)
> (XEN)     2: [0.1] flags=2 cpu=6 credit=9992992 [w=256] load=262144 (~100%)
> (XEN)     3: [0.2] flags=2 cpu=3 credit=9998918 [w=256] load=262118 (~99%)
> (XEN)     4: [0.3] flags=2 cpu=5 credit=9996867 [w=256] load=262144 (~100%)
> (XEN)     5: [0.4] flags=2 cpu=1 credit=9998912 [w=256] load=262144 (~100%)
> (XEN)     6: [0.5] flags=2 cpu=2 credit=9997842 [w=256] load=262144 (~100%)
> (XEN)     7: [0.6] flags=2 cpu=7 credit=9994623 [w=256] load=262144 (~100%)
> (XEN)     8: [0.7] flags=2 cpu=0 credit=9991815 [w=256] load=262144 (~100%)
> 
> Signed-off-by: Dario Faggioli <dfaggioli@xxxxxxxx>
> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxx>
> Cc: Ian Jackson <iwj@xxxxxxxxxxxxxx>
> ---
> Despite the limited effect, it's a bug. So:
> - it should be backported;
> - I think it should be included in 4.15. The risk is pretty low, for
>   the same reasons already explained when describing its limited impact.

I'm a little puzzled to find this still sitting in my waiting-to-go-in
folder, as it never got an ack (or otherwise). George?

Jan

> ---
>  xen/common/sched/credit2.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
> index eb5e5a78c5..b3b5de94cf 100644
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -3646,6 +3646,8 @@ static void csched2_schedule(
>              runq_remove(snext);
>              __set_bit(__CSFLAG_scheduled, &snext->flags);
>          }
> +        else
> +            update_load(ops, rqd, snext, 0, now);
>  
>          /* Clear the idle mask if necessary */
>          if ( cpumask_test_cpu(sched_cpu, &rqd->idle) )
> 
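> The new else branch covers exactly the case described above: the unit
> that is already running gets picked again, so it is neither removed
> from nor re-inserted into the runqueue, and the load accounting for
> the time it has just spent running would otherwise never happen.
> Calling update_load() here closes that gap; the 0 passed as the
> fourth argument presumably means "no change" in the unit's running
> state, since it simply keeps running.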
> 
> 