
Re: [Xen-devel] xen: credit2: credit2 can’t reach the throughput as expected



Forwarding to xen-devel, as it was dropped, and I did not notice.
---
On Thu, 2019-02-14 at 07:10 +0000, zheng chuan wrote: 
> Hi, Dario,
>  
Hi,

> I have put the test demo in the attachment; please run it as follows:
> 1. compile it
>   gcc upress.c -o upress
> 2. calculate the loops in dom0 first
>   ./upress -l 100
>   For example, the output is
>   cpu khz : 2200000
>   calculate loops: 4472.
>   We get 4472.
> 3. generate a 20% load on each vcpu in the guest with
>   ./upress -l 20 -z 4472 &
>   It is better to bind each load task to its vcpu with taskset.
>  
Ok, thanks for the code and the instructions, I will give it a try.
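
For the benefit of the archives: a load generator like this is
typically just a calibrated busy/sleep duty-cycle loop. Below is a
minimal, standalone sketch of the idea (my guess at the shape of
upress.c from its options; the actual attachment may well differ, and
treating -z as "calibrated loops per ms" is an assumption of mine):

  /* upress-like duty-cycle load generator (illustrative sketch). */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  #define PERIOD_US 10000L                  /* 10ms period (assumed) */

  static void burn(long loops)
  {
      volatile long i;                      /* volatile: don't optimize out */
      for (i = 0; i < loops; i++)
          ;                                 /* just spin */
  }

  int main(int argc, char **argv)
  {
      long load = 20, loops_per_ms = 4472;  /* illustrative defaults */
      int opt;

      while ((opt = getopt(argc, argv, "l:z:")) != -1) {
          if (opt == 'l')
              load = atol(optarg);          /* target load, percent */
          else if (opt == 'z')
              loops_per_ms = atol(optarg);  /* calibration from dom0 */
      }
      if (load > 100)
          load = 100;

      for (;;) {
          long busy_us = PERIOD_US * load / 100;

          burn(loops_per_ms * busy_us / 1000);  /* busy for load% ... */
          usleep(PERIOD_US - busy_us);          /* ... sleep the rest */
      }
      return 0;
  }

Pinning each instance inside the guest (e.g., taskset -c 0 ./upress
-l 20 -z 4472 &) then puts a periodic 20% load on each vcpu.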

> Sorry for the messy picture; you can see the figure below.
> 
Yeah, that's clearer. However, HTML emails are best avoided here. In
cases like this, you could put the table in some online-accessible
document, and post the link. :-)

Also, let me ask this again, is this coming from actual tracing (like
with `xentrace` etc)?

> The green parts mean the vcpu is running, while the red ones mean idle.
> In Fig.1, vcpu1 and vcpu2 run alternately: vcpu1 runs for 20ms,
> and then vcpu2 runs for 20ms while vcpu1 is sleeping.
> 
How do you know it's sleeping and not, for instance, that it has been
preempted and hence is waiting to run?

My point being that, when you set up a workload like this and only look
at the throughput you achieve, it is expected that schedulers with
longer timeslices do better.

It would be interesting to look at both throughput and latency, though.
In fact (assuming the analysis is correct), in the Credit1 case, if two
vcpus wake up at about the same time, the one that wins the pcpu runs
for a full timeslice, or until it blocks, i.e., in this case, for 20ms.
This means the other vcpu has to wait that long before being able to do
anything.

> In Fig.2, vcpu1 and vcpu2 run at the same time, i.e., vcpu1 and
> vcpu2 compete for the pcpu, and then go to sleep at the same time.
> Obviously, the smaller the time-slice, the worse the competition.
> 
But the better the latency. :-D

What I mean is that achieving the best throughput and the best latency
at the same time is often impossible, and the job of a scheduler is to
come up with a trade-off, as well as with tunables for letting people
who care more about one or the other steer it in that direction.

Achieving better latency than Credit1 has been a goal of Credit2 since
design time. However, it's possible that we ended up sacrificing
throughput too much, or that we lack the tunables to let users decide
what they want.

Of course, this is all assuming that the analysis of the problem that
you're providing is correct, which I'll be looking into confirming. :-)

> As you mentioned, Credit2 does not have a real timeslice; the
> vcpu can be preempted dynamically, based on the difference of credit
> (+ sched_ratelimit_us).
> 
Actually, it's:

 max(difference_of_credit, CSCHED2_MIN_TIMER, sched_ratelimit_us)
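
In code terms, csched2_runtime() does roughly the following (a
simplified, standalone sketch with illustrative constants; not the
literal Xen source, which also caps the result at CSCHED2_MAX_TIMER):

  #include <stdio.h>
  #include <stdint.h>

  typedef int64_t s_time_t;              /* time in ns, as in Xen */

  #define MIN_TIMER_NS (500 * 1000LL)    /* stand-in for CSCHED2_MIN_TIMER */

  /*
   * Run the chosen vcpu until its credit would drop to the level of
   * the best waiter in the runqueue, but never for less than the
   * minimum timer, nor less than the ratelimit.
   */
  static s_time_t next_timeslice(s_time_t credit_diff_time,
                                 s_time_t ratelimit_ns)
  {
      s_time_t time = credit_diff_time;

      if (time < MIN_TIMER_NS)
          time = MIN_TIMER_NS;
      if (time < ratelimit_ns)
          time = ratelimit_ns;
      return time;
  }

  int main(void)
  {
      /* 0.2ms worth of credit difference, 1ms ratelimit -> run 1ms. */
      printf("%lld ns\n",
             (long long)next_timeslice(200 * 1000LL, 1000 * 1000LL));
      return 0;
  }

So, with a large enough ratelimit, it degenerates into a fixed
timeslice, which is exactly what your 30ms experiment below shows.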

> > Perhaps, one thing that can be done to try to confirm this
> > analysis, would be to make the scheduling less frequent in Credit2
> > and, on the other hand, to make it more frequent in Credit1.
> Here is the further test result:
> i.  it is interesting that it still works well if I set Credit1's
> timeslice to 1ms, with xl sched-credit -s -t 1
> linux-sodv:~ # xl sched-credit
> Cpupool Pool-0: tslice=1ms ratelimit=1000us migration-delay=0us
> Name                                ID Weight  Cap
> Domain-0                             0    256    0
> Xenstore                             1    256    0
> guest_1                              2    256    0
> guest_2                              3    256    0
>  
Hah, yes, it is interesting indeed! It shows us one more time how
unpredictable Credit1's behavior is, because of all the hacks it has
accumulated over time (some of which are my doing, I know... :-P).

> ii.  it works well if sched_ratelimit_us is set to 30ms or above.
> linux-sodv:~ # xl sched-credit2 -s -p Pool-0
> Cpupool Pool-0: ratelimit=30000us
>  
Ok, good to know, thanks for doing the experiment.

If you have time, can you try other values? I mean, while still on
Credit2, try setting the ratelimit to, like, 20, 15, 10 and 5 ms, and
report what happens?
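
That is (going from memory of the xl syntax, so do double-check the
man page), something like:

  xl sched-credit2 -s -p Pool-0 -r 20000

for 20ms, and so on down to -r 5000 for 5ms.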

> However, sched_ratelimit_us is not so elegant and flexible, in that
> it rigidly enforces one specific time-slice.
> 
Well, I personally never loved it, but it is not completely unrelated
to what we're seeing and discussing, TBH. It was indeed introduced to
improve throughput in workloads where there were too many wakeups
(which, in Credit1, also meant invoking the scheduler and, due to
boosting, often context switching).

> It may very likely degrade other scheduler criteria, like
> scheduling latency.
> As far as I know, CFS can adjust the time-slice according to the
> number of tasks in the runqueue (in __sched_period()).
> Could it be possible for Credit2 to also have a similar ability to
> adjust the time-slice automatically?
>  
Well, let's see. Credit2 and CFS are very similar in principle, but
the code is actually quite different. But yeah, we may be able to come
up with something more clever than just plain ratelimiting for
adjusting what CFS calls "the granularity".
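
For reference, __sched_period() in CFS does basically the following
(paraphrased from my recollection of kernel/sched/fair.c, with the
default tunable values; the real kernel also scales these with the
number of online CPUs):

  #include <stdio.h>

  typedef unsigned long long u64;

  /* Default CFS tunables, in nanoseconds (as I remember them). */
  static const u64 sysctl_sched_latency         = 6000000ULL; /* 6ms */
  static const u64 sysctl_sched_min_granularity =  750000ULL; /* 0.75ms */
  static const unsigned long sched_nr_latency   = 8; /* latency/min_gran */

  /*
   * The scheduling period stretches as soon as there are more runnable
   * tasks than fit in the latency target, so that every task still
   * gets at least the minimum granularity as its slice.
   */
  static u64 sched_period(unsigned long nr_running)
  {
      if (nr_running > sched_nr_latency)
          return nr_running * sysctl_sched_min_granularity;
      return sysctl_sched_latency;
  }

  int main(void)
  {
      unsigned long n;

      for (n = 1; n <= 13; n += 4)
          printf("nr_running=%2lu -> period=%llu ns (slice=%llu ns)\n",
                 n, sched_period(n), sched_period(n) / n);
      return 0;
  }

Stretching the "period" with the length of the runqueue like that,
instead of using a flat ratelimit, could indeed be a direction worth
exploring for Credit2.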

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/



 

