
# Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler

```
Hi, Atsushi,

With my patches applied, I ran a similar test.
The CPU% results are as follows:
dom0  25
dom1  25
dom2  50
dom3 100

What do you think of my patches?

Regards,
Naoki Nishiguchi

Atsushi SAKAI wrote:
```
```
Hi, George,

Sorry for the delay.

With this type of change applied,
the CPU% results are as follows:
dom1  26
dom2  26
dom3  51
dom4  96

Thanks
Atsushi SAKAI

"George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:

```
```
OK, I've slogged through an example by hand and think I see what's going on.

So the idea of the credit scheduler is that we have a certain number
of "credits" per accounting period, and each of these credits
represents a certain amount of time.  The scheduler gives out credits
according to weight, so theoretically each accounting period, if all
vcpus are active, each should consume all of its credits.  Based on
that assumption, if a vcpu has run and accumulated more than one full
accounting period of credits, it's probably idle and we can leave it
be.

The problem in this situation isn't so much rounding errors as
*scheduling granularity*.  In the example given:

d1: weight 128
d2: weight 128
d3: weight 256
d4: weight 512

If each domain has 2 vcpus, and there are 2 cores, then the credits
will be divided thus:

d1: 37 credits / vcpu
d2: 37 credits / vcpu
d3: 75 credits / vcpu
d4: 150 credits / vcpu

But scheduling and accounting only happen every "tick", and every
"tick" is 100 credits, so each vcpu of d{1,2}, instead of consuming
37 credits, consumes 100; the same goes for each vcpu of d3.  With 2
cores and 3 ticks per accounting period, there are only 6 ticks to go
around, and the six vcpus of d{1,2,3} use all of them.  At the end of
the first accounting period, each vcpu of d{1,2,3} has run for 100
credits' worth of time, but d4 hasn't gotten to run at all.

In short, the fact that we have a 100-credit scheduling granularity
breaks the assumption that every VM has had a chance to run each
accounting period when there are really long runqueues.

I can think of a couple of solutions: the simplest one might be to
sort the runqueue by number of credits -- at least every accounting
period.  In that case, d4 would always get to run every accounting
period; d{1,2} might not run for a given accounting period, but the
next time they would have twice the number of credits, &c.

Other options include extending the accounting period when
runqueues are long, or applying the credit limit during accounting
only if the vcpu is not on the runqueue (Sakai-san's idea) *combined*
with a check when the vcpu blocks.  That would catch vcpus that are
only moderately active but just happen to be on the runqueue for
several accounting periods in a row.

Sakai-san, would you be willing to try to implement a simple "runqueue
sort" patch, and see if it also solves your scheduling issue?

-George

On Wed, Dec 10, 2008 at 2:45 AM, Atsushi SAKAI <sakaia@xxxxxxxxxxxxxx> wrote:
```
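George's example can be checked with a small simulation. This is a hedged sketch, not Xen's scheduler code: it assumes a single shared round-robin queue instead of per-core runqueues, 2 cores, 3 ticks of 100 credits per accounting period, and the per-vcpu credits listed in the example (37/37/75/150). `struct vcpu`, `run_period` and `sort_by_credit` are hypothetical names. It compares the original queue order (d1, d2, d3, d4) with the queue sorted by credit, i.e. the "runqueue sort" George suggests.

```c
#include <assert.h>
#include <stdlib.h>

#define NCPUS            2    /* physical cores in the example */
#define TICKS_PER_ACCT   3    /* assumed ticks per accounting period */
#define CREDITS_PER_TICK 100  /* a vcpu that runs for a tick is debited 100 */
#define NVCPUS           8    /* 4 domains x 2 vcpus */
#define NDOMS            4

/* Hypothetical, simplified vcpu record (not Xen's struct csched_vcpu). */
struct vcpu {
    int dom;     /* domain number, 1..NDOMS */
    int credit;  /* credits held at the start of the period */
};

/* Comparator: vcpus with more credits come first. */
static int by_credit_desc(const void *a, const void *b)
{
    const struct vcpu *x = a, *y = b;
    return y->credit - x->credit;
}

/* The suggested "runqueue sort": highest credit at the head. */
static void sort_by_credit(struct vcpu *rq, int n)
{
    qsort(rq, n, sizeof rq[0], by_credit_desc);
}

/* Simulate one accounting period on a shared round-robin queue: at every
 * tick, each core takes the next vcpu, runs it for the whole tick and
 * debits a full tick of credits.  ticks[d] counts the ticks domain d got;
 * it must have NDOMS + 1 zero-initialised entries. */
static void run_period(struct vcpu *rq, int n, int ticks[])
{
    assert(n > 0);
    int next = 0;
    for (int t = 0; t < TICKS_PER_ACCT; t++) {
        for (int cpu = 0; cpu < NCPUS; cpu++) {
            struct vcpu *v = &rq[next];
            v->credit -= CREDITS_PER_TICK;  /* whole tick, not 37 or 75 */
            ticks[v->dom]++;
            next = (next + 1) % n;          /* rotate to the next vcpu */
        }
    }
}
```

With the original order, d1, d2 and d3 each receive 2 ticks and d4 receives none, and each d1 vcpu ends the period at 37 - 100 = -63 credits. After sorting, both d4 vcpus and both d3 vcpus run first, and the last two ticks go to vcpus of d1/d2 (`qsort` is not stable, so which ones is unspecified).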
```
Hi, Emmanuel,

1) Rounding error in credits

The effect this patch addresses is larger than the rounding error,
so I don't think we need to consider rounding here.
If you think otherwise, could you suggest a patch?
Changing CSCHED_TICKS_PER_ACCT alone does not seem to be enough.

2) Effect on I/O-intensive jobs

I did not change the code for BOOST priority;
I only changed the "credit reset" condition.
It should have no effect on I/O-intensive jobs (but I have not measured it).
If needed, I will test it.
Which test would be best for this change?
(A simple I/O test is not enough for this case;
I think a complex domain I/O configuration is needed to demonstrate the patch's effect.)

3) Measuring vcpu allocation

At first, I used stress (http://weather.ou.edu/~apw/projects/stress/):
  stress --cpu xx --timeout xx --verbose
Then I switched to a simpler test (two processes, since each domain has 2 vcpus):
  yes > /dev/null &
  yes > /dev/null &
Testing with the suggested method, the results are:
CPU%    original  w/ patch
dom1       27        25
dom2       27        25
dom3       53        50
dom4       91        98

Thanks
Atsushi SAKAI

Emmanuel Ackaouy <ackaouy@xxxxxxxxx> wrote:

```
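As a rough reference for these measurements, the ideal shares follow from the weights alone. This sketch assumes every vcpu is always runnable, no caps, and 100% per core on the 2 cores (200% in total); `expected_cpu_pct` and `max_deviation` are hypothetical helpers, not part of Xen or the patches.

```c
#include <assert.h>

/* Ideal CPU% per domain if CPU time were split purely by weight.
 * Assumes all vcpus spin, no caps, and 100% per physical core. */
static void expected_cpu_pct(const int *weights, int n, int ncpus,
                             double *out)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += weights[i];
    assert(total > 0);
    for (int i = 0; i < n; i++)
        out[i] = weights[i] * ncpus * 100.0 / total;
}

/* Largest absolute gap, in CPU percentage points, between the
 * measured and the ideal shares. */
static double max_deviation(const double *measured, const double *ideal,
                            int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double d = measured[i] - ideal[i];
        if (d < 0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

With weights 128/128/256/512 on 2 cores the ideal shares are 25/25/50/100 CPU%, so the largest gap in the table above is 9 points for the original scheduler (dom4 at 91) and 2 points with the patch (dom4 at 98).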
```
On Dec 9, 2008, at 2:25, George Dunlap wrote:
```
```
On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI
<sakaia@xxxxxxxxxxxxxx> wrote:
```
```
You mean it should get rid of "credit reset"?
```
```
Yes, that's exactly what I was thinking.  Removing the check for vcpus
on the runqueue may actually be functionally equivalent to removing
the check altogether.
```
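The "credit reset" variants discussed in this thread can be contrasted in a few lines. This is a simplified, hypothetical model, not Xen's accounting code: `struct vcpu`, `CREDIT_LIMIT` (assumed here to be one accounting period, 300 credits) and the function names are invented for illustration, the reset is modelled as setting the credit to 0, and removing the vcpu from the accounting list is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limit: one accounting period's worth of credits. */
#define CREDIT_LIMIT 300

/* Hypothetical, simplified vcpu state. */
struct vcpu {
    int  credit;       /* accumulated credits */
    bool on_runqueue;  /* runnable and waiting at accounting time */
};

/* Original rule: more than the limit means "probably idle", so reset. */
static void acct_reset_original(struct vcpu *v)
{
    if (v->credit > CREDIT_LIMIT)
        v->credit = 0;
}

/* Sakai-san's rule: skip the reset while the vcpu is on the runqueue,
 * since it is runnable and has only been waiting for its turn. */
static void acct_reset_skip_runnable(struct vcpu *v)
{
    if (v->credit > CREDIT_LIMIT && !v->on_runqueue)
        v->credit = 0;
}

/* The "combined" suggestion: also apply the limit when the vcpu blocks,
 * catching moderately active vcpus that happened to be on the runqueue
 * at several accounting points in a row. */
static void on_block(struct vcpu *v)
{
    v->on_runqueue = false;
    if (v->credit > CREDIT_LIMIT)
        v->credit = 0;
}
```

In this model, a runnable vcpu holding 400 credits is reset by the original rule but kept by the runqueue-aware rule; with the combined approach, the limit is applied as soon as that vcpu blocks.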
```
Essentially, this code is there as a safeguard against rounding errors
and other oddball cases. In theory, a runnable VCPU should seldom
accumulate more than one time slice's worth of credits.

The problem with your change is that a VCPU that is not a spinner
but instead runs and sleeps may not be removed from the accounting
list when it should be, because it will not always be running when
accounting and the check in question are performed.  Potentially this
could do very bad things to VCPUs that are I/O intensive or that
otherwise yield or sleep for a short time before consuming a full
time slice.

One thing that may help here is to make the credit calculations less
prone to rounding errors.  Something I had wanted to do while at
XenSource but never got around to was to change the arithmetic so
that, instead of 30 credits representing a time slice, a time slice
would be represented by a much bigger number.

In this case, for example, you would get credit allocations with
less significant rounding errors if you used 30000 instead of 30
credits per time slice:

dom1 vcpu0,1 w128 credit 3750
dom2 vcpu0,1 w128 credit 3750
dom3 vcpu0,1 w256 credit 7500
dom4 vcpu0,1 w512 credit 15000

I suspect this would get rid of a large number of cases such as the
one you are reporting, where a runnable VCPU's credit exceeds
one entire time slice. This type of change would improve accuracy
and not screw up credit computation for I/O-intensive and other
non-spinning domains.

What do you think?

Also please confirm that your VCPUs are indeed doing simple
"while(1);" loops.

Cheers,
Emmanuel.
```
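To see how the size of the credit unit affects rounding, the allocation can be modelled with truncating integer division. This is an assumed, simplified formula (credits per slice times cores, split by weight and then evenly across a domain's vcpus), not necessarily how Xen's accounting rounds; `per_vcpu_credit` and `rel_error` are hypothetical helpers.

```c
#include <assert.h>

/* Simplified model (assumption): the credits for one time slice on every
 * core are split across domains by weight, then evenly across each
 * domain's vcpus, using truncating integer division. */
static long per_vcpu_credit(long credits_per_slice, int ncpus,
                            int weight, int total_weight, int nvcpus)
{
    assert(total_weight > 0 && nvcpus > 0);
    return credits_per_slice * ncpus * weight /
           ((long)total_weight * nvcpus);
}

/* Fraction of the exact (real-valued) share lost to truncation. */
static double rel_error(long credits_per_slice, int ncpus,
                        int weight, int total_weight, int nvcpus)
{
    double exact = (double)credits_per_slice * ncpus * weight /
                   ((double)total_weight * nvcpus);
    assert(exact > 0.0);
    long got = per_vcpu_credit(credits_per_slice, ncpus, weight,
                               total_weight, nvcpus);
    return (exact - (double)got) / exact;
}
```

With 2 cores, 2 vcpus per domain and a total weight of 1024, 300 credits per slice reproduces George's 37/75/150 figures and 30000 reproduces the 3750/7500/15000 allocations above. For the weight-128 domains, the truncation loss falls from 20% at 30 credits to about 1.3% at 300 and to 0 at 30000.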
```

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

```
