
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler



Hi, George

Sorry for the delay.

With this type of change,
the CPU% is as follows:
dom1  26
dom2  26
dom3  51
dom4  96
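
As a rough cross-check, these figures can be compared with the ideal split implied by the weights quoted below (128:128:256:512 on 2 cores). The following is only a sketch: the helper name `ideal_cpu_percent` is made up for illustration, and it assumes the ideal share is simply proportional to weight, with 100% per core.

```python
# Weights and core count are taken from George's quoted message below.
weights = {"dom1": 128, "dom2": 128, "dom3": 256, "dom4": 512}
# CPU% figures reported above.
measured = {"dom1": 26, "dom2": 26, "dom3": 51, "dom4": 96}
NUM_CORES = 2


def ideal_cpu_percent(weights, num_cores):
    """Ideal CPU% per domain if time were split exactly by weight (assumption)."""
    total_weight = sum(weights.values())
    return {dom: 100.0 * num_cores * w / total_weight for dom, w in weights.items()}


ideal = ideal_cpu_percent(weights, NUM_CORES)
for dom in weights:
    diff = measured[dom] - ideal[dom]
    print(f"{dom}: measured {measured[dom]:3d}  ideal {ideal[dom]:5.1f}  diff {diff:+.1f}")
```

Under that assumption the ideal split is 25 / 25 / 50 / 100, so dom4 is still about 4 points short of its ideal share.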

Thanks
Atsushi SAKAI

"George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:

> OK, I've grueled through an example by hand and think I see what's going on.
> 
> So the idea of the credit scheduler is that we have a certain number
> of "credits" per accounting period, and each of these credits
> represents a certain amount of time.  The scheduler gives out credits
> according to weight, so theoretically each accounting period, if all
> vcpus are active, each should consume all of its credits.  Based on
> that assumption, if a vcpu has run and accumulated more than one full
> accounting period of credits, it's probably idle and we can leave it
> be.
> 
> The problem in this situation isn't so much with rounding errors, as
> with *scheduling granularity*.  In the example given:
> 
> d1: weight 128
> d2: weight 128
> d3: weight 256
> d4: weight 512
> 
> If each domain has 2 vcpus, and there are 2 cores, then the credits
> will be divided thus:
> 
> d1: 37 credits / vcpu
> d2: 37 credits / vcpu
> d3: 75 credits / vcpu
> d4: 150 credits / vcpu
> 
> But scheduling and accounting only happen every "tick", and every
> "tick" is 100 credits.  So each vcpu of d{1,2}, instead of
> consuming 37 credits, consumes 100; same with each vcpu of d3.   At
> the end of the first accounting period, d{1,2,3} have gotten to run
> for 100 credits worth of time, but d4 hasn't gotten to run at all.
> 
> In short, the fact that we have a 100-credit scheduling granularity
> breaks the assumption that every VM has had a chance to run each
> accounting period when there are really long runqueues.
> 
> I can think of a couple of solutions: the simplest one might be to
> sort the runqueue by number of credits -- at least every accounting
> period.  In that case, d4 would always get to run every accounting
> period; d{1,2} might not run for a given accounting period, but the
> next time it would have twice the number of credits, &c.
> 
> Others might include extending accounting periods when we have long
> runqueues, or doing the credit limit during accounting only if it's
> not on the runqueue (Sakai-san's idea) *combined* with a check when
> the vcpu blocks.  That would catch vcpus that are only moderately
> active, but just happen to be on the runqueue for several accounting
> periods in a row.
> 
> Sakai-san, would you be willing to try to implement a simple "runqueue
> sort" patch, and see if it also solves your scheduling issue?
> 
>  -George
> 
> On Wed, Dec 10, 2008 at 2:45 AM, Atsushi SAKAI <sakaia@xxxxxxxxxxxxxx> wrote:
> > Hi, Emmanuel
> >
> > 1) Rounding error for credits
> >
> > The issue this patch addresses goes beyond rounding error,
> > so I do not think that effect needs to be considered here.
> > If you think otherwise, could you suggest a patch?
> > It seems that changing CSCHED_TICKS_PER_ACCT alone is not enough.
> >
> > 2) Effect on I/O-intensive jobs
> >
> > I have not changed the code for BOOST priority.
> > I have only changed the "credit reset" condition.
> > It should have no effect on I/O-intensive jobs (but I have not measured it).
> > If needed, I will test it.
> > Which test is best for this change?
> > (I think a simple I/O test is not enough for this case;
> > a complex domain I/O configuration is needed to show the effect of
> > this patch.)
> >
> > 3) vcpu allocation measurement
> >
> > At first I used
> >  http://weather.ou.edu/~apw/projects/stress/
> >  stress --cpu xx --timeout xx --verbose
> > Then I used a simple test (since there are 2 vcpus per domain):
> >  yes > /dev/null &
> >  yes > /dev/null &
> > Now I have tested with the suggested method, and the result is
> >     original   w/ patch
> > dom1    27        25
> > dom2    27        25
> > dom3    53        50
> > dom4    91        98
> >
> >
> > Thanks
> > Atsushi SAKAI
> >
> >
> >
> >
> > Emmanuel Ackaouy <ackaouy@xxxxxxxxx> wrote:
> >
> >> On Dec 9, 2008, at 2:25, George Dunlap wrote:
> >> > On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI
> >> > <sakaia@xxxxxxxxxxxxxx> wrote:
> >> >> You mean it should get rid of "credit reset"?
> >> >
> >> > Yes, that's exactly what I was thinking.  Removing the check for vcpus
> >> > on the runqueue may actually be functionally equivalent to removing
> >> > the check altogether.
> >>
> >> Essentially, this code is there as a safeguard against rounding errors
> >> and other oddball cases. In theory, a runnable VCPU should seldom
> >> accumulate more than one time slice's worth of credits.
> >>
> >> The problem with your change is that a VCPU that is not a spinner
> >> but instead runs and sleeps may not be removed from the accounting
> >> list when it should be, because it will not always be running when
> >> accounting and the check in question is performed. Potentially this will
> >> do very bad things for VCPUs that are I/O intensive or otherwise yield
> >> or sleep for a short time before consuming a full time slice.
> >>
> >> One thing that may help here is to make the credit calculations less
> >> prone to rounding errors. One thing I had wanted to do while at
> >> XenSource but never got around to was to change the arithmetic
> >> so that instead of 30 credits representing a time slice, we would
> >> make this a much bigger number.
> >>
> >> In this case for example, you would get credit allocations that had
> >> less significant rounding errors if you used 30000 instead of 30
> >> credits per time slice:
> >>
> >> dom1 vcpu0,1 w128 credit 3750
> >> dom2 vcpu0,1 w128 credit 3750
> >> dom3 vcpu0,1 w256 credit 7500
> >> dom4 vcpu0,1 w512 credit 15000
> >>
> >> I suspect this would get rid of a large number of cases such as the
> >> one you are reporting, where a runnable VCPU's credit exceeds
> >> one entire time slice. This type of change would improve accuracy
> >> and not screw up credit computation for I/O intensive and other
> >> non spinning domains.
> >>
> >> What do you think?
> >>
> >> Also please confirm that your VCPUs are indeed doing simple
> >> "while(1);" loops.
> >>
> >> Cheers,
> >> Emmanuel.
> >
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
> >
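
The per-vcpu credit figures quoted in this thread can be reproduced with a few lines of integer arithmetic. This is only a sketch, not the scheduler's actual code: the function name and the `credits_per_core` parameter are invented for illustration, plain floor division is assumed, and the value 300 (credits per core per accounting period) is an assumption chosen because it reproduces George's 37 / 37 / 75 / 150. Likewise 30000 reproduces Emmanuel's 3750 / 3750 / 7500 / 15000.

```python
def credits_per_vcpu(weights, vcpus_per_domain, num_cores, credits_per_core):
    """Split one accounting period's credits by weight using floor division.

    Simplified illustration; the real scheduler may round differently.
    """
    total_credits = num_cores * credits_per_core
    total_weight = sum(weights.values())
    result = {}
    for dom, weight in weights.items():
        domain_credits = total_credits * weight // total_weight
        result[dom] = domain_credits // vcpus_per_domain
    return result


weights = {"d1": 128, "d2": 128, "d3": 256, "d4": 512}

# 30 and 30000 are the scales Emmanuel mentions; 300 is assumed (see above).
for credits_per_core in (30, 300, 30000):
    alloc = credits_per_vcpu(weights, vcpus_per_domain=2, num_cores=2,
                             credits_per_core=credits_per_core)
    ratio = alloc["d4"] / alloc["d1"]
    print(f"{credits_per_core:6d}: {alloc}  d4/d1 = {ratio:.2f} (ideal 4.00)")
```

With a small credit scale the truncation is a visible fraction of each allocation, so the d4/d1 ratio drifts away from 4; with the larger scale it is exact for these weights. Note that this only addresses rounding, not the 100-credit tick granularity George describes.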

Attachment: runq_sort_for_accurate_weight.patch
Description: Binary data
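
The patch contents are not shown inline here. Purely as an illustration of the runqueue-sort idea George describes above, and not as a rendering of the attached patch or of Xen's code, here is a toy model of one accounting period. Everything in it is an assumption: a single shared queue, 3 ticks per accounting period, 100 credits charged per tick, no BOOST/UNDER/OVER priorities, and no blocking vcpus.

```python
TICK_CREDITS = 100    # credits charged per tick (figure from the thread)
TICKS_PER_PERIOD = 3  # assumed number of ticks per accounting period
NUM_CORES = 2


def run_period(queue, credits, alloc, sort_by_credit):
    """Run one accounting period; return the vcpus that got a tick, in order."""
    for vcpu in queue:
        credits[vcpu] += alloc[vcpu]
    if sort_by_credit:
        # Highest credit first; Python's sort is stable, so ties keep queue order.
        queue.sort(key=lambda v: credits[v], reverse=True)
    ran = []
    for _ in range(TICKS_PER_PERIOD):
        running = queue[:NUM_CORES]
        del queue[:NUM_CORES]
        for vcpu in running:
            credits[vcpu] -= TICK_CREDITS  # a whole tick is charged at once
        ran.extend(running)
        queue.extend(running)  # go to the back of the queue
    return ran


def fresh_state():
    """Two vcpus per domain, with the per-vcpu credits quoted in the thread."""
    per_vcpu = {"d1": 37, "d2": 37, "d3": 75, "d4": 150}
    alloc = {f"{d}v{i}": c for d, c in per_vcpu.items() for i in range(2)}
    return list(alloc), dict.fromkeys(alloc, 0), alloc


queue, credits, alloc = fresh_state()
fifo_ran = run_period(queue, credits, alloc, sort_by_credit=False)

queue, credits, alloc = fresh_state()
sorted_ran = run_period(queue, credits, alloc, sort_by_credit=True)

print("unsorted:", fifo_ran)
print("sorted  :", sorted_ran)
```

In this toy model the unsorted queue lets d1, d2 and d3 each run a full tick in the first period while d4 does not run at all, which is the situation described above. With the sort, d4 runs first and d2 is the domain that waits until a later period.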
