
RE: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)



For the record, I am seeing the same problem (the first
anomaly; I haven't done multiple runs yet) with vcpus=1 for
all domains.  This time it shows up only on the 32-bit pair
and is only 20%, but that may just be random scheduling
variation.  This run also uses tap:aio instead of file so as
to eliminate dom0 page caching effects.
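
For anyone wanting to reproduce this, the disk change is just the
backend prefix in the domU config.  A sketch, with a made-up image
path:

    # file-backed: I/O goes through the dom0 page cache
    disk = [ 'file:/guests/el5u4.img,xvda,w' ]
    # tap:aio: blktap opens the image O_DIRECT, bypassing that cache
    disk = [ 'tap:aio:/guests/el5u4.img,xvda,w' ]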

 394s dom0
2265s 64-bit #1
2275s 64-bit #2
2912s 32-bit #1
2247s 32-bit #2 <-- 20% less!

I'm going to try a dom0_vcpus=1 run next.

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Monday, April 05, 2010 2:18 PM
> To: George Dunlap
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
> 
> Thanks for the reply!
> 
> Well I'm now seeing something a little more alarming:  Running
> an identical but CPU-overcommitted workload (just normal PV domains,
> no tmem or ballooning or anything), what would you expect the
> variance to be between successive identical measured runs
> on identical hardware?
> 
> I am seeing total runtimes, both measured by elapsed time and by
> sum-of-CPUsec across all domains (incl dom0), vary by 6-7% or more.
> This seems a bit unusual/excessive to me and makes it very hard
> to measure improvements (e.g. by tmem, for an upcoming Xen summit
> presentation) or benchmark anything complex.
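> 
> The sum-of-CPUsec number is just a one-liner over xm list.  A
> sketch, assuming the stock output format where Time(s) is the
> last column:
> 
>     xm list | awk 'NR > 1 { t += $NF } END { print t, "CPU sec total" }'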
> 
> > Is it possible that Linux is just favoring one vcpu over the other
> > for some reason?  Did you try running the same test but with only
> > one VM?
> 
> Well "make -j8" will likely be single-threaded part of the time,
> but I wouldn't expect that to make that big a difference between
> two identical workloads.
> 
> I'm not sure I understand how I would run the same test with
> only one VM when the observation of the strangeness requires
> two VMs (and even then must be observed at random points during
> execution).
> 
> > Another theory would be that most interrupts are delivered to vcpu 0,
> > so it may end up in "boost" priority more often.
> 
> Hmmm... I'm not sure I get that, but what about _physical_ cpu 0
> for Xen?  If the physical CPUs are not all identical and one VM
> has an affinity for vcpu0-on-pcpu0 and the other has an affinity
> for vcpu1-on-pcpu0, would that make a difference?
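> 
> If it would help to test that theory, the pinning itself is
> easy.  A sketch, with hypothetical domain names:
> 
>     xm vcpu-pin domA 0 0   # pin domA's vcpu 0 to pcpu 0
>     xm vcpu-pin domB 1 0   # pin domB's vcpu 1 to pcpu 0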
> 
> But still, 40% seems very large and almost certainly a bug,
> especially given the new observations above.
> 
> > -----Original Message-----
> > From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
> > Sent: Monday, April 05, 2010 8:44 AM
> > To: Dan Magenheimer
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
> >
> > Is it possible that Linux is just favoring one vcpu over the other
> > for some reason?  Did you try running the same test but with only
> > one VM?
> >
> > Another theory would be that most interrupts are delivered to vcpu 0,
> > so it may end up in "boost" priority more often.
> >
> > I'll re-post the credit2 series shortly; Keir said he'd accept it
> > post-4.0.  You could try it with that and see what the performance is
> > like.
> >
> >  -George
> >
> > On Fri, Apr 2, 2010 at 5:48 PM, Dan Magenheimer
> > <dan.magenheimer@xxxxxxxxxx> wrote:
> > > I've been running some heavy testing on a recent Xen 4.0
> > > snapshot and seeing a strange scheduling anomaly that
> > > I thought I should report.  I don't know if this is
> > > a regression... I suspect not.
> > >
> > > System is a Core 2 Duo (Conroe).  Load is four 2-VCPU
> > > EL5u4 guests, two of which are 64-bit and two of which
> > > are 32-bit.  Otherwise they are identical.  All four
> > > are running a sequence of three Linux compiles with
> > > (make -j8 clean; make -j8).  All are started approximately
> > > concurrently: I synchronize the start of the test after
> > > all domains are launched with an external NFS semaphore
> > > file that is checked every 30 seconds.
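> > >
> > > The start gate is nothing fancy.  A sketch, with a made-up NFS
> > > mount point:
> > >
> > >     # poll until the shared semaphore file appears, then start
> > >     while [ ! -e /mnt/nfs/go.sem ]; do sleep 30; done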
> > >
> > > What I am seeing is a rather large discrepancy in the
> > > amount of time consumed "underway" by the four domains
> > > as reported by xentop and xm list.  I have seen this
> > > repeatedly, but the numbers in front of me right now are:
> > >
> > > 1191s dom0
> > > 3182s 64-bit #1
> > > 2577s 64-bit #2 <-- 20% less!
> > > 4316s 32-bit #1
> > > 2667s 32-bit #2 <-- 40% less!
> > >
> > > Again these are identical workloads and the pairs
> > > are identical released kernels running from identical
> > > "file"-based virtual block devices containing released
> > > distros.  Much of my testing had been with tmem and
> > > self-ballooning, so I had blamed them for a while,
> > > but I have reproduced it multiple times with both
> > > of those turned off.
> > >
> > > At the start and after each kernel compile, I record
> > > a timestamp, so I know the same work is being done.
> > > When the workload finishes on each domain, it
> > > intentionally crashes the kernel so that measurement
> > > stops.  At the conclusion, the 64-bit pair have
> > > very similar total CPU sec, and the 32-bit pair have
> > > very similar total CPU sec, so eventually (presumably
> > > when the #1's are done hogging CPU) the "slower"
> > > domains do finish the same amount of work.  As a
> > > result, it is hard to tell from just the final
> > > results that the four domains are getting scheduled
> > > at very different rates.
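> > >
> > > For concreteness, a sketch of the per-domain wrapper (paths
> > > are hypothetical, and sysrq is just one way to do the
> > > intentional crash):
> > >
> > >     for i in 1 2 3; do
> > >         date '+%s' >> /root/stamps   # timestamp before each compile
> > >         make -j8 clean; make -j8
> > >     done
> > >     date '+%s' >> /root/stamps       # final timestamp
> > >     echo c > /proc/sysrq-trigger     # crash (needs sysrq enabled);
> > >                                      # CPU sec stops accumulating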
> > >
> > > Does this seem like a scheduler problem, or are there
> > > other explanations? Anybody care to try to reproduce it?
> > > Unfortunately, I have to use the machine now for other
> > > work.
> > >
> > > P.S. According to xentop, there is almost no network
> > > activity, so it is all CPU and VBD.  And the ratio of
> > > VBD activity across the domains looks to be about the
> > > same as the ratio of CPU(sec).
> > >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
