
Re: [Xen-devel] State of GPLPV tests - 28.11.11



> Hello James,
> 
> I am still running tests 7 days a week on two test systems. Results are quite
> discouraging though. After experiencing crash after crash I wanted to test if
> the configuration I called "stable" (Xen 4.0.1, GPLPV 0.11.0.213, dom0 kernel
> 2.6.32.18-pvops0-ak3) was stable indeed. But even that config crashed when
> running my torture test. It is stable on our production systems - running
> other workloads of course.

What crash are you getting these days? Is it the same one as you used to get?

> > One thing I thought of... virtualisation gives an interesting opportunity to
> > exaggerate race conditions. If you have 8 vCPU's in a DomU but only let
> > one or two physical CPUs service those 8 vCPU's, then it can give rise to
> > race conditions which could only be rarely seen (or never seen) in normal
> > operation. It's awful for performance but if you could try that and see if it
> > gives rise to crashes a bit more frequently it might help us track down the
> > problem.
> 
> What exactly is the config you are talking about in terms of Xen/dom0
> command line? In terms of domU config files?

I don't remember the exact syntax, but if you specify vcpus=4 but only
let the DomU run on one physical cpu it might trip up more often, if the
problem is caused by a race. If the problem is an arithmetic error in
xennet then it won't help.
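
A minimal sketch of what such a DomU config fragment might look like, assuming
the xm-style config file format (which is parsed as Python). The choice of
physical CPU 0 is an arbitrary illustration and is not taken from the setup
discussed in this thread:

```python
# Illustrative DomU config fragment (xm config files use Python syntax).
# Assumption: physical CPU 0 is picked arbitrarily; any single pCPU would do.
vcpus = 4    # number of virtual CPUs presented to the DomU
cpus = "0"   # restrict all of the DomU's vCPUs to physical CPU 0
```

With more vCPUs than physical CPUs available to the guest, the vCPUs are
frequently preempted in the middle of their work, which should widen any
race window. On a running domain, something like "xm vcpu-pin <domain> all 0"
should have a similar effect, though that has not been checked against this
Xen version.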

> 
> As always, I monitor your mercurial repo ;-) How would you see the
> relationship of commits 952+953 to our problem? 952 seems to affect LSO in
> some way since LsoV1TransmitComplete.TcpPayload is finally wrong (could it
> be negative since tx_length is smaller than the fixed tx_length?). What about
> 953?

Not sure.

> One more thought: As mentioned earlier crashes often occurred after an
> uptime of 9-10 days and these crashes occurred too consistently to be a "by
> chance" event. In my torture tests I am NOT USING a Windows NTP service (I
> use the meinberg NTP daemon on Windows). But on production I do. Can
> you see any possible impact here?
> 

It's certainly more likely for a stray UDP packet to cause an upset I
guess. As the packets pass through a Linux firewall (iptables in Dom0)
it's more likely that errant TCP packets will be dropped there.

Do you have a crash dump against 0.11.0.323?

James

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

