
Re: [Xen-devel] OOM problems



Daniel:

> Which branch/revision does latest pvops mean?

stable-2.6.32, using the latest pull as of today. (I also tried next-2.6.37, but it wouldn't boot for me.)
> Would you be willing to try and reproduce that again with the XCP blktap
> (userspace, not kernel) sources? Just to further isolate the problem.
> Those see a lot of testing. I can't recall a single fix to the aio
> layer in ages. But I'm never sure about other stuff potentially broken
> in userland.
I'll have to give it a try. Normal blktap still isn't working with 
pv_ops, though, so I hope this is a drop-in for blktap2.
In my last bit of troubleshooting, I took O_DIRECT out of the open call 
in tools/blktap2/drivers/block-aio.c, and preliminary testing indicates 
that this might have eliminated the problem with corruption. I'm testing 
further now, but could there be an issue with alignment (since the 
kernel is apparently very strict about it with direct I/O)? (Removing 
this flag also brings back in use of the page cache, of course.)
> If dio is definitely not what you feel you need, let's get back to your
> original OOM problem. Did reducing dom0 vcpus help? 24 of them is quite
> aggressive, to say the least.
When I switched to aio, I reduced the vcpus to 2 (I needed to do this 
with dom0_max_vcpus rather than through xend-config.sxp -- the latter 
wouldn't always boot). I haven't separately tried cached I/O with 
reduced vcpus yet, except in the lab; and unfortunately I still can't get 
the problem to happen in the lab, no matter what I try.
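For reference, dom0_max_vcpus is a Xen hypervisor command-line option, so it goes on the xen.gz line in the bootloader config rather than in xend-config.sxp. A sketch of a GRUB-legacy entry of the sort used with 2.6.32 pvops -- the paths, kernel version, and dom0_mem value here are placeholders, not taken from this thread:

```
# /boot/grub/menu.lst -- illustrative entry; adjust paths and versions
kernel /boot/xen.gz dom0_max_vcpus=2 dom0_mem=1536M
module /boot/vmlinuz-2.6.32-pvops root=/dev/sda1 ro console=hvc0
```

Capping dom0's memory with dom0_mem alongside dom0_max_vcpus also makes the dirty-page accounting discussed below more predictable, since dom0's view of "total memory" stops changing as domains are started.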
> If that alone doesn't help, I'd definitely try and check vm.dirty_ratio.
> There must be a tradeoff which doesn't imply scribbling the better half
> of 1.5GB main memory.
The default for dirty_ratio is 20. I tried halving that to 10, but it 
didn't help. I could try lower, but I like the thought of keeping this 
in user space, if possible, so I've been pursuing the blktap2 path most 
aggressively.
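For completeness, vm.dirty_ratio (and its companion vm.dirty_background_ratio) can be set persistently via sysctl; a sketch, where the specific values are only examples below the already-tried 10, not a recommendation from this thread:

```
# /etc/sysctl.conf -- illustrative values
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2

# apply without a reboot:
#   sysctl -p
```

The ratios are percentages of dom0's available memory, which is why a modest-sounding default like 20 can still mean hundreds of megabytes of dirty pages on a 1.5GB dom0.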
Ian:

> That's disturbing. It might be worth trying to drop the number of VCPUs in
> dom0 to 1 and then try to repro.
> BTW: for production use I'd currently be strongly inclined to use the XCP
> 2.6.32 kernel.
Interesting, ok.

-John

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
