
[Xen-users] Questions about pass through block devices



What happens to the block layer if the block request queue gets very, very long?

Imagine there are many DomUs and they are all working against VERY SLOW disks.

I wish I understood the entire chain better, and I'm sorry to ask such a vague question.

I realize this is an odd question, since the performance implications of this scenario are horrific. I ask because I have a gut feeling that those of us who have experienced Dom0 crashes under heavy disk I/O on Xen 3.0.2-3.0.4 (which includes me!) may be hitting a corner case that most users would never encounter.

In our case, we have Coraid AoE SAN storage that is accessed via drivers in the Dom0s of our cluster, and CLVM LVs are passed through into the DomUs.
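
For concreteness, this is roughly what the pass-through looks like in an xm domU config file; the volume group and LV names here are made-up examples, not our actual layout:

    # Fragment of an xm domU config (Xen 3.0.x syntax). The VG/LV
    # names below are hypothetical illustrations only.
    disk = [ 'phy:/dev/vg_san/domu1-root,sda1,w',
             'phy:/dev/vg_san/domu1-gfs,sdb1,w' ]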

Early on, we made a terrible disk-layout decision that left us with very poor disk I/O performance. That poor disk I/O performance was exacerbated by the fact that nearly every DomU has two LVs attached, and one is GFS, which adds another layer of performance degradation due to DLM locking.

We used to crash a lot. Now that we've re-organized the disks, we're crashing far less often. The change was made to improve performance, and we never in a million years imagined it might be related to this crashing behavior.

It suddenly occurred to me that perhaps we're crashing less often because the block request delivery times have improved dramatically, and the queues are likely growing shorter.
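
If anyone wants to check whether queue depth actually tracks the crashes, here is a minimal sketch (Python, run in Dom0) that samples the in-flight request count per device. It assumes the standard 2.6 /proc/diskstats layout, where the ninth stat field after the device name is the number of I/Os currently in progress:

    import time

    # Periodically report devices with outstanding block requests.
    # Assumes the standard 2.6 /proc/diskstats column layout.
    while True:
        with open('/proc/diskstats') as f:
            for line in f:
                fields = line.split()
                if len(fields) < 14:
                    # Partition lines in older kernels carry fewer
                    # fields; skip them.
                    continue
                name, inflight = fields[2], int(fields[11])
                if inflight:
                    print('%-12s in-flight=%d' % (name, inflight))
        time.sleep(5)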

The only direct evidence that I and one other list member have seen that could be related to this issue is that reading /proc/slabinfo on a regular basis in the Dom0 appears to greatly increase the frequency of crashes. That leads me to believe that the issue, at its core, is somehow related to SLAB corruption in Dom0.
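
For anyone trying to reproduce this, the "regular access" I mean is nothing more exotic than a periodic read. A sketch of the sort of poll involved follows; the cache names are illustrative rather than a definitive list, and given the above, run it in Dom0 at your own risk:

    import time

    # Watch a few slab caches tied to the block layer. In slabinfo,
    # the columns after the cache name are: active_objs num_objs
    # objsize ... The cache names below are examples only.
    WATCH = ('bio', 'biovec-256', 'buffer_head')

    while True:
        with open('/proc/slabinfo') as f:
            for line in f:
                parts = line.split()
                if parts and parts[0] in WATCH:
                    print('%s active=%s total=%s'
                          % (parts[0], parts[1], parts[2]))
        time.sleep(60)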

Another list member has also recently reported that this issue appears to be fixed in the unstable line. Have changes been made in unstable that would shed some light on any of this?

Thanks, and sorry for the novelette.

--
-- Tom Mornini, CTO
-- Engine Yard, Ruby on Rails Hosting
-- Reliability, Ease of Use, Scalability
-- (866) 518-YARD (9273)


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

