
RE: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

  • To: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "James Harper" <james.harper@xxxxxxxxxxxxxxxx>
  • Date: Wed, 31 Dec 2008 15:51:49 +1100
  • Cc:
  • Delivery-date: Tue, 30 Dec 2008 20:52:22 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: Aclq8extx1AIvKo+Ql6WikxbNr/6rgAAnnFwAAA7aNAAAFVNQAAAjkcQAAD8FiAAAWb/8A==
  • Thread-topic: [Xen-devel] RE: freezing when using GPLPV drivers (including Dom0)

> >From: James Harper [mailto:james.harper@xxxxxxxxxxxxxxxx]
> >Sent: Wednesday, December 31, 2008 11:37 AM
> >> >Is there a way under Linux of monitoring disk queue length? I am
> >> >using LVM on top of a low end HP 'Smart Array' (E200) running two
> >> >RAID1 volumes using SATA disks.
> >> >
> >>
> >> 'sar' could provide such info, IMO.
> >>
> >
> >iostat shows very very low disk usage when things are frozen. I am
> >finding that I can type 'sync' and things will unfreeze again...
> >unfreezing before the sync completes. I haven't done this enough
> >to know if things would have unfrozen on their own though.
> >
> That looks interesting. Both CPU and disk utilization are low, yet the
> system is unresponsive for an unknown length of time... Does time in
> dom0 look sane? I guess you may have to check the behavior/statistics
> of the fe/be drivers, e.g. event count/s, whether the kernel thread is
> woken effectively, how many requests are handled per event
> notification, etc., and then judge whether those stats are as expected.

I have written a script that does 'sync ; sleep 5' in a loop. My restore
is now at 20G and still going. I'll follow up if it completes.
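
For anyone wanting to reproduce the workaround, the 'sync ; sleep 5' loop
can be sketched in Python rather than shell. This is only an illustration:
the function name, the iteration cap and the injectable sync/sleep hooks are
assumptions added here, not part of the original script, and os.sync() is
only available on Unix-like systems.

```python
import os
import time


def sync_loop(iterations, interval=5.0, sync_fn=os.sync, sleep_fn=time.sleep):
    """Flush dirty pages to disk, wait, repeat. Returns the number of syncs."""
    done = 0
    for _ in range(iterations):
        sync_fn()           # same effect as running 'sync'
        sleep_fn(interval)  # 'sleep 5' with the default interval
        done += 1
    return done


if __name__ == "__main__":
    # 720 iterations is roughly one hour of 'sync ; sleep 5' (plus sync time).
    sync_loop(720)
```

The hooks exist only so the loop can be exercised without touching the disk;
in normal use the defaults are enough.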

I'm not sure where to look for this problem though... When I use the
qemu emulated devices instead of GPLPV, the restore runs to completion,
but it also runs slower, so maybe the problem isn't in the GPLPV drivers
at all, and the qemu devices simply can't push the I/O load high enough
to trigger it.
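
On the queue-length question from earlier in the thread: besides sar and
iostat, the Linux kernel exposes a per-device count of I/Os currently in
progress in /proc/diskstats (the ninth statistics field, after the major,
minor and device name columns). A minimal sketch of reading it follows. The
function name is made up for illustration, and whether this counter says
anything useful about the freeze is an open question.

```python
def inflight_from_diskstats(text):
    """Map device name -> number of I/Os currently in progress."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Need major, minor, name plus at least nine statistics fields.
        if len(parts) < 12:
            continue
        # parts[3] is statistics field 1, so field 9 (in flight) is parts[11].
        result[parts[2]] = int(parts[11])
    return result


if __name__ == "__main__":
    with open("/proc/diskstats") as f:
        stats = inflight_from_diskstats(f.read())
    for name, inflight in sorted(stats.items()):
        print(f"{name}: {inflight} in flight")
```

Sampling this once a second while the restore runs would show whether
requests are piling up at the block layer or never reaching it.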

As I said earlier in the thread, the system is using an HP 'Smart Array'
E200 controller with no battery backup, and two RAID1 arrays on SATA
disks. Obviously not the highest performing setup ever.

I have a 500G disk I can attach to one of the onboard SATA ports, but
I'm not sure that will actually prove anything either way.

One other thing I didn't mention - I am using sparse files as my disk
images, using 'file:' under Xen. Again, not the highest performing
configuration, but the restore process we are using needs to see disks
at least as big as those that were backed up originally, and I just
don't have 2TB of disk lying around! The data path is DomU -> blkback
-> /dev/loopX -> file(sparse) -> filesystem(xfs) -> LVM -> E200...
that's a lot of room for things to go wrong, isn't it?
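
For readers unfamiliar with sparse images, the sketch below shows why they
help here: the file reports a large apparent size while the filesystem
allocates blocks only as data is written. The helper name and the disk.img
filename are illustrative assumptions, and the allocated figure depends on
the filesystem (some do not support sparse files at all).

```python
import os
import tempfile


def make_sparse_file(path, size_bytes):
    """Create a file of the given apparent size without writing any data.

    Returns (apparent_size, allocated_bytes).
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # extends the file; no data blocks are written
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "disk.img")
        apparent, allocated = make_sparse_file(image, 1 << 30)
        print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

Every first write into an unallocated region forces the filesystem to
allocate blocks, which is one more step in the chain above that can stall
under heavy write load.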

I could try switching to tap:aio but I don't think that my GPLPV drivers
work in that configuration... maybe time to find out why :)


