
Re: [Xen-users] dom0 freezes under high IO load - HP ML150 G2


  • To: "Tom Mornini" <tmornini@xxxxxxxxxxxxxx>
  • From: TMC <tmciolek@xxxxxxxxx>
  • Date: Sat, 3 Mar 2007 21:24:24 +1100
  • Cc: xen-users@xxxxxxxxxxxxxxxxxxx, Daniel Mealha Cabrita <dancab@xxxxxxxxxxxx>
  • Delivery-date: Sat, 03 Mar 2007 02:23:50 -0800
  • Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=mWeT0u2uCoIki9+02TqtHicvZDEnZ4xuEgXSpehJ9u7fOEclDLf5jg/BLu0EeJv3ei/KeLdGNkgc3nqkavoXl/pZyQWPJEtDIfMdRpD6fBw9eBmmwVwcBITcGZyRNHWPZfgUj/FaNcnJFmVHGrfJIET9qWPzI8I+rvGzKjpPR9E=
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

I don't actually think testing in Dom0 is a good thing unless you have
some very weird situation.

As far as I understand Xen Dom0, and I might be wrong, it behaves much
like a plain Linux kernel does, in that it will hog CPU, has direct
access to hardware (which DomUs normally do not), and there is a whole
set of other issues associated with being the "hardware driver holder"
for the hypervisor...

I think a more telling test would be to run an identical setup with a
plain Linux kernel and see what that does (a rough sketch of what I
mean is below). Maybe you're running into a driver/FS bug?
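
For example, something along these lines -- the md device, mount point
and file sizes here are only placeholders, adjust them for the actual
RAID5 array:

   # boot the plain (non-Xen) kernel first and hammer the array
   mount /dev/md0 /mnt/test              # your kernel RAID5 device
   for i in 1 2 3 4; do
       dd if=/dev/zero of=/mnt/test/stress.$i bs=1M count=2048 &
   done
   wait; sync
   # then reboot into the Xen dom0 kernel and repeat the same commands;
   # if the plain kernel survives and dom0 locks up, it points at the
   # Xen/dom0 side rather than a plain driver/FS problem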


regards
Tomasz

On 03/03/07, Tom Mornini <tmornini@xxxxxxxxxxxxxx> wrote:
You don't believe that *testing* is a good thing?

I'm pretty sure the Xen documentation points out that Dom0 is not
particularly special, except that it is privileged to manipulate DomUs.

I'd love to hear others' opinions on this topic. Should Dom0 be
entirely free of disk I/O?

Now, I should make it clear, I'm a big supporter of Dom0 doing
essentially nothing. Yet, I'm also a supporter of not having
difficulty sleeping at night, afraid that Dom0 might write a block or
two to its own disks...

I have a job that runs every five minutes to grab CPU utilization,
and writes that to disk. That job doesn't cause destabilization. My
problem seems to be related to kernel SLAB corruption, which is why I
mentioned this as something to test, and made it clear that in my
case, it made the machine unstable.

--
-- Tom Mornini, CTO
-- Engine Yard, Ruby on Rails Hosting
-- Reliability, Ease of Use, Scalability
-- (866) 518-YARD (9273)

On Mar 2, 2007, at 9:58 PM, TMC wrote:

> I am not sure if testing like this in Dom0 is a good thing.  Dom0 is
> "special" and should not run any processes that hit the disk hard;
> that's the job for DomU.
>
> Regards
> TMC
>
> On 03/03/07, Tom Mornini <tmornini@xxxxxxxxxxxxxx> wrote:
>> Hello Daniel.
>>
>> We've had similar problems, but have received very little feedback
>> from our machines.
>>
>> Our setup is also not entirely similar to yours. :-)
>>
>> Could you try something out?
>>
>> Set a cron job to run every 1 minute:
>>
>>    cat /proc/slabinfo >> /root/slabinfo.txt
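
A minimal sketch of that cron entry, assuming it is added to root's
crontab (e.g. via crontab -e); the one-minute schedule and the output
path are the ones Tom gives above:

   # m h dom mon dow  command
   * * * * *  cat /proc/slabinfo >> /root/slabinfo.txt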
>>
>> When we do this, our problem gets *much* worse. I'd love to know if
>> these are similar problems.
>>
>> --
>> -- Tom Mornini, CTO
>> -- Engine Yard, Ruby on Rails Hosting
>> -- Reliability, Ease of Use, Scalability
>> -- (866) 518-YARD (9273)
>>
>> On Mar 2, 2007, at 7:27 PM, Daniel Mealha Cabrita wrote:
>>
>> >
>> > hi there,
>> >
>> >
>> >       Does anyone have suggestions on how to proceed in this case?
>> >
>> >
>> >       I've been experiencing dom0 (xen 3.0.3, xen-3.0.4 and
>> > 3.0.4-testing) lockups under heavy disk load (testing under dom0
>> > directly). The hardware is an HP ML150 G2 with an HP 4-channel SATA
>> > fakeraid (OEM Adaptec 1420SA, sata_mv driver).
>> >       The machine does not respond to the network, keyboard, nor
>> > anything else noticeable when it happens.
>> >
>> >       I've tried passing a number of parameters to the kernel, but
>> > with no success. The ones below even make things worse, causing a
>> > CPU0 soft lockup during boot:
>> > kernel = (hd0,0)/xen-3.0.4 dom0_mem=384M acpi=off noapic nolapic
>> >
>> >       The disks run in Linux kernel RAID5. High load to/from an
>> > individual disk (the max an individual SATA HD can handle) does not
>> > cause any problem.
>> >
>> >       Also, I've noticed that just after booting, the machine does
>> > not respond to pings nor to anything else from the network. If I ping
>> > another host from the machine itself, the network starts working. Or,
>> > if I wait long enough (several minutes), the machine's network starts
>> > to respond normally.
>> >
>> >       I've got no soft lockups logged in /var/log/messages, nor
>> > anything strange enough to catch my attention.
>> >
>> >       The problem does not happen with a non-Xen kernel.
>> >       The machine firmware (BIOS if you like) is updated to the
>> > latest version.
>> >       Disabling all the non-essential hardware (USB, serial/parallel
>> > ports, IDE ports, power saving, etc.) makes no difference.
>> >
>> >       When not under high disk load the machine seems stable, with
>> > several domU VMs running happily under it.
>> >
>> > --
>> >  Daniel Mealha Cabrita
>> >  Divisao de Suporte Tecnico
>> >  AINFO / Reitoria / UTFPR
>> >  http://www.utfpr.edu.br
>> >
>> > _______________________________________________
>> > Xen-users mailing list
>> > Xen-users@xxxxxxxxxxxxxxxxxxx
>> > http://lists.xensource.com/xen-users
>>
>>
>> _______________________________________________
>> Xen-users mailing list
>> Xen-users@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-users
>>
>
>
> --
> GPG key fingerprint: 3883 B308 8256 2246 D3ED  A1FF 3A1D 0EAD 41C4 C2F0
> GPG public key available on the pgp.mit.edu keyserver




--
GPG key fingerprint: 3883 B308 8256 2246 D3ED  A1FF 3A1D 0EAD 41C4 C2F0
GPG public key available on the pgp.mit.edu keyserver

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

