
Re: [Xen-users] xen 2.0.6, on_crash = 'restart' not restarting after crash

Tim Post wrote:
> On Mon, 2007-04-30 at 09:35 +1200, Steve Wray wrote:
>> Hi all,
>> We have a Xen instance (under Xen 2.0.6) that's pretty unreliable; the
>> domU crashes fairly regularly.
> If you must use Xen v2, try 2.0.7 (or the last 2.0-testing Mercurial).
> 2.0.7 isn't the most feature packed release but it is extremely stable.
> I'd really recommend upgrading to 3.0.4-testing or 3.0.5-testing (I
> think it's at rc4 now) unless you depend on an older kernel version. I
> have some that have to stay at 2.0.7 until I find a better fit for PV
> open SSI clusters.

Unfortunately, for operational reasons, it's a little difficult to change
the Xen version at this time. Definitely not to v3, but in the coming
month I should be able to try 2.0.7.

> This really depends on Xen's ability to see the domU as 'crashed'.
> Typical 'crashes' on older kernels don't look much different to Xen than
> a running or blocking state.
> For example, if it's unresponsive and shown as running, the guest
> is most likely just spiraling out of control.
> If it's unresponsive and blocking, any number of things could be going
> wrong, but Xen doesn't see it. Unless it's a full-out kernel panic, most
> likely Xen 2 won't see your guest's crash.
> Can you give more details of the crash?

Not really; there are no log entries on either the domU or the dom0
that give any idea as to what has happened.

Symptoms are that the domU is no longer running. The Xen log says just
what I included; that the domain had a 'crash' and that it 'died'. The
domU does not show up in 'xm list'. There appears to be no unusual load
spike or any other unusual activity prior to the 'crash'.

I'm a little surprised that when the log entry shows:

xend.domain.exit ['domUhostname', '14', 'crash']

Xen does not interpret this as a 'crash' for the purposes of 'on_crash'.
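
To make it easier to see when xend itself recorded a crash, here is a
minimal Python sketch that scans xend log lines for 'xend.domain.exit'
events whose reason is 'crash'. It assumes each event sits on a single
line in the real log (the lines quoted in this thread are wrapped by the
mail client) and that the argument list always has the three-element
form shown above. The function name find_crashes is made up for
illustration; it is not part of Xen.

```python
import ast
import re

# Matches an unwrapped xend log line such as:
# [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT> xend.domain.exit ['domUhostname', '14', 'crash']
EXIT_EVENT = re.compile(
    r"\[(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) xend\].*"
    r"xend\.domain\.exit\s+(?P<args>\[.*\])"
)


def find_crashes(lines):
    """Return (timestamp, domain name, domain id) for each exit event with reason 'crash'."""
    crashes = []
    for line in lines:
        match = EXIT_EVENT.search(line)
        if not match:
            continue
        # The event arguments look like a Python list literal of strings.
        args = ast.literal_eval(match.group("args"))
        if len(args) == 3 and args[2] == "crash":
            name, domid, _reason = args
            crashes.append((match.group("when"), name, int(domid)))
    return crashes
```

Feeding it the lines of the xend log (for example from open(path)) gives
a list of crash events that a script could act on.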


>> The domain has indeed crashed since this was implemented and did not
>> appear to recover, at least not for the 6 minutes we gave it to restart
>> the domain:
>> [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT> xend.domain.exit
>> ['domUhostname', '14', 'crash']
>> [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT>
>> xend.domain.destroy ['domUhostname', '14']
>> [2007-04-30 09:06:20 xend] INFO (XendRoot:112) EVENT> xend.domain.died
>> ['domUhostname', '14']
>> [2007-04-30 09:12:03 xend] DEBUG (XendDomainInfo:720) init_domain>
>> Created domain=15 name=domUhostname memory=1200
>> [2007-04-30 09:12:03 xend] INFO (console:94) Created console id=14
>> domain=15 port=9615

>> And are there any other things we can do to restart a domain after a crash?
> Many people favor some kind of key pairing to enable a centralized
> monitor to be able to restart guests in the event of failure, even with
> newer versions of Xen, or using the API.
> If you aren't depending on a very specific older patched kernel, I'd
> just move up to 3.0.4-testing. 3.0.5-testing has been pretty stable too.

Sadly, we are. There is a project underway to upgrade but that could be
months away.
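
In the meantime, something like the external monitor suggested above
could be sketched as follows. This is only a rough outline run from
dom0, not a tested tool: it assumes 'xm list' prints one header line
followed by one row per domain with the domain name in the first column,
and that the guest can be started with 'xm create' on its config file.
The function names and the config path in the usage note are
hypothetical.

```python
import subprocess


def running_domains(xm_list_output):
    """Return the set of domain names in `xm list` output (header line skipped)."""
    names = set()
    for line in xm_list_output.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def ensure_running(name, config_path, run=subprocess.run):
    """Start the domain from config_path if it is missing from `xm list`.

    Returns True if a restart was attempted, False if the domain was
    already listed. `run` is injectable so the logic can be exercised
    without a real Xen host.
    """
    listing = run(["xm", "list"], capture_output=True, text=True, check=True).stdout
    if name in running_domains(listing):
        return False
    run(["xm", "create", config_path], check=True)
    return True
```

Called from cron every minute or so, for example as
ensure_running("domUhostname", "/etc/xen/domUhostname"), this would only
catch the case seen here, where the domU disappears from 'xm list'. It
would not notice a guest that is hung but still listed as running or
blocked.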



