
Re: [Xen-users] xen 2.0.6, on_crash = 'restart' not restarting after crash

On Mon, 2007-04-30 at 09:35 +1200, Steve Wray wrote:
> Hi all,
> We have a xen instance (under xen 2.0.6) that's pretty unreliable; the
> domU crashes fairly regularly.

If you must use Xen v2, try 2.0.7 (or the last 2.0-testing Mercurial tree).
2.0.7 isn't the most feature-packed release, but it is extremely stable.

I'd really recommend upgrading to 3.0.4-testing or 3.0.5-testing (I
think it's at rc4 now) unless you depend on an older kernel version. I
have some that have to stay at 2.0.7 until I find a better fit for PV
OpenSSI clusters.

> Yes, we are trying to figure out why, but in the meantime I discovered
> that there is a config option 'on_crash'.
> We've implemented this in the config file for that xen domain and we
> have this in the config file for the domain:
> restart = 'always'
> on_crash = 'restart'

This really depends on Xen's ability to see the domU as 'crashed'.
Typical 'crashes' on older kernels don't look much different to Xen than
a running or blocked state.

For example, if it's unresponsive but shown as running, the guest is
most likely just spinning out of control.

If it's unresponsive and shown as blocked, any number of things could be
going wrong, but Xen doesn't see it. Unless it's a full-blown kernel
panic, Xen 2 most likely won't see your guest's crash.
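
To make that distinction concrete, here is a rough Python sketch that
combines the state column from `xm list` with the result of your own
liveness check. It assumes the state flags use r (running), b (blocked),
p (paused), s (shutdown) and c (crashed); verify that against your own
`xm list` output. The function name and the `responds` argument are
made up for illustration.

```python
def classify(state_flags, responds):
    """Rough diagnosis from `xm list` state flags plus a liveness probe.

    state_flags: the state column, e.g. "r----" (flag letters are assumed).
    responds:    True if the guest answered some external check (ping, ssh...).
    """
    flags = state_flags.lower()
    if "c" in flags:
        return "crashed"        # Xen itself saw the crash, so on_crash can act
    if "s" in flags:
        return "shutdown"
    if "p" in flags:
        return "paused"
    if responds:
        return "healthy"
    if "r" in flags:
        return "hung-running"   # unresponsive but burning CPU
    if "b" in flags:
        return "hung-blocked"   # unresponsive and idle; Xen sees nothing wrong
    return "unknown"
```

Only the first case is one Xen can react to by itself; the two "hung"
cases need something outside the guest to notice and act.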

Can you give more details of the crash?

> The domain has indeed crashed since this was implemented and did not
> appear to recover, at least not for the 6 minutes we gave it to restart
> the domain:
> [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT> xend.domain.exit
> ['domUhostname', '14', 'crash']
> [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT>
> xend.domain.destroy ['domUhostname', '14']
> [2007-04-30 09:06:20 xend] INFO (XendRoot:112) EVENT> xend.domain.died
> ['domUhostname', '14']
> [2007-04-30 09:12:03 xend] DEBUG (XendDomainInfo:720) init_domain>
> Created domain=15 name=domUhostname memory=1200
> [2007-04-30 09:12:03 xend] INFO (console:94) Created console id=14
> domain=15 port=9615

> And are there any other things we can do to restart a domain after a crash?

Many people favor some kind of key-pair setup (SSH keys, for example) so
that a centralized monitor can log in and restart guests in the event of
failure. That applies even with newer versions of Xen, where using the
API is another option.
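
A minimal sketch of such a monitor in Python, assuming it runs in dom0
(a remote monitor would wrap the same `xm` commands in ssh using its key
pair). The ping probe, the guest tuples, the failure threshold and all
function names are hypothetical; only `xm destroy` and `xm create` are
real commands.

```python
import subprocess

def guest_responds(host, timeout_s=5):
    """Hypothetical probe: a single ping. Swap in an ssh or service check."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def restart_guest(name, config_path):
    """Tear down whatever is left of the domain, then create it again."""
    subprocess.run(["xm", "destroy", name])   # may fail if it is already gone
    subprocess.run(["xm", "create", config_path], check=True)

def check_guests(guests, failures, probe=guest_responds,
                 restart=restart_guest, max_failures=3):
    """One monitoring pass over (name, host, config_path) tuples.

    failures maps guest name to consecutive failed probes and is updated
    in place. Returns the names of guests restarted during this pass.
    """
    restarted = []
    for name, host, config_path in guests:
        if probe(host):
            failures[name] = 0
            continue
        failures[name] = failures.get(name, 0) + 1
        if failures[name] >= max_failures:
            restart(name, config_path)
            failures[name] = 0
            restarted.append(name)
    return restarted
```

Call `check_guests` from cron or a sleep loop, keeping the same
`failures` dict between passes. Requiring several consecutive failures
keeps one dropped ping from destroying a healthy guest.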

If you aren't depending on a very specific older patched kernel, I'd
just move up to 3.0.4-testing. 3.0.5-testing has been pretty stable too.

Hope this helps,

Xen-users mailing list


