
Re: [Xen-users] xen, iscsi and resilience to short network outages


  • To: xen-users@xxxxxxxxxxxxxxxxxxx
  • From: "Steve Feehan" <sfeehan@xxxxxxxxx>
  • Date: Mon, 13 Nov 2006 14:12:10 -0500
  • Delivery-date: Mon, 13 Nov 2006 11:12:31 -0800
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

On 11/10/06, Steve Feehan <sfeehan@xxxxxxxxx> wrote:
> On 11/10/06, Steven Smith <sos22-xen@xxxxxxxxxxxxx> wrote:
> > > INIT: cannot execute "/sbin/mingetty"
> > > INIT: cannot execute "/sbin/mingetty"
> > > INIT: cannot execute "/sbin/mingetty"
> > > INIT: Id "1" respawning too fast: disabled for 5 minutes
> > What happens after the five minutes is up?
> >
> > Steven.
>
> Well, I've never waited 5 minutes to find out. But it was an hour
> after the original IO timeout before I tried to log in at the console
> and saw these messages. And during this time I was not able to ssh
> into the VM (but it was still pingable).
>
> So I will force a timeout on Monday and see what happens after 5
> minutes. But my guess is that the system is not going to recover.

As I suspected, the system never recovers from the IO errors. Someone
pulled the switch again, and this is what I get in the guest (connected via ssh):

sfeehan@extlb1:~> ls
-bash: /bin/ls: Input/output error

So I log in to dom0 and start a console:

sfeehan@egovxen1:~> sudo xm console extlb1


extlb1 login: root
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"
INIT: Id "1" respawning too fast: disabled for 5 minutes
INIT: no more processes left in this runlevel

So I wait more than 5 minutes, reconnect, and still get no response.

Would I stand a better chance connecting to the iSCSI LUN from domU
rather than from dom0? My thought is that since dom0 is able to
reconnect to the LUN when the network returns, perhaps domU would be
able to as well.
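For comparison, attaching the LUN directly inside domU means running an iSCSI initiator in the guest itself. Below is a minimal sketch, assuming the guest has the open-iscsi tools (`iscsiadm`) installed. The function name and the portal/IQN values in the usage line are hypothetical placeholders, not the NetApp's real settings. By default (DRY_RUN=1) it only prints the commands, so nothing changes by accident:

```shell
# Hypothetical helper: discover and log in to an iSCSI target from inside
# the domU. Assumes open-iscsi's iscsiadm is installed in the guest.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
attach_lun_in_domu() {
    local portal=$1 target=$2
    local run=""
    [ "${DRY_RUN:-1}" = 1 ] && run="echo"
    # Ask the portal (address:port) which targets it exports.
    $run iscsiadm -m discovery -t sendtargets -p "$portal"
    # Log in to the chosen target; its LUN then appears as a SCSI disk.
    $run iscsiadm -m node -T "$target" -p "$portal" --login
}
```

For example, `DRY_RUN=0 attach_lun_in_domu 192.0.2.10:3260 iqn.2006-11.example:extlb1` would actually run discovery and login. This only covers attaching the LUN once the guest is up; booting from it would still need the initrd work mentioned below.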

Root on NFS is also an option, but ataoe is unfortunately a
non-starter since we've invested in a NetApp and have to use it. ;)

I was hoping to avoid complicated initrd configuration (which I think
will be required for root on NFS or direct iSCSI connection).

I'm still curious whether there is a configurable timeout in the domU
kernel that would be a little more tolerant of network outages.
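One place to look, assuming the LUN is attached in dom0 by a SCSI-based initiator so that it appears there as an ordinary SCSI disk (for example sdb): the Linux SCSI layer keeps a per-device command timeout, in seconds, in sysfs at /sys/block/<dev>/device/timeout. The guest's own xvd devices are Xen virtual block devices and do not expose this file, so the errors probably surface in dom0 first; that is an inference, not something verified here. A minimal sketch that lists the current values (the function name and its optional sysfs-root argument are hypothetical, the argument exists only to make it testable):

```shell
# Hypothetical helper: print "<device> <seconds>" for every block device
# that exposes a SCSI command timeout under <sysfs-root>/block/*/device/.
# Assumption: the iSCSI LUN shows up in dom0 as a SCSI disk (e.g. sdb).
list_scsi_timeouts() {
    local root=${1:-/sys}
    local t dev
    for t in "$root"/block/*/device/timeout; do
        [ -r "$t" ] || continue      # glob matched nothing, or unreadable
        dev=${t#"$root"/block/}      # e.g. "sdb/device/timeout"
        dev=${dev%%/*}               # e.g. "sdb"
        printf '%s %s\n' "$dev" "$(cat "$t")"
    done
}
```

As root, a value can be raised for testing with `echo 180 > /sys/block/sdb/device/timeout` (sdb is only an example name); the change does not survive a reboot. If the dom0 initiator is open-iscsi, its `node.session.timeo.replacement_timeout` setting (how long a dropped session is held before I/O is failed upward) may matter as well; it can be changed with `iscsiadm -m node -T <target> -p <portal> -o update -n node.session.timeo.replacement_timeout -v <seconds>`, though whether the installed initiator version supports it is an assumption.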

Thanks,

--
Steve Feehan

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

