
Re: [Xen-API] XCP Storage resiliency

To: xen-api@xxxxxxxxxxxxx
From: Nathan March <nathan@xxxxxx>
Date: Fri, 21 Jun 2013 15:55:33 -0700
Delivery-date: Fri, 21 Jun 2013 22:56:46 +0000
List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>

On 6/21/2013 1:16 AM, George Shuklin wrote:

I'm mostly talking not about dom0 but about the domU kernel. If an IO takes more than 120 seconds, it will be processed as an 'io timeout'. And this timeout is hardcoded (no /sys|/proc variables).
If you are getting IO timeouts in less than 2 minutes, that's a different question.

Hi George,

Sorry if I'm misunderstanding, but I don't believe it's a domU issue, as I've run identical virtual machines on our existing xen cluster and can take storage away from the dom0 for over 45 minutes without a problem. If the domU kernel were responsible for timing out the IO requests, I'd be seeing some sort of kernel error on my domUs in this situation. Instead they just hang waiting for the IO and gracefully recover once it comes back (albeit with very, very high load averages as requests back up). I've made no patches or changes to our existing systems to get them to work like this, it just ended up that way. We're running stock 3.2.28 dom0s and 2.6.32.60 domUs, so having to hack a domU kernel on XCP to achieve the same thing seems strange?

That being said, it is a 120s timeout that I'm hitting (the "NFS" line is me echoing to kmsg when I pull connectivity, for easy timestamping):


dom0:
[ 2594.069594] NFS
[ 2609.574285] nfs: server 10.1.26.1 not responding, timed out
[ 2717.464716] end_request: I/O error, dev tda, sector 18882056

domu:
[82688.790260] NFS
[82812.678888] end_request: I/O error, dev xvda, sector 18882056

So here the dom0 is timing out, the I/O error is returned to the domU, and the domU then goes read-only.
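To make the roughly 120s gap explicit, here is a minimal Python sketch that measures the time from the "NFS" marker to the first I/O error in each of the log excerpts above. The function names are just illustrative helpers, and it assumes the usual "[seconds.micros] message" kernel log prefix.

```python
import re

# Matches the "[ seconds.micros] message" prefix used by kernel log lines.
KMSG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_kmsg(text):
    """Return a list of (timestamp_seconds, message) tuples."""
    entries = []
    for line in text.splitlines():
        match = KMSG_LINE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def seconds_from_marker_to_error(text, marker="NFS", error="I/O error"):
    """Elapsed seconds between the marker line and the first I/O error after it."""
    start = None
    for stamp, message in parse_kmsg(text):
        if start is None and message == marker:
            start = stamp
        elif start is not None and error in message:
            return stamp - start
    return None  # marker or error not found


# Log excerpts copied from this message.
DOM0_LOG = """\
[ 2594.069594] NFS
[ 2609.574285] nfs: server 10.1.26.1 not responding, timed out
[ 2717.464716] end_request: I/O error, dev tda, sector 18882056
"""

DOMU_LOG = """\
[82688.790260] NFS
[82812.678888] end_request: I/O error, dev xvda, sector 18882056
"""

for name, log in (("dom0", DOM0_LOG), ("domU", DOMU_LOG)):
    elapsed = seconds_from_marker_to_error(log)
    print(f"{name}: {elapsed:.1f}s from marker to I/O error")
```

Both come out a little over 123 seconds, which fits a 120s timeout plus the few seconds between echoing the marker and the first request stalling.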

If I manually unmount and remount the SR on the dom0 with "-o hard", I would expect the timeout to go away, as NFS is no longer returning the timeout back to XCP. Instead I see the same 120s timeouts, which makes me think this timeout is coming from some other layer.
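One thing worth ruling out is whether the remount really took effect. A rough Python sketch for checking the effective NFS options follows; it parses text in the /proc/mounts format. The sample line, export path, mount point and option values below are hypothetical placeholders, not output from my systems.

```python
def nfs_mount_options(mounts_text, mount_point):
    """Return the option list of the NFS mount at mount_point, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point and fields[2].startswith("nfs"):
            return fields[3].split(",")
    return None


def summarize(options):
    """Pick out the options that decide whether NFS gives up on a request."""
    values = dict(opt.split("=", 1) for opt in options if "=" in opt)
    return {
        "hard": "hard" in options,
        "timeo": values.get("timeo"),      # in tenths of a second
        "retrans": values.get("retrans"),
    }


# Hypothetical sample in /proc/mounts format (all values are assumptions).
SAMPLE_MOUNTS = (
    "10.1.26.1:/export/sr /var/run/sr-mount/EXAMPLE-SR-UUID nfs "
    "rw,relatime,vers=3,soft,proto=tcp,timeo=133,retrans=2 0 0\n"
)

options = nfs_mount_options(SAMPLE_MOUNTS, "/var/run/sr-mount/EXAMPLE-SR-UUID")
print(summarize(options))
```

On a real dom0 the text would come from reading /proc/mounts rather than from the sample string. If "hard" shows up there and the 120s error still appears, that would support the idea that another layer is timing out.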


Thanks!

- Nathan


