
RE: [Xen-users] AoE (Was: iscsi vs nfs for xen VMs)

> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-
> > bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Simon Hobson
> > Subject: Re: [Xen-users] AoE (Was: iscsi vs nfs for xen VMs)
> >
> > Getting somewhat off-topic, but I'm interested to know how AoE
> > handles network errors? I assume there is some handshake to make
> > sure packets were delivered, rather than just "fire and forget"!
> The open-source Linux aoe driver from Coraid (with which I am the most
> familiar) implements a congestion avoidance and control algorithm
> similar to TCP's.  If a response takes longer than twice the average
> round-trip time plus eight times the average deviation, the request is
> retransmitted (based on the aoe6-75 sources; earlier sources may
> differ).
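The retransmission heuristic described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Coraid driver code (which is C); the class and parameter names, the EWMA gain, and the initial values are all invented here:

```python
# Hedged sketch of the retransmit threshold described above -- not the
# real aoe driver.  Names, gain, and initial values are assumptions.

class RttEstimator:
    """EWMA round-trip estimator, loosely in the style of TCP."""

    def __init__(self, srtt=8.0, dev=1.0, gain=0.125):
        self.srtt = srtt   # smoothed round-trip time (ms)
        self.dev = dev     # smoothed mean deviation (ms)
        self.gain = gain   # EWMA weight for each new sample

    def sample(self, rtt):
        """Fold one measured round trip (network + disk) into the average."""
        err = rtt - self.srtt
        self.srtt += self.gain * err
        self.dev += self.gain * (abs(err) - self.dev)

    def timeout(self):
        """Retransmit once a response exceeds 2*avg RTT + 8*avg deviation."""
        return 2 * self.srtt + 8 * self.dev


est = RttEstimator()
for rtt in (5, 7, 6, 9, 8):   # light load: round trips of 5-10 ms
    est.sample(rtt)
threshold = est.timeout()     # resend any request still unanswered by now
```

Note that each sample here is a full request-to-response round trip, so, as the post says, it folds disk latency in with network latency.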
> What's interesting about aoe vs. TCP is that a round trip measures
> network and disk latency, not just network latency.  A normal read
> request sends a request packet, after which the target performs a disk
> read and returns a response packet with the sector contents.  A normal
> write request sends a request packet with the sector contents, upon
> which the target performs a disk write and returns a status packet.
> Disk latency is orders of magnitude greater than network latency, and
> more variable.  We typically see an RTT of 5-10 ms under light usage.
> Upon heavy disk I/O, this time can vary upwards, possibly tenths of
> seconds, leading to apparent packet loss and an RTT adjustment by the
> driver.  So it's not uncommon for a target to receive and process a
> duplicate request, which is okay because each request is idempotent.
> Lossage of 0.1% to 0.2% is common in our environment, but this does
> not have a significant impact overall on aoe performance.
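The idempotency point is worth spelling out: a duplicate request, whether from a retransmit or a delayed original, leaves the target in the same state and produces the same response. A toy illustration (not real aoe code; the dict-backed "disk" and `handle` function are invented):

```python
# Toy model (not real aoe code) of why duplicate requests are harmless:
# each read/write is idempotent, so replaying one leaves the target in
# the same state and yields the same response.

disk = {}  # sector number -> contents

def handle(request):
    op, sector, data = request
    if op == "write":
        disk[sector] = data        # applying the same write twice = same state
        return ("write_ok", sector)
    return ("read_ok", sector, disk.get(sector, b"\x00" * 512))

req = ("write", 42, b"payload")
first = handle(req)
dup = handle(req)                  # retransmitted duplicate after a timeout
assert first == dup and disk[42] == b"payload"
```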
> That said, the aoe protocol also supports an asynchronous write
> operation, which I suppose really is "fire and forget", unlike normal
> reads and writes.  I haven't used an aoe driver that implements
> asynchronous writes, however, and I'm not sure I would if I had the
> option, since you have no guarantee that the writes succeeded.

Interesting stuff.

I use DRBD locally and used to regularly see messages about concurrent
outstanding requests to the same sector. DRBD logs this because it can't
guarantee the serialisation of requests: two write requests to the same
sector might be reordered at any layer that differs between the two
servers. It sounds like AoE could make this even worse if the 'first'
write were lost, resulting in the 'second' write being performed first,
followed by the retransmitted 'first' write. Sensibly, you'd expect a
barrier to be placed between the first and second writes, guaranteeing
that nothing is reordered across it, but if you run Windows on Xen on
AoE on DRBD (eg to an HA DRBD SAN), you might see non-sensible things
happen. To be fair, in my testing the writes that Windows performed
always carried the same data, so there were no adverse consequences, but
it's still annoying.

I modified GPLPV to check the pipeline and stall if an overlapping write
request would otherwise be sent (it happens rarely, so there is no
measurable performance impact), but it's a lot of mucking around just to
get rid of one little benign message.
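The stall logic amounts to tracking which sectors have a write in flight and holding back any new write that overlaps them. A minimal sketch, assuming a simple sector-set representation (the real GPLPV change is C inside the PV block driver, and every name here is invented):

```python
# Hedged sketch of the overlapping-write stall described above; the
# actual GPLPV code is C, and this class and its names are assumptions.

class WritePipeline:
    def __init__(self):
        self.inflight = set()      # sectors with an outstanding write

    def overlaps(self, sector, count):
        return any(s in self.inflight for s in range(sector, sector + count))

    def submit(self, sector, count):
        """Return True to send the write now, False to stall it."""
        if self.overlaps(sector, count):
            return False           # stall until the earlier write completes
        self.inflight.update(range(sector, sector + count))
        return True

    def complete(self, sector, count):
        self.inflight.difference_update(range(sector, sector + count))


p = WritePipeline()
ok1 = p.submit(100, 8)     # write to sectors 100-107 proceeds
ok2 = p.submit(104, 4)     # overlapping write is stalled
p.complete(100, 8)
ok3 = p.submit(104, 4)     # safe once the first write has completed
```

Stalling only on overlap, rather than serialising all writes, is why the performance cost is unmeasurable: overlapping in-flight writes are rare in practice.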


Xen-users mailing list
