
RE: [Xen-users] AoE (Was: iscsi vs nfs for xen VMs)

> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-
> > bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Simon Hobson
> > Subject: Re: [Xen-users] AoE (Was: iscsi vs nfs for xen VMs)
> >
> > Getting somewhat off-topic, but I'm interested to know how AoE
> > handles network errors? I assume there is some handshake to make
> > sure packets were delivered, rather than just "fire and forget"!
> The open-source Linux aoe driver from Coraid (with which I am the most
> familiar) implements a congestion avoidance and control algorithm
> similar to TCP's.  If a response takes longer than twice the average
> round-trip time plus eight times the average deviation, the request is
> retransmitted (based on the aoe6-75 sources; earlier sources may
> differ).
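The retransmission heuristic described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Coraid driver code (which is C); the class and parameter names, the EWMA gain, and the initial values are all invented here:

```python
# Hedged sketch of the retransmit threshold described above -- not the
# real aoe driver.  Names, gain, and initial values are assumptions.

class RttEstimator:
    """EWMA round-trip estimator, loosely in the style of TCP."""

    def __init__(self, srtt=8.0, dev=1.0, gain=0.125):
        self.srtt = srtt   # smoothed round-trip time (ms)
        self.dev = dev     # smoothed mean deviation (ms)
        self.gain = gain   # EWMA weight for each new sample

    def sample(self, rtt):
        """Fold one measured round trip (network + disk) into the average."""
        err = rtt - self.srtt
        self.srtt += self.gain * err
        self.dev += self.gain * (abs(err) - self.dev)

    def timeout(self):
        """Retransmit once a response exceeds 2*avg RTT + 8*avg deviation."""
        return 2 * self.srtt + 8 * self.dev


est = RttEstimator()
for rtt in (5, 7, 6, 9, 8):   # light load: round trips of 5-10 ms
    est.sample(rtt)
threshold = est.timeout()     # resend any request still unanswered by now
```

Note that each sample here is a full request-to-response round trip, so, as the post says, it folds disk latency in with network latency.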
> What's interesting about aoe vs. TCP is that a round trip measures
> network and disk latency, not just network latency.  A normal read
> request sends a request packet, after which the target performs a disk
> read and returns a response packet with the sector contents.  A normal
> write request sends a request packet with the sector contents, upon
> which the target performs a disk write and returns a status packet.
> Disk latency is orders of magnitude greater than network latency, and
> more variable.  We typically see an RTT of 5-10 ms under light usage.
> Upon heavy disk I/O, this time can vary upwards, possibly tenths of
> seconds, leading to apparent packet loss and an RTT adjustment by the
> driver.  So it's not uncommon for a target to receive and process a
> duplicate request, which is okay because each request is idempotent.
> Lossage of 0.1% to 0.2% is common in our environment, but this does
> not have a significant impact overall on aoe performance.
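The idempotency point is worth spelling out: a duplicate request, whether from a retransmit or a delayed original, leaves the target in the same state and produces the same response. A toy illustration (not real aoe code; the dict-backed "disk" and `handle` function are invented):

```python
# Toy model (not real aoe code) of why duplicate requests are harmless:
# each read/write is idempotent, so replaying one leaves the target in
# the same state and yields the same response.

disk = {}  # sector number -> contents

def handle(request):
    op, sector, data = request
    if op == "write":
        disk[sector] = data        # applying the same write twice = same state
        return ("write_ok", sector)
    return ("read_ok", sector, disk.get(sector, b"\x00" * 512))

req = ("write", 42, b"payload")
first = handle(req)
dup = handle(req)                  # retransmitted duplicate after a timeout
assert first == dup and disk[42] == b"payload"
```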
> That said, the aoe protocol also supports an asynchronous write
> operation, which I suppose really is "fire and forget", unlike normal
> reads and writes.  I haven't used an aoe driver that implements
> asynchronous writes, however, and I'm not sure I would if I had the
> option, since you have no guarantee that the writes succeeded.

Interesting stuff.

I use DRBD locally and used to regularly see messages about concurrent
outstanding requests to the same sector. DRBD logs this because it can't
guarantee the serialisation of requests: two write requests to the same
sector might be reordered at any layer that differs between the two
servers. It sounds like AoE could make this even worse if the 'first'
write were lost, resulting in the 'second' write being performed first,
followed by the retransmitted 'first' write. Sensibly, you'd expect a
barrier to be placed between the first and second writes, guaranteeing
that nothing is reordered across it, but if you run Windows on Xen on
AoE on DRBD (eg to an HA DRBD SAN), you might see non-sensible things
happen. To be fair, in my testing the writes that Windows performed
always carried the same data, so there were no adverse consequences, but
it's still annoying.

I modified GPLPV to check the pipeline and stall if an overlapping write
request would otherwise be sent (it happens rarely, so there is no
measurable performance impact), but it's a lot of mucking around just to
get rid of one little benign message.
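The stall logic amounts to tracking which sectors have a write in flight and holding back any new write that overlaps them. A minimal sketch, assuming a simple sector-set representation (the real GPLPV change is C inside the PV block driver, and every name here is invented):

```python
# Hedged sketch of the overlapping-write stall described above; the
# actual GPLPV code is C, and this class and its names are assumptions.

class WritePipeline:
    def __init__(self):
        self.inflight = set()      # sectors with an outstanding write

    def overlaps(self, sector, count):
        return any(s in self.inflight for s in range(sector, sector + count))

    def submit(self, sector, count):
        """Return True to send the write now, False to stall it."""
        if self.overlaps(sector, count):
            return False           # stall until the earlier write completes
        self.inflight.update(range(sector, sector + count))
        return True

    def complete(self, sector, count):
        self.inflight.difference_update(range(sector, sector + count))


p = WritePipeline()
ok1 = p.submit(100, 8)     # write to sectors 100-107 proceeds
ok2 = p.submit(104, 4)     # overlapping write is stalled
p.complete(100, 8)
ok3 = p.submit(104, 4)     # safe once the first write has completed
```

Stalling only on overlap, rather than serialising all writes, is why the performance cost is unmeasurable: overlapping in-flight writes are rare in practice.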


Xen-users mailing list
