
[Xen-users] vanilla linux and jumbo frames ? (for AOE)




I seem to have a problem getting packets >= 4096 bytes (jumbo
frames) through to a current vanilla Linux kernel running in a
domU. This seems to be important for AoE performance.

I've been running AoE for quite a while with Xen, but write
performance sucks, so I'm trying to get jumbo frames running.
AFAIK the old CentOS kernels don't support jumbo frames in the
AoE initiator, so I rebuilt one of my 64-bit machines as a 32-bit
machine, since 32-on-64 doesn't seem to work, at least with the
CentOS 5.1 version of Xen.

Anyhow, I ran into an issue. I can configure nice big MTUs, and
I learned to do it on all the interfaces on the bridge (I'll move
AoE to another bridge later) so as to get the bridge MTU to show
the desired target MTU; it seems to get set to the lowest MTU of
any of the attached devices. With some poking around it is
working, and I can see large ICMP packets running back and forth.
Roughly what I did is sketched below.
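(Interface names here are just examples from my setup, and 9000
is an arbitrary jumbo value; anything comfortably bigger than the
AoE frames should do.)

   # dom0: raise the MTU on every interface attached to the bridge;
   # the bridge reports the lowest MTU of its ports
   ip link set dev peth0  mtu 9000
   ip link set dev vif0.0 mtu 9000
   ip link set dev vif1.0 mtu 9000
   ip link set dev xenbr0 mtu 9000   # in case it doesn't follow by itself

   # domU: raise eth0 to match
   ip link set dev eth0 mtu 9000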

HOWEVER, I cannot get pings with payloads larger than 4052 bytes
to work. I get an error in the domU:

   net eth0: rx->offset: 0, size: 4294967295

ping -s 4052 works
ping -s 4053 does not


Looking for the source of that message, it seems to be this check
in drivers/net/xen-netfront.c (from the vanilla 2.6.24.3 kernel):

    if (unlikely(rx->status < 0 ||
                 rx->offset + rx->status > PAGE_SIZE)) {
            if (net_ratelimit())
                    dev_warn(dev, "rx->offset: %x, size: %u\n",
                             rx->offset, rx->status);
            xennet_move_rx_slot(np, skb, ref);
            err = -EINVAL;
            goto next;
    }

This seems to suggest that this version of netfront can't handle
a packet bigger than 4096 bytes (one page) :(  As an aside, the
"size" of 4294967295 in the warning looks like -1 printed as an
unsigned value, i.e. rx->status is coming back negative; either
way the frame gets thrown away.

Ethernet overhead is 14 bytes: 12 bytes of destination + source
MAC plus the 2-byte type field.
IP overhead is 20 bytes (more if options are present).
ICMP overhead is 8 bytes.

So that's 42 bytes at a minimum.
42 + 4052 = 4094 ... that's pretty darn close to 4096.
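Checking myself with plain shell arithmetic (just the numbers
above):

   ETH=14; IP=20; ICMP=8
   echo $(( ETH + IP + ICMP ))          # 42 bytes of headers
   echo $(( 4052 + ETH + IP + ICMP ))   # 4094: biggest frame that works
   echo $(( 4096 - ETH - IP - ICMP ))   # 4054: payload that "should" fit
   # so 2 bytes are unaccounted for; presumably some per-packet
   # offset on the backend side, but that's a guess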

this packet is received:

   19:16:22.935917 00:30:48:78:b2:3a > 00:16:3e:46:a3:d5, ethertype
   IPv4 (0x0800), length 4094: (tos 0x0, ttl  64, id 7342, offset 0,
   flags [none], proto: ICMP (1), length: 4080) X >
   Y: ICMP echo reply, id 54824, seq 1, length 4060

this is apparently discarded:

   19:16:43.677814 00:30:48:78:b2:3a > 00:16:3e:46:a3:d5, ethertype
   IPv4 (0x0800), length 4095: (tos 0x0, ttl  64, id 7343, offset 0,
   flags [none], proto: ICMP (1), length: 4081) X >
   Y: ICMP echo reply, id 61224, seq 1, length 4061

it gets all the way back to the domU and then that check in
xen-netfront.c seems to throw it out :(

3072-byte AoE packets are good enough to speed up read I/O, but I
think AoE desperately needs to be able to write a page at a time:
all my stats programs (including the standard vmstat) imply that
there is a huge amount of READ I/O happening on the target (vblade
or qaoed) when I try to WRITE out a big file. I'm thinking this is
a consequence of writing 3k into a 4k block, which would force a
read-modify-write, but it is just a guess (a quick way to watch
for it is sketched below). I do know that AoE is supposed to have
reasonable performance when jumbo packets are working. With
3072-byte packets my read performance is more than 4 times faster
than the write performance, but if every write() requires a read
first, that isn't surprising.
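(Nothing fancy needed to check; /dev/sdX below stands for whatever
disk vblade/qaoed is exporting, and iostat is the one from
sysstat.)

   # on the AoE target, while the domU runs its write test:
   vmstat 1       # 'bi' (blocks in) should stay near zero for a
                  # pure write workload; mine doesn't
   iostat -x 1    # compare the read vs write columns for /dev/sdX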

Any thoughts, folks? Sorry, this post is definitely too long.

I suppose I can try booting the CentOS kernel and using an
"aftermarket" aoe module... but that is _not_ the solution I'd
like to use. I'd rather use the vanilla kernel and compile it
myself. Among other reasons, that means I don't need modules and
can use a generic initrd (the config check below is roughly what
I have in mind).
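(Option names are from the 2.6.24 tree; double-check against
whatever you're building.)

   # in the vanilla kernel source dir, before building:
   grep -E 'CONFIG_ATA_OVER_ETH|CONFIG_XEN_NETDEV_FRONTEND' .config
   # I want both =y (built in) so that neither aoe.ko nor
   # xen-netfront.ko has to live in the initrd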

-Tom

p.s.

(After some compilation, probing, and messing around with the
MTUs on the various interfaces...) yup, using the CentOS kernel
with an "aftermarket" aoe module _does_ work. The V59 aoe module
gives me real feedback on loading:

   aoe: e2.2: setting 7680 byte data frames

and voila, write I/O is just writes!

   time ( dd if=/dev/zero of=/mnt/BIG2 bs=4096 count=262144; sync)
   real    0m14.079s
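Back-of-the-envelope, that's about 76 MByte/s (just bc on the
numbers above):

   # 262144 blocks * 4096 bytes = 1 GiB written in 14.079 s
   echo "262144 * 4096 / 14.079 / 1000000" | bc -l   # ~76.3 MByte/s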

Hmm, that seems fishy; that's a touch faster than the native
drive speed (should be 60 MByte/s)... still, 14 seconds is a ___
of a lot better than the 82 seconds I was seeing with a 1024-byte
block size, and I suspect that the time difference is simply that
the target hasn't flushed everything to disk yet (vblade doesn't
have logic to run with O_SYNC; qaoed does, but using it seems to
cut performance back down to about 18 MByte/s).

Reads weren't bad (45 MByte/s vs 70), but writes (12 vs 60) were
pretty sad.

OK, so that's the CentOS/RH 5.1 kernel with a custom aoe module;
now to see whether I can get that into the initrd so I can boot
over AoE (a sketch of where I'd start is below).
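(The kernel version string here is just the 5.1 one on my box;
getting the network up inside the initrd is the real problem.)

   # CentOS 5 style: pull the out-of-tree aoe module into the initrd
   mkinitrd --with=aoe /boot/initrd-2.6.18-53.el5xen.aoe.img \
            2.6.18-53.el5xen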

While I'm here, does anyone have the udev rules that create the
extra /dev/etherd files (discover, err, revalidate, etc.)?

CentOS 5.1's udev doesn't like the rules file from the 2.6.24
kernel :(


-Tom



----------------------------------------------------------------------
tbrown@xxxxxxxxxxxxx   | How often I found where I should be going
http://BareMetal.com/  | only by setting out for somewhere else.
web hosting since '95  | -- R. Buckminster Fuller

