[Xen-users] vanilla linux and jumbo frames? (for AOE)
I seem to have a problem getting packets >= 4096 bytes (jumbo frames) through to a current vanilla Linux kernel. This seems to be important for AoE performance.

I've been running AoE for quite a while with Xen, but write performance sucks, so I'm trying to get jumbo frames running... AFAIK the old CentOS kernels don't support jumbo frames in the AoE initiator, so I rebuilt one of my 64-bit machines as a 32-bit machine, since 32-on-64 doesn't seem to work, at least with the CentOS 5.1 version of Xen.

Anyhow, I ran into an issue. I can configure nice big MTUs, and I learned to do it on all the interfaces on the bridge (will move it to another bridge later) so as to get the bridge MTU to show the desired target MTU -- the bridge seems to be set to the lowest MTU of any of its enslaved devices. (Rough commands are sketched near the end of this mail.) With some poking around it is working, and I can see large ICMP packets running back and forth. HOWEVER, I can not get ping packets larger than 4052 bytes to work. I get an error in the domU:

    net eth0: rx->offset: 0, size: 4294967295

"ping -s 4052" works; "ping -s 4053" does not.

Looking for the source of that message, it seems to be this code in drivers/net/xen-netfront.c (from the vanilla 2.6.24.3 kernel):

    if (unlikely(rx->status < 0 ||
                 rx->offset + rx->status > PAGE_SIZE)) {
            if (net_ratelimit())
                    dev_warn(dev, "rx->offset: %x, size: %u\n",
                             rx->offset, rx->status);
            xennet_move_rx_slot(np, skb, ref);
            err = -EINVAL;
            goto next;
    }

This seems to suggest that this version of netfront can't handle a packet bigger than 4096 bytes :( (The reported size of 4294967295 is just (u32)-1, i.e. rx->status == -1, which looks like the backend flagging the frame as an error rather than the PAGE_SIZE test itself tripping.)

Ethernet overhead is 14 bytes: it has to be at least the 12 bytes of addresses plus the 2-byte type field. IP overhead is 20 (or 24) bytes. ICMP overhead is 8 bytes. So that's 42 at a minimum, and 42 + 4052 = 4094... that's pretty darn close to 4096.

This packet is received:

    19:16:22.935917 00:30:48:78:b2:3a > 00:16:3e:46:a3:d5, ethertype IPv4 (0x0800),
    length 4094: (tos 0x0, ttl 64, id 7342, offset 0, flags [none], proto: ICMP (1),
    length: 4080) X > Y: ICMP echo reply, id 54824, seq 1, length 4060

This one is apparently discarded:

    19:16:43.677814 00:30:48:78:b2:3a > 00:16:3e:46:a3:d5, ethertype IPv4 (0x0800),
    length 4095: (tos 0x0, ttl 64, id 7343, offset 0, flags [none], proto: ICMP (1),
    length: 4081) X > Y: ICMP echo reply, id 61224, seq 1, length 4061

It gets all the way back to the domU, and then that check in xen-netfront.c seems to throw it out :(

3072-byte AoE packets are good enough to speed up read I/O, but I think AoE desperately needs to write a page at a time: all my stats programs (including the standard vmstat) imply that there is a huge amount of READ I/O happening on the target (vblade or qaoed) when I try to WRITE out a big file. I'm thinking this is a consequence of writing 3k to a 4k block, where every partial write forces the target to read the surrounding 4k block, merge in the new data, and write it back out. But it is just a guess. I do know that AoE is supposed to have reasonable performance when jumbo packets are working... with 3072-byte packets my read performance is more than 4 times faster than my write performance, but if every write() requires a read, that isn't surprising.

Any thoughts, folks? Sorry, this post is definitely too long.

I suppose I can try booting the CentOS kernel and using an "aftermarket" aoe module... but that is _not_ the solution I'd like to use. I'd rather use the vanilla kernel and compile it myself; amongst other reasons, that means I don't need modules and can use a generic initrd.
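For the record, bumping the MTUs went something along these lines (a sketch only; the interface names are from my box, "brctl show" will list yours, and 9000 is just the usual jumbo target):

    # every port on the bridge has to come up first, since the bridge
    # itself reports the lowest MTU of its enslaved interfaces
    for dev in peth0 vif0.0 vif1.0; do
        ip link set dev $dev mtu 9000
    done
    ip link set dev xenbr0 mtu 9000    # bridge last; should now show 9000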
-Tom

p.s. (after some compilation, probing, and messing around with the MTUs on the various interfaces...) yup, using the CentOS kernel with an "aftermarket" aoe module _does_ work.

The v59 aoe module gives me real feedback on loading:

    aoe: e2.2: setting 7680 byte data frames

and voila, write I/O is just writes!

    time ( dd if=/dev/zero of=/mnt/BIG2 bs=4096 count=262144; sync )

    real    0m14.079s

Hmm, that seems fishy: 262144 blocks of 4096 bytes is 1 GiB, so 14.079 seconds works out to roughly 76 MByte/s, a tich faster than the native drive speed (should be 60 MByte/s)... Still, 14 seconds is a ___ of a lot better than the 82 seconds I was seeing with a 1024-byte block size, and I suspect the time difference is simply that the target hasn't flushed everything to disk yet (vblade doesn't have logic to run in O_SYNC; qaoed does, but using it seems to cut performance back down to about 18 MByte/s). Reads weren't bad (45 MByte/s vs 70), but writes at 12 vs 60 were pretty sad.

OK, so that's the CentOS/RHEL 5.1 kernel with a custom aoe module. I wonder if I can get that into the initrd so I can boot over AoE.

While I'm here, does anyone have the udev rules that create the extra /dev/etherd files like rediscover, err, revalidate, etc.? CentOS 5.1 doesn't like the ones from the 2.6.24 kernel :( My best guess at what they should look like is sketched below.
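Something like this, adapted from Documentation/aoe/udev.txt in the 2.6.24 tree (untested on CentOS 5.1; its older udev may want a single "=" instead of "==" in the SUBSYSTEM/KERNEL matches):

    # aoe char devices
    SUBSYSTEM=="aoe", KERNEL=="discover",   NAME="etherd/%k", GROUP="disk", MODE="0220"
    SUBSYSTEM=="aoe", KERNEL=="err",        NAME="etherd/%k", GROUP="disk", MODE="0440"
    SUBSYSTEM=="aoe", KERNEL=="revalidate", NAME="etherd/%k", GROUP="disk", MODE="0220"
    SUBSYSTEM=="aoe", KERNEL=="flush",      NAME="etherd/%k", GROUP="disk", MODE="0220"

    # aoe block devices
    KERNEL=="etherd*", NAME="%k", GROUP="disk"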
-Tom

----------------------------------------------------------------------
tbrown@xxxxxxxxxxxxx  | How often I found where I should be going
http://BareMetal.com/ | only by setting out for somewhere else.
web hosting since '95 |                  -- R. Buckminster Fuller