[Xen-users] xen / AOE / vblade




I ran into an interesting/strange issue today. I still don't understand what happened, but I know what fixed it.

I had a situation where I could only see vblade exported devices from OFF the physical machine. It seemed that if the packets went through two ports of the bridge (instead of one port and the real interface) they got "lost".

I sniffed around and failed to figure out what was going on. As soon
as vblade fired up I started seeing the infamous "peth0: received
packet with own address as source address" messages on dom0. I chased
that for a bit, but didn't get anywhere.

So I read the ATA spec and looked at the vblade code. I could see
that the vblade server was getting some packets, even if they did
have the bridge MAC as the source, but it was not responding to
them. They looked valid from tcpdump, so I started adding debug
statements to the vblade server.

It turns out that for some reason the packets were shorter
than vblade expected, and it was ignoring them. I changed the
packet length check in aoe.c from (n < 60) to (n < 32), and
voila, it works:

        for (;;) {
                n = getpkt(sfd, buf, bufsz);
                if (n < 0) {
                        perror("read network");
                        exit(1);
                }
                /* was: if (n < 60), which skipped the shorter packets seen here */
                if (n < 32) {
                        /* fprintf(stderr, "skipping short read (%d<32)\n", n); */
                        continue;
                }
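
The two thresholds can be related to the AoE header sizes. What follows is a minimal sketch, assuming the published AoE layout (14-byte Ethernet header, 10-byte common AoE header, then a 12-byte ATA header or an 8-byte config header). The helper names aoe_min_len and aoe_len_ok are hypothetical and are not part of vblade. The sketch checks the length against the command type rather than against a fixed 60, which is the size real Ethernet hardware pads short frames to and which frames passed between bridge ports may not reach:

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes per the AoE layout (assumed here, not copied from vblade). */
enum {
    ETH_HDR_LEN  = 14, /* dst MAC, src MAC, ethertype */
    AOE_HDR_LEN  = 10, /* ver/flags, error, major, minor, command, tag */
    ATA_HDR_LEN  = 12, /* aflags, err/feature, sectors, cmd, lba0-5, reserved */
    CONF_HDR_LEN = 8,  /* buffer count, firmware, sectors, ccmd, config length */
    AOE_CMD_OFF  = ETH_HDR_LEN + 5 /* offset of the AoE command byte */
};

/* Hypothetical helper: smallest valid frame length for the AoE command
 * found in buf. Returns 0 if the common header is incomplete or the
 * command is not one of the two handled here. */
static size_t aoe_min_len(const unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    if (n < ETH_HDR_LEN + AOE_HDR_LEN)
        return 0;
    switch (buf[AOE_CMD_OFF]) {
    case 0:  /* ATA command */
        return ETH_HDR_LEN + AOE_HDR_LEN + ATA_HDR_LEN;  /* 36 */
    case 1:  /* query config */
        return ETH_HDR_LEN + AOE_HDR_LEN + CONF_HDR_LEN; /* 32 */
    default:
        return 0;
    }
}

/* Nonzero when the frame is long enough for its own command type. */
static int aoe_len_ok(const unsigned char *buf, size_t n)
{
    size_t need = aoe_min_len(buf, n);
    return need != 0 && n >= need;
}
```

Under these assumptions a 36-byte ATA frame is accepted and a 35-byte one is rejected, whereas a blanket (n < 32) check would let the truncated ATA frame through.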

I've got two identical systems, and why a given dom0 could only
see the vblade server in a domU on the other physical machine is beyond me.
I'm not a Linux Ethernet bridging expert, nor do I know why that
60-byte check was in the code, but I was certainly getting
shorter packets, e.g.:

21:58:48.408750 fe:ff:ff:ff:ff:ff > 00:16:3e:23:f7:0b, ethertype Unknown (0x88a2), length 36:
        0x0000:  1000 0002 0100 0957 db28 0000 01ec 0000  .......W.(......
        0x0010:  00a0 0000 0000
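
The 22 payload bytes in that capture can be read against the AoE header layout. The following is a rough sketch, assuming the standard field order (ver/flags, error, major, minor, command, 4-byte tag, then the ATA header). The struct and function names are hypothetical and not vblade's own types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields of interest. */
struct aoe_summary {
    unsigned version;  /* high nibble of the first byte */
    unsigned major;    /* shelf address */
    unsigned minor;    /* slot address */
    unsigned command;  /* 0 = ATA, 1 = query config */
    uint32_t tag;
    unsigned sectors;  /* ATA sector count */
    unsigned ata_cmd;  /* ATA command register */
};

/* Payload bytes from the capture (everything after the 14-byte Ethernet header). */
static const unsigned char sample_payload[22] = {
    0x10, 0x00, 0x00, 0x02, 0x01, 0x00, 0x09, 0x57, 0xdb, 0x28,
    0x00, 0x00, 0x01, 0xec, 0x00, 0x00, 0x00, 0xa0, 0x00, 0x00,
    0x00, 0x00
};

/* Decode the common AoE header plus the first ATA fields.
 * Returns 0 on success, -1 if fewer than 22 bytes are available. */
static int aoe_summarize(const unsigned char *p, size_t n,
                         struct aoe_summary *out)
{
    assert(p != NULL && out != NULL);
    if (n < 22)
        return -1;
    out->version = p[0] >> 4;
    out->major   = ((unsigned)p[2] << 8) | p[3];
    out->minor   = p[4];
    out->command = p[5];
    out->tag     = ((uint32_t)p[6] << 24) | ((uint32_t)p[7] << 16) |
                   ((uint32_t)p[8] << 8)  | (uint32_t)p[9];
    out->sectors = p[12];
    out->ata_cmd = p[13];
    return 0;
}
```

Read this way, the sample is an AoE version 1 ATA command for shelf 2, slot 1, carrying ATA command 0xec (IDENTIFY DEVICE) with a sector count of 1. Its 14 + 22 = 36 bytes are a complete request, just without the padding that would bring it up to 60.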

I've beat on it fairly hard since, and vblade on top of a drbd
"partition" seems to be working well.

If it helps, this is a pair of x86_64 systems, xen-3.0.3-0, one
a Pentium D and the other a dual AMD 2216.

-Tom



----------------------------------------------------------------------
tbrown@xxxxxxxxxxxxx   | Courage is doing what you're afraid to do.
http://BareMetal.com/  | There can be no courage unless you're scared.
                       | - Eddie Rickenbacker
