[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: mirage-www



Right, problem solved: TCP checksum on xen was slightly broken. There
was a fix made to Unix to cope with odd-length packets, this needed to
be applied to xen as well. Many of the packets had valid checksums,
and linux would ACK up to the sequence number in the last one of
those.

Now that the TCP bug is squashed, I can get back to writing my blog
post -- all just a day in the life of a mirage hacker :-)

On Sat, Sep 8, 2012 at 9:06 AM, Dave Scott <Dave.Scott@xxxxxxxxxxxxx> wrote:
>
>
> On Sep 7, 2012, at 5:59 PM, "Anil Madhavapeddy" <anil@xxxxxxxxxx> wrote:
>
>> On Fri, Sep 07, 2012 at 02:02:45PM +0100, David Scott wrote:
>>> Hi,
>>>
>>> I've built mirage-www for xen and modified the static IP to fit my
>>> environment. (I tried DHCP first but this didn't work -- if I get the
>>> time I'll try to debug).
>>>
>>> I can ping the server fine, and it's certainly receiving a lot of
>>> traffic on my (probably fairly busy) local network. When I try to
>>> fetch a URL the TCP connection hangs. On the console I get:
>>>
>>> Dispatch: dynamic URL /
>>> ... irrelevant spam
>>> TCP retransmission on timer seq = -889321980
>>>
>>> I've attached a small tcpdump of the conversation. I started with ping
>>> and then tried HTTP. According to tcpdump/wireshark it went like this:
>>>
>>> $ tcpdump -r mirage.pcap -n
>>> reading from file mirage.pcap, link-type EN10MB (Ethernet)
>>> 13:47:07.543119 IP 10.80.2.32 > 10.80.239.140: ICMP echo request, id
>>> 9300, seq 1, length 64
>>> 13:47:07.543756 IP 10.80.239.140 > 10.80.2.32: ICMP echo reply, id
>>> 9300, seq 1, length 64
>>> 13:47:08.542112 IP 10.80.2.32 > 10.80.239.140: ICMP echo request, id
>>> 9300, seq 2, length 64
>>> 13:47:08.542422 IP 10.80.239.140 > 10.80.2.32: ICMP echo reply, id
>>> 9300, seq 2, length 64
>>> 13:47:09.541288 IP 10.80.2.32 > 10.80.239.140: ICMP echo request, id
>>> 9300, seq 3, length 64
>>> 13:47:09.541609 IP 10.80.239.140 > 10.80.2.32: ICMP echo reply, id
>>> 9300, seq 3, length 64
>>> 13:47:10.541286 IP 10.80.2.32 > 10.80.239.140: ICMP echo request, id
>>> 9300, seq 4, length 64
>>> 13:47:10.541580 IP 10.80.239.140 > 10.80.2.32: ICMP echo reply, id
>>> 9300, seq 4, length 64
>>> 13:47:11.541286 IP 10.80.2.32 > 10.80.239.140: ICMP echo request, id
>>> 9300, seq 5, length 64
>>> 13:47:11.541932 IP 10.80.239.140 > 10.80.2.32: ICMP echo reply, id
>>> 9300, seq 5, length 64
>>>
>>> -- so far so good, this is just my initial pings. Switching to 'wget
>>> http://10.80.239.140/'
>>>
>>> 13:47:14.241216 IP 10.80.2.32.37158 > 10.80.239.140.80: Flags [S], seq
>>> 2284582709, win 5840, options [mss 1460,sackOK,TS val 909789846 ecr
>>> 0,nop,wscale 6], length 0
>>> 13:47:14.242365 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [S.],
>>> seq 3536243828, ack 2284582710, win 65535, options [mss 1380,wscale
>>> 2,eol], length 0
>>> 13:47:14.242387 IP 10.80.2.32.37158 > 10.80.239.140.80: Flags [.], ack
>>> 1, win 92, length 0
>>>
>>> -- TCP connection established
>>>
>>> 13:47:14.242417 IP 10.80.2.32.37158 > 10.80.239.140.80: Flags [P.],
>>> seq 1:112, ack 1, win 92, length 111
>>>
>>> -- HTTP GET / sent
>>>
>>> 13:47:14.242869 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [P.],
>>> seq 1:18, ack 112, win 65535, length 17
>>>
>>> -- HTTP/1.1 200 OK replied
>>>
>>> 13:47:14.242880 IP 10.80.2.32.37158 > 10.80.239.140.80: Flags [.], ack
>>> 18, win 92, length 0
>>> 13:47:14.243369 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [P.],
>>> seq 18:68, ack 112, win 65535, length 50
>>>
>>> -- "content-length..." replied

--- and this one had the invalid checksum.

>>>
>>> 13:47:14.243556 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [P.],
>>> seq 68:1448, ack 112, win 65535, length 1380
>>>
>>> ... after this chunks of the blog post are transmitted
>>>
>>> 13:47:14.243562 IP 10.80.2.32.37158 > 10.80.239.140.80: Flags [.], ack
>>> 18, win 92, length 0
>>> 13:47:14.243566 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [P.],
>>> seq 1448:2828, ack 112, win 65535, length 1380
>>> 13:47:14.243571 IP 10.80.2.32.37158 > 10.80.239.140.80: Flags [.], ack
>>> 18, win 92, length 0
>>> 13:47:14.243575 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [P.],
>>> seq 2828:4164, ack 112, win 65535, length 1336
>>> 13:47:14.243579 IP 10.80.2.32.37158 > 10.80.239.140.80: Flags [.], ack
>>> 18, win 92, length 0
>>> 13:47:14.243602 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [P.],
>>> seq 4164:5544, ack 112, win 65535, length 1380
>>> 13:47:14.243608 IP 10.80.2.32.37158 > 10.80.239.140.80: Flags [.], ack
>>> 18, win 92, length 0
>>> 13:47:14.243776 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [P.],
>>> seq 18:68, ack 112, win 65535, length 50
>>>
>>> -- fast retransmit of the "content-length" packet
>>>
>>> 13:47:18.242606 IP 10.80.239.140.80 > 10.80.2.32.37158: Flags [P.],
>>> seq 18:68, ack 112, win 65535, length 50
>>>
>>
>> So the server is receiving the HTTP body ACKs (i.e. after the
>> content-length) on the wire, but not passing them up to the TCP stack
>> for some reason. The stack believes they weren't ACKed, and is correctly
>> retransmitting (from its perspective).
>>
>> Balraj saw the same problem a couple of months ago, and tracked it down to
>> possible corruption of the body of packets.  Balraj, did you ever narrow
>> this down further?
>>
>> Iirc, the first ~50 bytes of some packets were overwritten near the very
>> start. Dave, could you look at the packet bodies and see if they all look
>> correct?  Unfortunately, I can't reproduce this on my setup (mirage-www
>> works fine).   What is the TCP client / browser you are using?
>
> I tried Linux chrome and when that failed, switched to wget.
>
> I'll investigate the packet corruption possibility first. I'll take the 
> scenic route: I noticed one of the cstruct examples is for pcap file parsing, 
> I'll try to add the ability to log the packets directly to a local mirage 
> block device. I wanted a similar thing for xenstore anyway.
>
> If that fails I'll resort to printfs :) the problem always happens after a 
> handful of packets so this should be easy.
>
> Cheers,
> Dave
>



-- 
Dave Scott



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.