
Re: [MirageOS-devel] wireshark capture of failed download from mirage-www on ARM



On 21 Jul 2014, at 21:14, Thomas Leonard <talex5@xxxxxxxxx> wrote:

> On 21 July 2014 20:56, Richard Mortier <Richard.Mortier@xxxxxxxxxxxxxxxx> 
> wrote:
>> [ context for list: thomas' observation of failed download, and lots of 
>> retransmissions generally, while bringing up mirage-www on ARM ]
>> 
>> On 21 Jul 2014, at 09:27, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>> 
>>> On 21 July 2014 17:08, Richard Mortier <Richard.Mortier@xxxxxxxxxxxxxxxx> 
>>> wrote:
>>> 
>> 
>>>> On 21 Jul 2014, at 09:01, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>> 
>>>>> Here's the wireshark capture of a failed download. It does indeed say
>>>>> the TCP checksum is wrong. Any idea what's going on?
>>>>> 
>>>>> Note that on ARM it uses a different function to calculate this (which
>>>>> I took from mirage-unix). It's in the #else block here:
>>>>> 
>>>>> https://github.com/talex5/mirage-tcpip/blob/checksum/lib/checksum_stubs.c
>>>> 
>>>> ack; will take a look after breakfast :)
>>>> 
>>>> just to be clear -- the ARM version is using the code from L247 marked 
>>>> "generic implementation"?
>>> 
>>> Yes. The x86 version crashes on ARM because the 64-bit values aren't 
>>> aligned.
>>> 
>>>> two immediate questions -- is the checksum field definitely treated as all 
>>>> zeros in the computation across the header?  and is the segment padded 
>>>> with zeros to be N*16 bits for the purposes of the computation (but the 
>>>> pad not transmitted)?
>>> 
>>> No idea. I haven't changed any code around there.
>> 
>> this is weird-- wireshark says that the first transmission of that segment 
>> (frame#13) has an invalid checksum while the retransmission (#17) has a 
>> valid checksum. but the two checksums are the same!  however #13 appears to 
>> have almost no valid data in it -- after the first 74 bytes (which are the 
>> same in both #13 and #17), the payload in #13 is zeroed out.
>> 
>> so i guess the cstruct buffer is being recycled too soon (after the checksum 
>> calculation but before the data is actually transmitted) or something?
>> 
>> anil, balraj (or anyone else!)-- has that part of the stack been changed 
>> recently?
> 
> I'm seeing strange things using a simpler test case now:
> 
>  let start c s =
>    S.listen_tcpv4 s ~port:8000 (fun flow ->
>        let dst, dst_port = S.TCPV4.get_dest flow in
>        C.log_s c (green "new tcp connection from %s %d"
>                     (Ipaddr.V4.to_string dst) dst_port)
>        >>= fun () ->
>        let data = Cstruct.of_string "Hello" in
>        S.TCPV4.write flow data
>        >>= fun () ->
>        S.TCPV4.close flow
>      );
>    S.listen s
> 
> This is also failing. I added a hexdump to mirage-net-xen and got this
> in Netif.writev:
> 
> f0 1f af 6a 9b 95 c0 ff ee c0 ff ee 08 00 45 00
> 00 2d 52 95 00 00 26 06 c0 c8 c0 a8 00 12 c0 a8
> 00 0b 1f 40 b4 ca 1a fe b5 69 5e 8c dd fe 50 18
> ff ff 29 8a 00 00
> 
> 48 65 6c 6c 6f
> 
> That looks correct. The first block is the header, the second is the
> payload. In Wireshark, the header is identical but the payload is
> different (20 00 00 00 08), which matches what you're seeing.
> 
> So I guess there's some problem sending the second page to the ring.
> Suggestions from people who know this code would be great! Could just
> be a missing barrier or something.
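
As a cross-check on the dump above, here is a minimal sketch in C that
recomputes the TCP checksum over the bytes as logged in Netif.writev. The
names (sum16, tcp_checksum, frame_sent) are made up for illustration and are
not taken from mirage-tcpip; it assumes plain Ethernet + IPv4 with no VLAN
tag. It treats the checksum field as zero and pads an odd trailing byte with
zero, per the usual Internet checksum rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words to acc. An odd trailing byte is padded
 * with a zero byte for the computation only (the pad is never sent). */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        acc += (uint32_t)p[0] << 8;
    return acc;
}

/* Expected TCP checksum for an Ethernet + IPv4 + TCP frame.
 * Returns 0 if the frame is too short to parse (sketch-level check). */
static uint16_t tcp_checksum(const uint8_t *frame, size_t frame_len)
{
    if (frame_len < 14 + 20)
        return 0;
    const uint8_t *ip = frame + 14;               /* skip Ethernet header */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;      /* IP header length */
    size_t total = (size_t)ip[2] << 8 | ip[3];    /* IP total length */
    if (ihl < 20 || total < ihl + 20 || frame_len < 14 + total)
        return 0;
    const uint8_t *tcp = ip + ihl;
    size_t tcp_len = total - ihl;

    uint32_t acc = 0;
    acc = sum16(ip + 12, 8, acc);                 /* pseudo-header: src, dst */
    acc += ip[9];                                 /* zero byte + protocol */
    acc += (uint32_t)tcp_len;                     /* TCP length */
    acc = sum16(tcp, 16, acc);                    /* header up to checksum */
    acc = sum16(tcp + 18, tcp_len - 18, acc);     /* checksum field as zero */
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);       /* fold the carries */
    return (uint16_t)~acc;
}

/* The frame exactly as hexdumped in Netif.writev (header + "Hello"). */
static const uint8_t frame_sent[59] = {
    0xf0, 0x1f, 0xaf, 0x6a, 0x9b, 0x95, 0xc0, 0xff, 0xee, 0xc0, 0xff, 0xee,
    0x08, 0x00, 0x45, 0x00, 0x00, 0x2d, 0x52, 0x95, 0x00, 0x00, 0x26, 0x06,
    0xc0, 0xc8, 0xc0, 0xa8, 0x00, 0x12, 0xc0, 0xa8, 0x00, 0x0b, 0x1f, 0x40,
    0xb4, 0xca, 0x1a, 0xfe, 0xb5, 0x69, 0x5e, 0x8c, 0xdd, 0xfe, 0x50, 0x18,
    0xff, 0xff, 0x29, 0x8a, 0x00, 0x00,
    0x48, 0x65, 0x6c, 0x6c, 0x6f
};
```

For frame_sent this gives 0x298a, which is the value in the header's checksum
field (29 8a). With the payload Wireshark shows instead (20 00 00 00 08) the
same function gives 0x255c. That fits the picture that the checksum was
computed over the right data and the payload was lost somewhere after that
point.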

I think the flow is:

https://github.com/mirage/mirage-net-xen/blob/master/lib/netif.ml#L408
https://github.com/mirage/shared-memory-ring/blob/master/lwt/lwt_ring.ml#L75
https://github.com/mirage/shared-memory-ring/blob/master/lib/ring.ml#L154
https://github.com/mirage/shared-memory-ring/blob/master/lib/ring.ml#L102
https://github.com/mirage/shared-memory-ring/blob/master/lib/barrier_stubs.c#L28
 -- calling “xen_mb”

To check whether “xen_mb” is working, you could add a delay (via a busy
loop?) in the ‘memory_barrier’ function (or thereabouts) in shared-memory-ring.
Assuming the writes are committed eventually (is that a valid assumption?), the
busy loop would then “fix it”, which would be fairly good evidence that the
barriers are broken.
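
Roughly what I have in mind, as a hedged sketch rather than a patch against
the real barrier_stubs.c: full_barrier below is a hypothetical stand-in (the
real stub calls xen_mb; here I use the GCC/Clang builtin __sync_synchronize so
the snippet builds on its own), and debug_delay and the iteration count are
made up for the experiment.

```c
/* Hypothetical stand-in for the real barrier; the actual stub calls
 * xen_mb(). __sync_synchronize() is a GCC/Clang full-barrier builtin. */
static inline void full_barrier(void)
{
    __sync_synchronize();
}

/* Spin for the given number of iterations. The volatile counter stops
 * the compiler from optimising the loop away. Returns the final count. */
static unsigned long debug_delay(unsigned long iterations)
{
    volatile unsigned long i;
    for (i = 0; i < iterations; i++) {
        /* busy wait */
    }
    return i;
}

/* Debug-only variant of the barrier: issue the barrier, then stall
 * before returning so that pending writes have time to land before the
 * caller goes on to publish the request. */
void memory_barrier_with_delay(void)
{
    full_barrier();
    debug_delay(1000000UL);   /* arbitrary; tune until the effect shows */
}
```

If the payload corruption goes away with the delay in place and comes back
without it, that points at the barrier (or its placement) rather than at the
ring logic. The delay is only a diagnostic, not a fix.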

Cheers,
Dave