
Re: [MirageOS-devel] wireshark capture of failed download from mirage-www on ARM



Didn't seem to help. Here's my current modified version of
mirage-skeleton/network. Does it work for anyone else?

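open Lwt  (* for >>= *)

(* colour helper, assumed to match the one in the stock mirage-skeleton/network *)
let green fmt = Printf.sprintf ("\027[32m" ^^ fmt ^^ "\027[m")

module Main (C: V1_LWT.CONSOLE) (S: V1_LWT.STACKV4) = struct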

  let buffer = Buffer.create 1000
  let () =
    for i = 1 to 1000 do
      Buffer.add_string buffer (Printf.sprintf "%d " i)
    done

  let data = Cstruct.of_string (Buffer.contents buffer)
  let start c s =
    S.listen_tcpv4 s ~port:8000 (fun flow ->
        let dst, dst_port = S.TCPV4.get_dest flow in
        C.log_s c (green "new tcp connection from %s %d"
                     (Ipaddr.V4.to_string dst) dst_port)
        >>= fun () ->
        S.TCPV4.write flow data
        >>= fun () ->
        S.TCPV4.close flow
      );

    S.listen s

end

I test using:

$ nc 192.168.0.18 8000

When compiled for Unix, it outputs a sequence of numbers. On Xen, it
just keeps retransmitting. Wireshark shows the payload starts 16 bytes
later than expected. I'll sleep on it and see if it makes more sense
in the morning...
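
For anyone who wants to check a dump by hand, here is a rough sketch in C. It uses the Netif.writev hexdump quoted further down this thread. It sums the pseudo-header, the TCP header and the payload as big-endian 16-bit words, reading one byte at a time so nothing depends on alignment, and treats an odd trailing byte as if it were zero-padded. If the stored checksum field is included in the sum, a consistent segment should come out as 0. The names (sum16, fold, tcp_check, frame, hello, seen) are made up for this illustration; this is not the code in checksum_stubs.c. The bytes are copied from the hexdump, and `seen` is the payload reported in wireshark.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum as big-endian 16-bit words.
   Byte-wise reads, so no alignment is assumed; an odd trailing
   byte is summed as if followed by a zero pad byte. */
static uint32_t sum16(uint32_t acc, const uint8_t *p, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        acc += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        acc += (uint32_t)p[len - 1] << 8;
    return acc;
}

/* Fold the carries back in and return the ones' complement. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)~acc;
}

/* ip points at a 20-byte IPv4 header, tcp at the TCP header
   (tcp_hdr_len must be even). Returns 0 when the checksum stored
   in the TCP header is consistent with the given payload. */
static uint16_t tcp_check(const uint8_t *ip,
                          const uint8_t *tcp, size_t tcp_hdr_len,
                          const uint8_t *payload, size_t payload_len)
{
    size_t tcp_len = tcp_hdr_len + payload_len;
    uint8_t pseudo[4];
    uint32_t acc = 0;

    pseudo[0] = 0;
    pseudo[1] = ip[9];                    /* protocol */
    pseudo[2] = (uint8_t)(tcp_len >> 8);  /* TCP length, high byte */
    pseudo[3] = (uint8_t)(tcp_len & 0xff);

    acc = sum16(acc, ip + 12, 8);         /* source and destination address */
    acc = sum16(acc, pseudo, 4);
    acc = sum16(acc, tcp, tcp_hdr_len);
    acc = sum16(acc, payload, payload_len);
    return fold(acc);
}

/* First fragment from the hexdump: Ethernet (14) + IPv4 (20) + TCP (20). */
static const uint8_t frame[54] = {
    0xf0, 0x1f, 0xaf, 0x6a, 0x9b, 0x95, 0xc0, 0xff,
    0xee, 0xc0, 0xff, 0xee, 0x08, 0x00, 0x45, 0x00,
    0x00, 0x2d, 0x52, 0x95, 0x00, 0x00, 0x26, 0x06,
    0xc0, 0xc8, 0xc0, 0xa8, 0x00, 0x12, 0xc0, 0xa8,
    0x00, 0x0b, 0x1f, 0x40, 0xb4, 0xca, 0x1a, 0xfe,
    0xb5, 0x69, 0x5e, 0x8c, 0xdd, 0xfe, 0x50, 0x18,
    0xff, 0xff, 0x29, 0x8a, 0x00, 0x00
};

/* Second fragment as dumped in Netif.writev ("Hello"). */
static const uint8_t hello[5] = { 0x48, 0x65, 0x6c, 0x6c, 0x6f };

/* Payload as reported on the wire by wireshark. */
static const uint8_t seen[5] = { 0x20, 0x00, 0x00, 0x00, 0x08 };
```

With these bytes, tcp_check(frame + 14, frame + 34, 20, hello, 5) comes out as 0 and the same call with `seen` does not. That is consistent with the checksum having been computed over "Hello" and the payload going wrong afterwards.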


On 21 July 2014 22:40, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
> If it fails on x86 could you try reverting to an older Cstruct version? I 
> worry about the recent bounds checks tripping some new behaviour.
>
> Anil
>
>> On 21 Jul 2014, at 14:30, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>
>>> On 21 July 2014 21:31, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
>>>
>>>> On 21 Jul 2014, at 21:14, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>
>>>>> On 21 July 2014 20:56, Richard Mortier <Richard.Mortier@xxxxxxxxxxxxxxxx> 
>>>>> wrote:
>>>>> [ context for list: thomas' observation of failed download, and lots of 
>>>>> retransmissions generally, while bringing up mirage-www on ARM ]
>>>>>
>>>>>> On 21 Jul 2014, at 09:27, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On 21 July 2014 17:08, Richard Mortier 
>>>>>> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>>>> On 21 Jul 2014, at 09:01, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Here's the wireshark capture of a failed download. It does indeed say
>>>>>>>> the TCP checksum is wrong. Any idea what's going on?
>>>>>>>>
>>>>>>>> Note that on ARM it uses a different function to calculate this (which
>>>>>>>> I took from mirage-unix). It's in the #else block here:
>>>>>>>>
>>>>>>>> https://github.com/talex5/mirage-tcpip/blob/checksum/lib/checksum_stubs.c
>>>>>>>
>>>>>>> ack; will take a look after breakfast :)
>>>>>>>
>>>>>>> just to be clear -- the ARM version is using the code from L247 marked 
>>>>>>> "generic implementation"?
>>>>>>
>>>>>> Yes. The x86 version crashes on ARM because the 64-bit values aren't 
>>>>>> aligned.
>>>>>>
>>>>>>> two immediate questions -- is the checksum field definitely treated as 
>>>>>>> all zeros in the computation across the header?  and is the segment 
>>>>>>> padded with zeros to be N*16 bits for the purposes of the computation 
>>>>>>> (but the pad not transmitted)?
>>>>>>
>>>>>> No idea. I haven't changed any code around there.
>>>>>
>>>>> this is weird-- wireshark says that the first transmission of that 
>>>>> segment (frame#13) has an invalid checksum while the retransmission (#17) 
>>>>> has a valid checksum. but the two checksums are the same!  however #13 
>>>>> appears to have almost no valid data in it -- after the first 74 bytes 
>>>>> (which are the same in both #13 and #17), the payload in #13 is zeroed 
>>>>> out.
>>>>>
>>>>> so i guess the cstruct buffer is being recycled too soon (after the 
>>>>> checksum calculation but before the data is actually transmitted) or 
>>>>> something?
>>>>>
>>>>> anil, balraj (or anyone else!)-- has that part of the stack been changed 
>>>>> recently?
>>>>
>>>> I'm seeing strange things using a simpler test case now:
>>>>
>>>> let start c s =
>>>>   S.listen_tcpv4 s ~port:8000 (fun flow ->
>>>>       let dst, dst_port = S.TCPV4.get_dest flow in
>>>>       C.log_s c (green "new tcp connection from %s %d"
>>>>                    (Ipaddr.V4.to_string dst) dst_port)
>>>>       >>= fun () ->
>>>>       let data = Cstruct.of_string "Hello" in
>>>>       S.TCPV4.write flow data
>>>>       >>= fun () ->
>>>>       S.TCPV4.close flow
>>>>     );
>>>>   S.listen s
>>>>
>>>> This is also failing. I added a hexdump to mirage-net-xen and got this
>>>> in Netif.writev:
>>>>
>>>> f0 1f af 6a 9b 95 c0 ff ee c0 ff ee 08 00 45 00
>>>> 00 2d 52 95 00 00 26 06 c0 c8 c0 a8 00 12 c0 a8
>>>> 00 0b 1f 40 b4 ca 1a fe b5 69 5e 8c dd fe 50 18
>>>> ff ff 29 8a 00 00
>>>>
>>>> 48 65 6c 6c 6f
>>>>
>>>> That looks correct. The first block is the header, the second is the
>>>> payload. In wireshark, the header is identical but the payload is
>>>> different (20 00 00 00 08), which matches what you're seeing.
>>>>
>>>> So I guess there's some problem sending the second page to the ring.
>>>> Suggestions from people who know this code would be great! Could just
>>>> be a missing barrier or something.
>>>
>>> I think the flow is:
>>>
>>> https://github.com/mirage/mirage-net-xen/blob/master/lib/netif.ml#L408
>>> https://github.com/mirage/shared-memory-ring/blob/master/lwt/lwt_ring.ml#L75
>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/ring.ml#L154
>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/ring.ml#L102
>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/barrier_stubs.c#L28
>>> -> calling "xen_mb"
>>>
>>> Perhaps to see whether "xen_mb" is working you could add a delay (via busy
>>> loop?) in the "memory_barrier" function (or thereabouts) in
>>> shared-memory-ring. Assuming the writes are committed eventually (is that a
>>> valid assumption?) then the busy loop would "fix it". That would be fairly
>>> good evidence that barriers are broken.
>>
>> This is potentially a problem (in shared-memory-ring/lib/barrier.h):
>>
>> #elif defined(__arm__)
>> # ifndef _M_ARM
>> #define xen_mb()   {}
>> #define xen_rmb()  {}
>> #define xen_wmb()  {}
>> # elif _M_ARM > 6
>> #define xen_mb()   asm volatile ("dmb" : : : "memory")
>> #define xen_rmb()  asm volatile ("dmb" : : : "memory")
>> #define xen_wmb()  asm volatile ("dmb" : : : "memory")
>>
>> From a quick Google, it looks like _M_ARM is a Microsoft-only thing.
>>
>> However, the barrier code is duplicated in mirage-xen, and I think
>> we're using that version (in any case, changing it didn't help).
>>
>> However, I've now noticed that my network test is failing on x86 too,
>> so there's something very odd going on...
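
For reference, a sketch of how the ARM test could be written without _M_ARM. It assumes GCC or Clang: recent versions define __ARM_ARCH (older GCC only defines per-variant macros such as __ARM_ARCH_7A__), and __sync_synchronize() is the GCC builtin for a full barrier. The sketch_ names and the toy ring are made up for illustration and are not taken from shared-memory-ring or mirage-xen. Since changing the barriers didn't help here, this is a tidy-up rather than a fix.

```c
#include <stdint.h>

#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
/* ARMv7 and later: data memory barrier instruction. */
# define sketch_mb()  __asm__ __volatile__ ("dmb" : : : "memory")
#else
/* Everything else: let the compiler emit a full barrier. */
# define sketch_mb()  __sync_synchronize()
#endif

/* Toy single-producer ring, only to show where the barrier goes. */
struct sketch_ring {
    uint32_t slot[4];
    volatile uint32_t prod;
};

static void sketch_publish(struct sketch_ring *r, uint32_t value)
{
    r->slot[r->prod % 4] = value;  /* write the slot contents first */
    sketch_mb();                   /* make them visible before the index moves */
    r->prod = r->prod + 1;         /* then publish the new producer index */
}
```

The point of the ordering is that the other end must never see the new producer index before the slot contents it refers to.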
>>
>>
>> --
>> Dr Thomas Leonard        http://0install.net/
>> GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
>> GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA
>>
>> _______________________________________________
>> MirageOS-devel mailing list
>> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
>> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel



-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

