
Re: [MirageOS-devel] wireshark capture of failed download from mirage-www on ARM



What are the rules about alignment for cstructs?

Allocating the buffer like this works:

  let data = Io_page.get 1 |> Io_page.to_cstruct
  let () =
    Cstruct.blit_from_string (Buffer.contents buffer) 0 data 0
      (Buffer.length buffer)

But using Cstruct.of_string doesn't. It does look like Netif assumes
the cstructs are page-aligned, e.g.:

  if page.Cstruct.off + len > page_size then begin
    (* netback rejects packets that cross page boundaries *)
    let msg =
      Printf.sprintf "Invalid page: offset=%d, length=%d"
        page.Cstruct.off len in
    print_endline msg;
    Lwt.fail (Failure msg)
  end else

Perhaps the type should be Io_page rather than Cstruct in that case?
Or are Cstructs supposed to be page-aligned too?
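
To make the distinction concrete, here is a rough sketch (not code from
mirage-net-xen) of a boundary check based on the buffer's actual address
rather than on the cstruct's offset field. The function names and the
4096-byte page size are assumptions for illustration only. A buffer whose
offset field is 0 can still start part-way into a page if the underlying
allocation is not page-aligned, which an offset-only check would not notice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size (hypothetical constant; 4 KiB is typical for Xen guests). */
#define PAGE_SIZE_BYTES 4096u

/* Offset of an address within the page that contains it. */
static size_t offset_in_page(uintptr_t addr)
{
    return (size_t)(addr & (PAGE_SIZE_BYTES - 1u));
}

/* Returns 1 if the byte range [addr, addr + len) lies within a single page,
 * 0 if it crosses a page boundary. */
static int fits_in_one_page(uintptr_t addr, size_t len)
{
    assert(len > 0);
    return offset_in_page(addr) + len <= PAGE_SIZE_BYTES;
}
```

With this check, a page-aligned buffer passes for any length up to one
page, while a buffer that starts 16 bytes into a page has an in-page offset
of 16 and is rejected once its length would run past the end of that page.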


On 21 July 2014 23:14, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> Didn't seem to help. Here's my current modified version of
> mirage-skeleton/network. Does it work for anyone else?
>
> module Main (C: V1_LWT.CONSOLE) (S: V1_LWT.STACKV4) = struct
>
>   let buffer = Buffer.create 1000
>   let () =
>     for i = 1 to 1000 do
>       Buffer.add_string buffer (Printf.sprintf "%d " i)
>     done
>
>   let data = Cstruct.of_string (Buffer.contents buffer)
>   let start c s =
>     S.listen_tcpv4 s ~port:8000 (fun flow ->
>         let dst, dst_port = S.TCPV4.get_dest flow in
>         C.log_s c (green "new tcp connection from %s %d"
>                      (Ipaddr.V4.to_string dst) dst_port)
>         >>= fun () ->
>         S.TCPV4.write flow data
>         >>= fun () ->
>         S.TCPV4.close flow
>       );
>
>     S.listen s
>
> end
>
> I test using:
>
> $ nc 192.168.0.18 8000
>
> When compiled for Unix, it outputs a sequence of numbers. On Xen, it
> just keeps retransmitting. Wireshark shows the payload starts 16 bytes
> later than expected. I'll sleep on it and see if it makes more sense
> in the morning...
>
>
> On 21 July 2014 22:40, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>> If it fails on x86 could you try reverting to an older Cstruct version? I 
>> worry about the recent bounds checks tripping some new behaviour.
>>
>> Anil
>>
>>> On 21 Jul 2014, at 14:30, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>
>>>> On 21 July 2014 21:31, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
>>>>
>>>>> On 21 Jul 2014, at 21:14, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>
>>>>>> On 21 July 2014 20:56, Richard Mortier 
>>>>>> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>>>>>> [ context for list: thomas' observation of failed download, and lots of 
>>>>>> retransmissions generally, while bringing up mirage-www on ARM ]
>>>>>>
>>>>>>> On 21 Jul 2014, at 09:27, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On 21 July 2014 17:08, Richard Mortier 
>>>>>>> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>>>> On 21 Jul 2014, at 09:01, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Here's the wireshark capture of a failed download. It does indeed say
>>>>>>>>> the TCP checksum is wrong. Any idea what's going on?
>>>>>>>>>
>>>>>>>>> Note that on ARM it uses a different function to calculate this (which
>>>>>>>>> I took from mirage-unix). It's in the #else block here:
>>>>>>>>>
>>>>>>>>> https://github.com/talex5/mirage-tcpip/blob/checksum/lib/checksum_stubs.c
>>>>>>>>
>>>>>>>> ack; will take a look after breakfast :)
>>>>>>>>
>>>>>>>> just to be clear -- the ARM version is using the code from L247 marked 
>>>>>>>> "generic implementation"?
>>>>>>>
>>>>>>> Yes. The x86 version crashes on ARM because the 64-bit values aren't 
>>>>>>> aligned.
>>>>>>>
>>>>>>>> two immediate questions -- is the checksum field definitely treated as 
>>>>>>>> all zeros in the computation across the header?  and is the segment 
>>>>>>>> padded with zeros to be N*16 bits for the purposes of the computation 
>>>>>>>> (but the pad not transmitted)?
>>>>>>>
>>>>>>> No idea. I haven't changed any code around there.
>>>>>>
>>>>>> this is weird-- wireshark says that the first transmission of that 
>>>>>> segment (frame#13) has an invalid checksum while the retransmission 
>>>>>> (#17) has a valid checksum. but the two checksums are the same!  however 
>>>>>> #13 appears to have almost no valid data in it -- after the first 74 
>>>>>> bytes (which are the same in both #13 and #17), the payload in #13 is 
>>>>>> zeroed out.
>>>>>>
>>>>>> so i guess the cstruct buffer is being recycled too soon (after the 
>>>>>> checksum calculation but before the data is actually transmitted) or 
>>>>>> something?
>>>>>>
>>>>>> anil, balraj (or anyone else!)-- has that part of the stack been changed 
>>>>>> recently?
>>>>>
>>>>> I'm seeing strange things using a simpler test case now:
>>>>>
>>>>> let start c s =
>>>>>   S.listen_tcpv4 s ~port:8000 (fun flow ->
>>>>>       let dst, dst_port = S.TCPV4.get_dest flow in
>>>>>       C.log_s c (green "new tcp connection from %s %d"
>>>>>                    (Ipaddr.V4.to_string dst) dst_port)
>>>>>       >>= fun () ->
>>>>>       let data = Cstruct.of_string "Hello" in
>>>>>       S.TCPV4.write flow data
>>>>>       >>= fun () ->
>>>>>       S.TCPV4.close flow
>>>>>     );
>>>>>   S.listen s
>>>>>
>>>>> This is also failing. I added a hexdump to mirage-net-xen and got this
>>>>> in Netif.writev:
>>>>>
>>>>> f0 1f af 6a 9b 95 c0 ff ee c0 ff ee 08 00 45 00
>>>>> 00 2d 52 95 00 00 26 06 c0 c8 c0 a8 00 12 c0 a8
>>>>> 00 0b 1f 40 b4 ca 1a fe b5 69 5e 8c dd fe 50 18
>>>>> ff ff 29 8a 00 00
>>>>>
>>>>> 48 65 6c 6c 6f
>>>>>
>>>>> That looks correct. The first block is the header, the second is the
>>>>> payload. In wireshark, the header is identical but the payload is
>>>>> different (20 00 00 00 08), which matches what you're seeing.
>>>>>
>>>>> So I guess there's some problem sending the second page to the ring.
>>>>> Suggestions from people who know this code would be great! Could just
>>>>> be a missing barrier or something.
>>>>
>>>> I think the flow is:
>>>>
>>>> https://github.com/mirage/mirage-net-xen/blob/master/lib/netif.ml#L408
>>>> https://github.com/mirage/shared-memory-ring/blob/master/lwt/lwt_ring.ml#L75
>>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/ring.ml#L154
>>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/ring.ml#L102
>>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/barrier_stubs.c#L28
>>>> -> calling "xen_mb"
>>>>
>>>> Perhaps to see whether "xen_mb" is working you could add a delay (via busy
>>>> loop?) in the "memory_barrier" function (or thereabouts) in
>>>> shared-memory-ring. Assuming the writes are committed eventually (is that
>>>> a valid assumption?) then the busy loop would "fix it". That would be
>>>> fairly good evidence that barriers are broken.
>>>
>>> This is potentially a problem (in shared-memory-ring/lib/barrier.h):
>>>
>>> #elif defined(__arm__)
>>> # ifndef _M_ARM
>>> #define xen_mb()   {}
>>> #define xen_rmb()  {}
>>> #define xen_wmb()  {}
>>> # elif _M_ARM > 6
>>> #define xen_mb()   asm volatile ("dmb" : : : "memory")
>>> #define xen_rmb()  asm volatile ("dmb" : : : "memory")
>>> #define xen_wmb()  asm volatile ("dmb" : : : "memory")
>>>
>>> From a quick Google, it looks like _M_ARM is a Microsoft-only thing.
>>>
>>> However, the barrier code is duplicated in mirage-xen, and I think
>>> we're using that version (in any case, changing it didn't help).
>>>
>>> However, I've now noticed that my network test is failing on x86 too,
>>> so there's something very odd going on...
>>>
>>>
>
>
>



-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

 

