Re: [MirageOS-devel] wireshark capture of failed download from mirage-www on ARM
What are the rules about alignment for cstructs? Allocating the buffer like this works:

    let data = Io_page.get 1 |> Io_page.to_cstruct
    let () =
      Cstruct.blit_from_string (Buffer.contents buffer) 0 data 0
        (Buffer.length buffer)

But using Cstruct.of_string doesn't. It does look like Netif assumes the structs are page-aligned, e.g.:

    if page.Cstruct.off + len > page_size then begin
      (* netback rejects packets that cross page boundaries *)
      let msg = Printf.sprintf "Invalid page: offset=%d, length=%d"
          page.Cstruct.off len in
      print_endline msg;
      Lwt.fail (Failure msg)
    end else

Perhaps the type should be Io_page rather than Cstruct in that case? Or are Cstructs supposed to be aligned too?

On 21 July 2014 23:14, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> Didn't seem to help. Here's my current modified version of
> mirage-skeleton/network. Does it work for anyone else?
>
> module Main (C: V1_LWT.CONSOLE) (S: V1_LWT.STACKV4) = struct
>
>   let buffer = Buffer.create 1000
>   let () =
>     for i = 1 to 1000 do
>       Buffer.add_string buffer (Printf.sprintf "%d " i)
>     done
>
>   let data = Cstruct.of_string (Buffer.contents buffer)
>   let start c s =
>     S.listen_tcpv4 s ~port:8000 (fun flow ->
>         let dst, dst_port = S.TCPV4.get_dest flow in
>         C.log_s c (green "new tcp connection from %s %d"
>                      (Ipaddr.V4.to_string dst) dst_port)
>         >>= fun () ->
>         S.TCPV4.write flow data
>         >>= fun () ->
>         S.TCPV4.close flow
>       );
>
>     S.listen s
>
> end
>
> I test using:
>
>   $ nc 192.168.0.18 8000
>
> When compiled for Unix, it outputs a sequence of numbers. On Xen, it
> just keeps retransmitting. Wireshark shows the payload starts 16 bytes
> later than expected. I'll sleep on it and see if it makes more sense
> in the morning...
>
>
> On 21 July 2014 22:40, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>> If it fails on x86 could you try reverting to an older Cstruct version? I
>> worry about the recent bounds checks tripping some new behaviour.
>>
>> Anil
>>
>>> On 21 Jul 2014, at 14:30, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>
>>>> On 21 July 2014 21:31, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
>>>>
>>>>> On 21 Jul 2014, at 21:14, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>
>>>>>> On 21 July 2014 20:56, Richard Mortier
>>>>>> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>>>>>> [ context for list: thomas' observation of failed download, and lots of
>>>>>> retransmissions generally, while bringing up mirage-www on ARM ]
>>>>>>
>>>>>>> On 21 Jul 2014, at 09:27, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On 21 July 2014 17:08, Richard Mortier
>>>>>>> <Richard.Mortier@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>>>> On 21 Jul 2014, at 09:01, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Here's the wireshark capture of a failed download. It does indeed say
>>>>>>>>> the TCP checksum is wrong. Any idea what's going on?
>>>>>>>>>
>>>>>>>>> Note that on ARM it uses a different function to calculate this (which
>>>>>>>>> I took from mirage-unix). It's in the #else block here:
>>>>>>>>>
>>>>>>>>> https://github.com/talex5/mirage-tcpip/blob/checksum/lib/checksum_stubs.c
>>>>>>>>
>>>>>>>> ack; will take a look after breakfast :)
>>>>>>>>
>>>>>>>> just to be clear -- the ARM version is using the code from L247 marked
>>>>>>>> "generic implementation"?
>>>>>>>
>>>>>>> Yes. The x86 version crashes on ARM because the 64-bit values aren't
>>>>>>> aligned.
>>>>>>>
>>>>>>>> two immediate questions -- is the checksum field definitely treated as
>>>>>>>> all zeros in the computation across the header? and is the segment
>>>>>>>> padded with zeros to be N*16 bits for the purposes of the computation
>>>>>>>> (but the pad not transmitted)?
>>>>>>>
>>>>>>> No idea. I haven't changed any code around there.
>>>>>>
>>>>>> this is weird-- wireshark says that the first transmission of that
>>>>>> segment (frame#13) has an invalid checksum while the retransmission
>>>>>> (#17) has a valid checksum.
>>>>>> but the two checksums are the same! however
>>>>>> #13 appears to have almost no valid data in it -- after the first 74
>>>>>> bytes (which are the same in both #13 and #17), the payload in #13 is
>>>>>> zeroed out.
>>>>>>
>>>>>> so i guess the cstruct buffer is being recycled too soon (after the
>>>>>> checksum calculation but before the data is actually transmitted) or
>>>>>> something?
>>>>>>
>>>>>> anil, balraj (or anyone else!)-- has that part of the stack been changed
>>>>>> recently?
>>>>>
>>>>> I'm seeing strange things using a simpler test case now:
>>>>>
>>>>>   let start c s =
>>>>>     S.listen_tcpv4 s ~port:8000 (fun flow ->
>>>>>         let dst, dst_port = S.TCPV4.get_dest flow in
>>>>>         C.log_s c (green "new tcp connection from %s %d"
>>>>>                      (Ipaddr.V4.to_string dst) dst_port)
>>>>>         >>= fun () ->
>>>>>         let data = Cstruct.of_string "Hello" in
>>>>>         S.TCPV4.write flow data
>>>>>         >>= fun () ->
>>>>>         S.TCPV4.close flow
>>>>>       );
>>>>>     S.listen s
>>>>>
>>>>> This is also failing. I added a hexdump to mirage-net-xen and got this
>>>>> in Netif.writev:
>>>>>
>>>>> f0 1f af 6a 9b 95 c0 ff ee c0 ff ee 08 00 45 00
>>>>> 00 2d 52 95 00 00 26 06 c0 c8 c0 a8 00 12 c0 a8
>>>>> 00 0b 1f 40 b4 ca 1a fe b5 69 5e 8c dd fe 50 18
>>>>> ff ff 29 8a 00 00
>>>>>
>>>>> 48 65 6c 6c 6f
>>>>>
>>>>> That looks correct. The first block is the header, the second is the
>>>>> payload. In Wireshark, the header is identical but the payload is
>>>>> different (20 00 00 00 08), which matches what you're seeing.
>>>>>
>>>>> So I guess there's some problem sending the second page to the ring.
>>>>> Suggestions from people who know this code would be great! Could just
>>>>> be a missing barrier or something.
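Richard's two questions above (is the checksum field zeroed during the computation, and is an odd-length segment padded to a multiple of 16 bits) describe the standard Internet checksum of RFC 1071, which TCP uses. Below is a minimal pure-OCaml sketch of that computation, for illustration only: it works on an OCaml string rather than a Cstruct, it leaves out the TCP pseudo-header, and it is not the code in mirage-tcpip's checksum_stubs.c. Because the sum covers the payload, bytes that are changed after the checksum has been computed (such as the zeroed payload in frame #13) will in general no longer match the checksum in the header.

```ocaml
(* Sketch of the RFC 1071 Internet checksum over an OCaml string.
   Assumptions: the caller has already zeroed the checksum field, and
   the TCP pseudo-header (addresses, protocol, length) is not included. *)

(* Sum 16-bit big-endian words with end-around carry. An odd trailing
   byte is treated as the high byte of a word whose low byte is zero;
   that pad byte exists only for the computation and is never sent. *)
let ones_complement_sum (s : string) : int =
  let len = String.length s in
  let sum = ref 0 in
  let i = ref 0 in
  while !i + 1 < len do
    sum := !sum + (Char.code s.[!i] lsl 8) + Char.code s.[!i + 1];
    i := !i + 2
  done;
  if len land 1 = 1 then sum := !sum + (Char.code s.[len - 1] lsl 8);
  (* Fold any carries above bit 15 back into the low 16 bits. *)
  while !sum lsr 16 <> 0 do
    sum := (!sum land 0xffff) + (!sum lsr 16)
  done;
  !sum

(* The checksum is the 16-bit ones' complement of that sum. *)
let checksum (s : string) : int = lnot (ones_complement_sum s) land 0xffff

let () =
  (* Payload bytes only: "Hello" as dumped in Netif.writev, and the
     bytes Wireshark reported instead. *)
  let sent = "Hello" and seen = "\x20\x00\x00\x00\x08" in
  Printf.printf "sent: %04x, seen: %04x\n" (checksum sent) (checksum seen)
  (* prints: sent: dc2d, seen: d7ff *)
```

As a receiver-side check, appending the computed checksum (big-endian) to an even-length buffer and running `checksum` over the result gives 0.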
>>>>
>>>> I think the flow is:
>>>>
>>>> https://github.com/mirage/mirage-net-xen/blob/master/lib/netif.ml#L408
>>>> https://github.com/mirage/shared-memory-ring/blob/master/lwt/lwt_ring.ml#L75
>>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/ring.ml#L154
>>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/ring.ml#L102
>>>> https://github.com/mirage/shared-memory-ring/blob/master/lib/barrier_stubs.c#L28
>>>> → calling "xen_mb"
>>>>
>>>> Perhaps to see whether "xen_mb" is working you could add a delay (via busy
>>>> loop?) in the "memory_barrier" function (or thereabouts) in
>>>> shared-memory-ring. Assuming the writes are committed eventually (is that
>>>> a valid assumption?) then the busy loop would "fix it". That would be
>>>> fairly good evidence that barriers are broken.
>>>
>>> This is potentially a problem (in shared-memory-ring/lib/barrier.h):
>>>
>>> #elif defined(__arm__)
>>> # ifndef _M_ARM
>>> #define xen_mb() {}
>>> #define xen_rmb() {}
>>> #define xen_wmb() {}
>>> # elif _M_ARM > 6
>>> #define xen_mb() asm volatile ("dmb" : : : "memory")
>>> #define xen_rmb() asm volatile ("dmb" : : : "memory")
>>> #define xen_wmb() asm volatile ("dmb" : : : "memory")
>>>
>>> From a quick Google, it looks like _M_ARM is a Microsoft-only thing.
>>>
>>> However, the barrier code is duplicated in mirage-xen, and I think
>>> we're using that version (in any case, changing it didn't help).
>>>
>>> However, I've now noticed that my network test is failing on x86 too,
>>> so there's something very odd going on...
>>>
>>>
>>> --
>>> Dr Thomas Leonard        http://0install.net/
>>> GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1
>>> GPG: DA98 25AE CAD0 8975 7CDA BD8E 0713 3F96 CA74 D8BA
>>>
>>> _______________________________________________
>>> MirageOS-devel mailing list
>>> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
>>> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
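The alignment question at the top of this message can be made concrete. The quoted Netif test compares page.Cstruct.off + len with page_size, and that sum equals the slice's position within a page only when the underlying buffer itself starts on a page boundary, as a buffer from Io_page.get does. The sketch below is hypothetical: it models a buffer by an integer base address (real code would need the buffer's actual address, which nothing in the thread shows how to obtain), assumes 4096-byte pages, and the helper names are invented for illustration. With a base that sits 16 bytes into a page, the Netif-style test and the address-based test disagree.

```ocaml
(* Hypothetical model: [base_addr] stands in for the address at which a
   Cstruct's underlying buffer starts. Assumes 4 KiB pages. *)
let page_size = 4096

(* Position of byte [off] of the buffer within its page. *)
let page_offset ~base_addr ~off = (base_addr + off) land (page_size - 1)

(* Address-based test: does [off, off + len) spill into the next page? *)
let crosses_page ~base_addr ~off ~len =
  page_offset ~base_addr ~off + len > page_size

(* The condition as quoted from Netif: it uses [off] alone, so it only
   agrees with [crosses_page] when [base_addr] is page-aligned. *)
let netif_style_crosses ~off ~len = off + len > page_size

let () =
  let aligned = 0x10000 and unaligned = 0x10010 (* 16 bytes into a page *) in
  Printf.printf "aligned:   page_offset=%d crosses=%b\n"
    (page_offset ~base_addr:aligned ~off:4070)
    (crosses_page ~base_addr:aligned ~off:4070 ~len:20);
  Printf.printf "unaligned: page_offset=%d crosses=%b\n"
    (page_offset ~base_addr:unaligned ~off:4070)
    (crosses_page ~base_addr:unaligned ~off:4070 ~len:20);
  Printf.printf "netif-style check: crosses=%b\n"
    (netif_style_crosses ~off:4070 ~len:20)
  (* prints:
     aligned:   page_offset=4070 crosses=false
     unaligned: page_offset=4086 crosses=true
     netif-style check: crosses=false *)
```

Copying into a buffer from Io_page.get, as in the working variant at the top, sidesteps the question because the base is page-aligned by construction.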
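The hexdump that was added to Netif.writev earlier in the thread is not itself shown. A minimal stand-in that prints bytes in the same layout (lowercase hex pairs, 16 per line) is sketched below; the name `hexdump` is hypothetical and not an existing mirage-net-xen function, and it takes an OCaml string, so a Cstruct would first be copied out with Cstruct.to_string.

```ocaml
(* Hypothetical helper: format a byte string as lowercase hex pairs,
   16 bytes per line, in the layout of the Netif.writev dump. *)
let hexdump (s : string) : string =
  let buf = Buffer.create (3 * String.length s) in
  String.iteri
    (fun i c ->
      (* Separate bytes with a space, starting a new line every 16 bytes. *)
      if i > 0 then Buffer.add_char buf (if i mod 16 = 0 then '\n' else ' ');
      Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
    s;
  Buffer.contents buf

let () =
  (* The payload from the simpler test case. *)
  print_endline (hexdump "Hello")
  (* prints: 48 65 6c 6c 6f *)
```

Dumping the same buffer just before it is handed to the ring, and comparing that with the Wireshark capture, is one way to narrow down where the bytes change.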