
Re: [MirageOS-devel] wireshark capture of failed download from mirage-www on ARM



On 22 July 2014 08:44, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> On 22 July 2014 04:00, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>> On 21 Jul 2014, at 16:10, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>
>>> What are the rules about alignment for cstructs?
>>>
>>> Allocating the buffer like this works:
>>>
>>>  let data = Io_page.get 1 |> Io_page.to_cstruct
>>>  let () =
>>>    Cstruct.blit_from_string (Buffer.contents buffer) 0 data 0
>>>      (Buffer.length buffer)
>>>
>>> But using Cstruct.of_string doesn't. It does look like Netif assumes
>>> the structs are page aligned. e.g.
>>>
>>>  if page.Cstruct.off + len > page_size then begin
>>>    (* netback rejects packets that cross page boundaries *)
>>>    let msg =
>>>      Printf.sprintf "Invalid page: offset=%d, length=%d"
>>>        page.Cstruct.off len in
>>>    print_endline msg;
>>>    Lwt.fail (Failure msg)
>>>  end else
>>
>> Argh, this is exactly the problem.  Netif requires page aligned buffers
>> (in theory, sub-page grants are possible, but ill-advised for performance
>> reasons).
>>
>> Cstruct.of_string calls Cstruct.create, which calls Bigarray.Array1.create
>> which isn't page-aligned.  It does need to go through Io_page to ensure it's
>> page-aligned.
>>
>> We don't protect this distinction using phantom types, and it's bitten us
>> several times now through these hard-to-spot dynamic failures :-/
>>
>> Making Cstructs always page-aligned is too expensive for the 'casual'
>> small Cstructs (as seen in OCaml TLS for example), so ensuring your buffer
>> originates from Io_page is the best bet for now.  An issue on how to
>> ensure this is checked statically would be good to have (but it would
>> involve a fair bit of mechanical code motion).
>
> OK, that helps.
>
> On ARM, it still sometimes needs to retransmit though. In some
> packets, only the first 128 bytes are correct and the rest are zero
> (according to wireshark). The buffer I'm sending is statically
> allocated, so it can't be getting freed too soon. How can the net
> driver see zeros in the same physical page in which it already saw the
> first part of the data?
>
> My test repository is here:
>
>   https://github.com/talex5/net-problem

I noticed I'm getting a lot of these in the dom0 log (dmesg):

Jul 22 08:09:05 cubietruck kernel: [11892.128241] xen_add_phys_to_mach_entry: cannot add pfn=0x00079abe -> mfn=0x000bed87: pfn=0x000799a6 -> mfn=0x000bed87 already exists
Jul 22 08:09:05 cubietruck kernel: [11892.128727] xen_add_mach_to_phys_entry: cannot add pfn=0x00079bcb -> mfn=0x000bed87: pfn=0x000799a6 -> mfn=0x000bed87 already exists
Jul 22 08:09:05 cubietruck kernel: [11892.133264] xen_add_phys_to_mach_entry: cannot add pfn=0x00079b70 -> mfn=0x000bed87: pfn=0x00079bd7 -> mfn=0x000bed87 already exists
Jul 22 08:09:09 cubietruck kernel: [11896.041106] xen_add_phys_to_mach_entry: cannot add pfn=0x0007999a -> mfn=0x000bed87: pfn=0x00079b2b -> mfn=0x000bed87 already exists
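
For reference, here is a rough sketch of how the phantom-type idea from the
quoted discussion might look. It is only an illustration: the Buf module and
every function in it are hypothetical and are not part of Cstruct, Io_page or
Netif. It uses plain Bytes, so it models only where a buffer came from, not
its real memory alignment, and the 4096-byte page size is an assumption.

```ocaml
(* Hypothetical sketch: Buf and all of its functions are invented for
   illustration; they are not part of Cstruct, Io_page or Netif. *)
module Buf : sig
  type page_aligned   (* phantom tag: buffer was copied into a whole page *)
  type unaligned      (* phantom tag: arbitrary heap buffer *)
  type 'a t

  val page_size : int
  val of_string : string -> unaligned t
  val to_page : 'a t -> page_aligned t
  val length : 'a t -> int
  val transmit : page_aligned t -> int
end = struct
  type page_aligned
  type unaligned

  (* The type parameter is never used in the record: it is a phantom. *)
  type 'a t = { data : Bytes.t; len : int }

  (* Assumed page size; a real driver would ask the platform. *)
  let page_size = 4096

  let of_string s = { data = Bytes.of_string s; len = String.length s }

  (* Copy the payload to the start of a fresh page-sized buffer. This stands
     in for allocating through a page allocator and blitting into it. *)
  let to_page b =
    if b.len > page_size then invalid_arg "Buf.to_page: larger than one page";
    let page = Bytes.make page_size '\000' in
    Bytes.blit b.data 0 page 0 b.len;
    { data = page; len = b.len }

  let length b = b.len

  (* Stand-in for a driver write: only accepts page-tagged buffers and
     returns the number of bytes it would send. *)
  let transmit b = b.len
end
```

With this signature, Buf.transmit (Buf.to_page (Buf.of_string "hello"))
type-checks, while Buf.transmit (Buf.of_string "hello") is rejected by the
compiler, so the mistake shows up at build time rather than as a dynamic
failure in the driver.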



-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel