[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [MirageOS-devel] wireshark capture of failed download from mirage-www on ARM



On 7 August 2014 13:29, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
> Hi,
>
> Sorry for the delay replying!
>
> On 31 Jul 2014, at 11:44, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>
>> On 26 July 2014 10:27, David Scott <scott.dj@xxxxxxxxx> wrote:
>>>
>>>
>>> On Saturday, July 26, 2014, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>>>>
>>>> On 25 Jul 2014, at 04:54, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>>>
>>>>> Here's a slightly better work-around:
>>>>>
>>>>>
>>>>> https://github.com/talex5/mirage-tcpip/commit/9f7e0a628ebd1cece7d0fb979273a96be78233ad
>>>>>
>>>>> It changes the TCP packet splitting code so that the second part of
>>>>> the page is copied to a new IO page. This avoids sending the same
>>>>> physical page to Linux twice (which is what doesn't work).
>>>>>
>>>>> With this, the previous hack can be reverted.
>>>>>
>>>>> In my testing, this increased the streaming TCP download speed from 56
>>>>> KB/s to 3.9 MB/s on the CubieTruck.
>>>>
>>>> This is certainly a fix that works, but I'm loath to insert an unnecessary
>>>> copy in the TCP stack just because of a limitation in Xen/ARM (that will
>>>> go
>>>> away in Xen 4.5, I expect).
>>>>
>>>> Could we keep track of the active grants in the Gnttab libary, and force a
>>>> page duplication at that point instead?  That would fix it for any other
>>>> Cstruct consumers as well, and be a more logical place to insert Xen
>>>> version detection in the future to remove the extra copy if not needed.
>>>
>>>
>>> If the problem is granting the same page twice, would it be enough to
>>> maintain a map of page address to (grant index * refount), and then
>>> "grant_access" could return the existing grant id and bump the refcount? I
>>> think we just need to share the grant, rather than duplicate the page. (Is
>>> that right?)
>>
>> How would that fit in with the current API? It looks like you first
>> allocate a gntref (an int) and then use that to share the page:
>>
>>  val get : unit -> gntref Lwt.t
>>
>>  val grant_access : domid:int -> writable:bool -> gntref -> Io_page.t -> unit
>
> Yeah looking at it I think that interface is a bit too ârawâ. Unfortunately 
> âtype gntref = intâ in the interface so clients are free to do things like
> ```
>          ignore(RX.Proto_64.write ~id ~gref:(Int32.of_int gref) slot)
> ```
> I had been hoping to exploit the observation that (I think all) users call 
> âgrant_accessâ before serialising the grant reference into the ring metadata 
> slot. My evil/cunning plan had been to allocate a fresh grant in âget()â as 
> normal, but then swizzle it inside âgrant_accessâ if it turns out that a 
> grant has already been allocated in that page. However I would need to change 
> the implementation of âtype gntrefâ to make this work :-(
>
> Anyway, looking at the current users in mirage-{block,net}-xen, I can see 2 
> users of the low-level API:
>
>   1. granting access to the metadata ring on startup e.g.
> ```
> let allocate_ring ~domid =
>   let page = Io_page.get 1 in
>   let x = Io_page.to_cstruct page in
>   lwt gnt = Gnt.Gntshr.get () in
>   for i = 0 to Cstruct.len x - 1 do
>     Cstruct.set_uint8 x i 0
>   done;
>   Gnt.Gntshr.grant_access ~domid ~writable:true gnt page;
>   return (gnt, x)
> ```
>   2. granting access to pages for I/O e.g.
> ```
> let refill_requests nf =
>   let num = Ring.Rpc.Front.get_free_requests nf.rx_fring in
>   if num > 0 then
>     lwt grefs = Gnt.Gntshr.get_n num in
>     let pages = Io_page.pages num in
>     List.iter
>       (fun (gref, page) ->
>          let id = gref mod (1 lsl 16) in
>          Gnt.Gntshr.grant_access ~domid:nf.backend_id ~writable:true gref 
> page;
>          Hashtbl.add nf.rx_map id (gref, page);
>          let slot_id = Ring.Rpc.Front.next_req_id nf.rx_fring in
>          let slot = Ring.Rpc.Front.slot nf.rx_fring slot_id in
>          ignore(RX.Proto_64.write ~id ~gref:(Int32.of_int gref) slot)
>       ) (List.combine grefs pages);
>     if Ring.Rpc.Front.push_requests_and_check_notify nf.rx_fring
>     then notify nf ();
>     return ()
>   else return ()
> ```
>
> Weâve already got a âwith_grantsâ function but this doesnât work for these 
> cases because they both want to âleakâ the grants (they are cleaned up 
> elsewhere). Perhaps we should make an API (âreturn_grantsâ âalloc_grantsâ?) 
> which combines âget_nâ and âgrant_accessâ?
>
> Do you know if the bug is likely to be fixed in the backend in a reasonable 
> timeframe? Or is it worth changing the API soon so we could make a workaround 
> in the frontend?

There are patches against Xen 4.5 which they're currently back-porting
to Xen 4.4. I've updated xen-arm-builder to use the patched versions
of Xen and Linux and rebuilt the SD card images:

  https://github.com/mirage/xen-arm-builder

However, there are some race conditions in the new code that seem to
be triggered if Mirage boots and then exits quickly. I'm testing
various patches they're sending; see this thread:

  http://lists.xen.org/archives/html/xen-devel/2014-08/msg00053.html


-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.