Re: [MirageOS-devel] Netback, Xen grants and a Linux panic

On 21 December 2015 at 18:39, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
> On 21 Dec 2015, at 17:51, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>> On 21 December 2015 at 10:42, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>> I'm trying to use a Mirage Xen unikernel to provide networking to
>>> other client VMs, using the experimental new netback support [1].
>>> It works fine when the client is also a Mirage unikernel, but Linux
>>> clients kernel panic:
>> OK, mystery solved! There was a bug in TX.Response.write that meant it
>> didn't set the ID field. Since the slot had just been used for the
>> Request, the gref field from the request ended up being sent as the id
>> field in the reply.
>> The Linux netfront driver doesn't check that the ID it got back
>> corresponds to one it actually sent, but just takes whatever's at that
>> index in its table (or whatever is at the location that would be that
>> index of the table if the table were that big). This is typically 0,
>> which it interprets as grant ref 0 rather than "invalid" and then
>> complains that ref 0 is still mapped.
>> The Mirage netfront driver always uses the gref as the ID, so it
>> worked anyway there.
>> Actually, using the gref as ID doesn't make sense, because two
>> requests can share the same gref. Lwt_ring didn't notice, because it
>> uses Hashtbl.add instead of Hashtbl.replace and so allows multiple
>> requests with the same ID. Presumably they get ack'd in the wrong
>> order, causing some pages to be returned to the free pool too soon.
>> Will fix...
> Fantastic detective work!  So this also means that the Linux netfront
> driver doesn't obey the grant protocol correctly with respect to out
> of order responses on the ring...

I think the Linux netfront is correct, though a bit too trusting of
the netback domain (see also:

> We should get a mirage-net-xen release out with these fixes asap though;
> is your current PR enough to improve the state of affairs, or do you
> want to wait until the gref-as-ID assumption is fixed?

The response-writer bug doesn't matter because it only affects the
netback code, which isn't released yet anyway.
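For anyone curious how the gref ended up in the ID field, here is a minimal
standalone sketch of the slot reuse. It is not the mirage-net-xen code: the
function names are made up for illustration, and the field offsets are my
reading of the netif_tx_request / netif_tx_response structs in Xen's netif.h
(request: gref at 0, id at 8; response: id at 0, status at 2).

```ocaml
(* Sketch only: one 12-byte ring slot, reused for a request and its response.
   Assumed layout (little-endian, from Xen's netif.h):
     tx request : gref:uint32 @0, offset:uint16 @4, flags:uint16 @6,
                  id:uint16 @8, size:uint16 @10
     tx response: id:uint16 @0, status:int16 @2 *)
let slot_size = 12

let write_request slot ~gref ~id ~size =
  Bytes.set_int32_le slot 0 gref;
  Bytes.set_uint16_le slot 4 0;      (* offset *)
  Bytes.set_uint16_le slot 6 0;      (* flags *)
  Bytes.set_uint16_le slot 8 id;
  Bytes.set_uint16_le slot 10 size

(* Hypothetical buggy writer: sets the status but forgets the id, so the
   low 16 bits of the request's gref are left at offset 0. *)
let write_response_buggy slot ~status =
  Bytes.set_int16_le slot 2 status

(* Hypothetical fixed writer: sets both fields. *)
let write_response_fixed slot ~id ~status =
  Bytes.set_uint16_le slot 0 id;
  Bytes.set_int16_le slot 2 status

(* What the frontend reads back as the response ID. *)
let read_response_id slot = Bytes.get_uint16_le slot 0

let () =
  let slot = Bytes.make slot_size '\000' in
  write_request slot ~gref:0x1234l ~id:7 ~size:64;
  write_response_buggy slot ~status:0;
  Printf.printf "buggy response id: 0x%x\n" (read_response_id slot);
  write_request slot ~gref:0x1234l ~id:7 ~size:64;
  write_response_fixed slot ~id:7 ~status:0;
  Printf.printf "fixed response id: 0x%x\n" (read_response_id slot)
```

With the buggy writer the frontend sees the gref (0x1234) as the ID; with the
fixed writer it sees the ID it sent (7).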

We do need a new release for the frontend bug, though. I've pushed a
fix here: https://github.com/mirage/mirage-net-xen/pull/28
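As background for reviewers, the Hashtbl.add behaviour that hid the duplicate
IDs can be reproduced in isolation. This is only a sketch using the standard
library, not the Lwt_ring code or the change in the PR; the table contents and
the helper names (register, register_checked, complete) are hypothetical.

```ocaml
(* Sketch only: a pending-request table keyed by request ID. *)

(* Hashtbl.add keeps any earlier binding for the same key and shadows it. *)
let register tbl id waiter = Hashtbl.add tbl id waiter

(* One possible way to notice the problem: refuse an ID already in use. *)
let register_checked tbl id waiter =
  if Hashtbl.mem tbl id then Error "duplicate request id"
  else (Hashtbl.add tbl id waiter; Ok ())

(* Handle a response: take the current binding for this ID, if any.
   Hashtbl.remove drops only the most recent binding. *)
let complete tbl id =
  match Hashtbl.find_opt tbl id with
  | None -> None
  | Some waiter -> Hashtbl.remove tbl id; Some waiter

let () =
  let tbl = Hashtbl.create 8 in
  (* Two in-flight requests that share a gref, and so share an ID. *)
  register tbl 5 "request A";
  register tbl 5 "request B";
  Printf.printf "bindings for one id: %d\n" (Hashtbl.length tbl);
  (* The first response for ID 5 is matched to the newer request. *)
  (match complete tbl 5 with
   | Some w -> Printf.printf "first response matched to: %s\n" w
   | None -> print_endline "no waiter");
  (match complete tbl 5 with
   | Some w -> Printf.printf "second response matched to: %s\n" w
   | None -> print_endline "no waiter")
```

Both requests stay in the table, and responses are matched newest-first, so
if the backend answers in submission order each response is credited to the
wrong request.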

It's only lightly tested (I browsed the web a little with Firefox,
routing via my Mirage NAT/firewall). Would be good if someone has time
to review it.

Dr Thomas Leonard        http://roscidus.com/blog/
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

MirageOS-devel mailing list


