
[MirageOS-devel] Netback, Xen grants and a Linux panic



I'm trying to use a Mirage Xen unikernel to provide networking to
other client VMs, using the experimental new netback support [1].

It works fine when the client is also a Mirage unikernel, but Linux
clients hit a kernel panic:

...
Started CUPS Printing Service.
Started Qubes misc post-boot actions.
[    6.063915] xen_netfront: xennet_tx_buf_gc: warning -- grant still in use by backend domain
[    6.063971] ------------[ cut here ]------------
[    6.063983] kernel BUG at /home/user/rpmbuild/BUILD/kernel-3.18.17/linux-3.18.17/drivers/net/xen-netfront.c:421!
[    6.063998] invalid opcode: 0000 [#1] SMP
[    6.064016] Modules linked in: ip6table_filter ip6_tables intel_rapl x86_pkg_temp_thermal xt_conntrack coretemp crct10dif_pclmul snd_pcm xen_netfront crc32_pclmul snd_timer crc32c_intel ipt_MASQUERADE snd soundcore nf_nat_masquerade_ipv4 pcspkr ghash_clmulni_intel iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xenfs xen_privcmd dummy_hcd udc_core u2mfn(O) xen_blkback fuse parport_pc ppdev lp parport xen_blkfront
[    6.064137] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.18.17-7.pvops.qubes.x86_64 #1
[    6.064152] task: ffffffff81c1b4a0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[    6.064165] RIP: e030:[<ffffffffa00c43e7>]  [<ffffffffa00c43e7>] xennet_tx_buf_gc+0x1a7/0x1d0 [xen_netfront]
[    6.064190] RSP: e02b:ffff880018203d78  EFLAGS: 00010046
[    6.064199] RAX: 000000000000004f RBX: ffff880011ab01f4 RCX: 00000000fffffffa
[    6.064210] RDX: 0000000000000000 RSI: ffff880018203bdc RDI: 0000000000000004
[    6.064220] RBP: ffff880018203dc8 R08: 0000000000000000 R09: ffff880013800000
[    6.064230] R10: 0000000000000000 R11: ffff880018203a4e R12: ffff880011ab03e8
[    6.064241] R13: 000000000000007d R14: 0000000000000000 R15: ffff880011ab0000
[    6.064261] FS:  0000000000000000(0000) GS:ffff880018200000(0000) knlGS:ffff880018300000
[    6.064273] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.064282] CR2: 00007f7941c16330 CR3: 0000000010269000 CR4: 0000000000042660
[    6.064295] Stack:
[    6.064301]  ffffffff81729d6f ffff880011ab0900 000000000000007e 00000007007d16a7
[    6.064318]  0000000000000000 ffff880011ab00da ffff880011ab0000 000000000000001f
[    6.064335]  0000000000000000 0000000000000000 ffff880018203df8 ffffffffa00c443f
[    6.064352] Call Trace:
[    6.064358]  <IRQ>
[    6.064363]  [<ffffffff81729d6f>] ? _raw_spin_unlock_irqrestore+0x1f/0x40
[    6.064395]  [<ffffffffa00c443f>] xennet_tx_interrupt+0x2f/0x50 [xen_netfront]
[    6.064411]  [<ffffffffa00c4c76>] xennet_interrupt+0x16/0x30 [xen_netfront]
[    6.064426]  [<ffffffff810e880e>] handle_irq_event_percpu+0x3e/0x1a0
[    6.064439]  [<ffffffff810e89b1>] handle_irq_event+0x41/0x70
[    6.064454]  [<ffffffff810eba0f>] handle_edge_irq+0x7f/0x120
[    6.064467]  [<ffffffff810e7e8b>] generic_handle_irq+0x2b/0x40
[    6.064484]  [<ffffffff8144d49a>] evtchn_fifo_handle_events+0x17a/0x190
[    6.064497]  [<ffffffff8144a0e0>] __xen_evtchn_do_upcall+0x50/0x90
[    6.064513]  [<ffffffff8144c027>] xen_evtchn_do_upcall+0x37/0x50
[    6.064528]  [<ffffffff8172c19e>] xen_do_hypervisor_callback+0x1e/0x30

The code that generates the warning is below; the BUG() at the end is
what produces the invalid-opcode panic above:

            struct xen_netif_tx_response *txrsp;

            txrsp = RING_GET_RESPONSE(&queue->tx, cons);
            if (txrsp->status == XEN_NETIF_RSP_NULL)
                continue;

            id  = txrsp->id;
            skb = queue->tx_skbs[id].skb;
            if (unlikely(gnttab_query_foreign_access(
                queue->grant_tx_ref[id]) != 0)) {
                pr_alert("%s: warning -- grant still in use by backend domain\n",
                         __func__);
                BUG();
            }

i.e. this didn't return zero:

static int gnttab_query_foreign_access_v1(grant_ref_t ref)
{
    return gnttab_shared.v1[ref].flags & (GTF_reading|GTF_writing);
}
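
For reference, a v1 grant-table entry is just { flags : uint16; domid :
uint16; frame : uint32 }, so the check Linux does amounts to testing two
bits of the flags word. A minimal OCaml sketch of the same test, assuming
the shared grant-table pages are visible as a Cstruct.t (the [table]
argument and this helper are illustrations, not mirage-net-xen API):

  (* GTF bit positions from Xen's public grant_table.h. *)
  let gtf_reading = 1 lsl 3   (* a remote domain is reading via this grant *)
  let gtf_writing = 1 lsl 4   (* a remote domain is writing via this grant *)

  (* Mirror of Linux's gnttab_query_foreign_access_v1: true iff the remote
     domain still holds a mapping of [gref].  Each v1 entry is 8 bytes and
     the 16-bit flags word comes first. *)
  let query_foreign_access (table : Cstruct.t) (gref : int) : bool =
    let flags = Cstruct.LE.get_uint16 table (gref * 8) in
    flags land (gtf_reading lor gtf_writing) <> 0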

The strange thing is, this happens even if I never map the grant at
all! I simplified the netback code to just ack each frame without
mapping it:

  let listen (t: t) _fn : unit Lwt.t =
    let rec loop after =
      (* Consume every pending TX request on the frontend's ring. *)
      Ring.Rpc.Back.ack_requests t.from_netfront
        (fun slot ->
          match TX.Request.read slot with
          | Error msg ->
            failwith (Printf.sprintf
              "Netif.Backend.read_read: TX has unparseable request: %s" msg)
          | Ok { TX.Request.id; gref; offset; flags; size } ->
            Printf.printf "Got request with ID %d (ref = %ld, offset = %d, flags = %x, size = %d)\n%!"
              id gref offset (Flags.to_int flags) size;
            (* Immediately ack the request without ever mapping [gref]. *)
            let resp_slot =
              Ring.Rpc.(Back.slot t.from_netfront (Back.next_res_id t.from_netfront)) in
            let resp = { TX.Response.id; status = TX.Response.OKAY } in
            Printf.printf "Writing response with ID %d\n%!" resp.TX.Response.id;
            TX.Response.write resp resp_slot);
      (* Publish the responses and notify the frontend's event channel if
         the ring says it needs waking ([h] is the Eventchn handle opened
         elsewhere). *)
      let notify = Ring.Rpc.Back.push_responses_and_check_notify t.from_netfront in
      if notify then Eventchn.notify h t.channel;
      OS.Activations.after t.channel after
      >>= loop in
    loop OS.Activations.program_start
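
For comparison, the unsimplified path would map the grant, copy the frame
out, and unmap before responding; it is the unmap that clears
GTF_reading/GTF_writing in the frontend's entry. A rough sketch, assuming
the xen-gnt Gnt.Gnttab API and a hypothetical [t.frontend_domid] field
(this is not the actual mirage-net-xen code, and in real code you would
open the interface once, not per request):

  let handle_request t { TX.Request.id; gref; offset; size; _ } =
    let iface = Gnt.Gnttab.interface_open () in
    (* Map the frontend's granted page read-only. *)
    let grant = { Gnt.Gnttab.domid = t.frontend_domid; ref = Int32.to_int gref } in
    let mapping = Gnt.Gnttab.map_exn iface grant false in
    let page = Io_page.to_cstruct (Gnt.Gnttab.Local_mapping.to_buf mapping) in
    (* Copy the frame out so the grant can be released straight away. *)
    let frame = Cstruct.sub page offset size in
    let copy = Cstruct.create size in
    Cstruct.blit frame 0 copy 0 size;
    (* Unmapping is what clears the GTF_reading/GTF_writing bits, letting
       the frontend's gnttab_query_foreign_access check return 0. *)
    Gnt.Gnttab.unmap_exn iface mapping;
    (copy, { TX.Response.id; status = TX.Response.OKAY })

(In the failing case above, of course, nothing is ever mapped, so none of
this should matter.)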

It outputs just:

Got request with ID 0 (ref = 132, offset = 2050, flags = 0, size = 90)
Writing response with ID 0

Any idea what would cause this? Why does Linux think the grant is
still in use even when I never map it in the first place? Do I need to
do something extra to dispose of a grant?

Thanks,

(also, I guess Mirage netfront should check for grants being released too...)
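
Such a frontend-side guard might look like the sketch below, mirroring
Linux's xennet_tx_buf_gc check before a TX slot is recycled;
[query_foreign_access] is the helper sketched earlier, and [gnt_table]
and [free_tx_slot] are made-up names:

  let reclaim_tx_slot gnt_table free_tx_slot gref =
    if query_foreign_access gnt_table gref then
      (* Same condition that makes Linux BUG() above. *)
      failwith (Printf.sprintf "Netfront: grant %d still in use by backend" gref)
    else
      free_tx_slot gref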


[1] https://github.com/mirage/mirage-net-xen/pull/26

-- 
Dr Thomas Leonard        http://roscidus.com/blog/
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel


 

