
Re: [Xen-users] domU fails to boot with more than 32 vcpus on kernel 4.1+ - xen_netfront: can't alloc rx grant refs





On Wed, Jan 20, 2016 at 1:23 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
On Wed, 2016-01-20 at 12:53 +0100, hydra wrote:
> Booting up a xen domU with more than 32 vcpus fails with kernel 4.1.15
> (also tested with 4.4.0). This works without problems on kernel 3.14.

You can work around this by using the gnttab_max_frames _hypervisor_
command line parameter[0] (called gnttab_max_nr_frames in Xen 4.4 and
earlier) to increase the number of grant table pages available to guests
(try doubling it from the default of 32 to 64).
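
For reference, a minimal sketch of how the parameter might be set on a dom0 that boots Xen through GRUB 2. The file path, the GRUB_CMDLINE_XEN_DEFAULT variable and the grub-mkconfig step are assumptions about the host setup, not something stated in this thread. With another bootloader the option would have to be added to the Xen line by hand.

```shell
# Fragment for /etc/default/grub on dom0 (assumed location).
# This is an option for the Xen hypervisor itself, not for the dom0 kernel.
GRUB_CMDLINE_XEN_DEFAULT="gnttab_max_frames=64"

# On Xen 4.4 and earlier the option name would be gnttab_max_nr_frames.
# Afterwards regenerate the GRUB config and reboot the host, for example:
#   grub-mkconfig -o /boot/grub/grub.cfg
```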

Setting gnttab_max_frames=64 solved the issue. Kernel 4.1 now boots fine with 40 vcpus, thanks!
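
A rough estimate of why 32 frames can run short with 40 vcpus, as a sketch only. It assumes version 1 grant entries of 8 bytes (512 per 4 KiB frame), 256 TX plus 256 RX grant references per netfront queue, and one queue per vcpu. None of these figures come from the thread, and the estimate ignores grants used by blkfront and by the ring pages themselves.

```shell
#!/bin/sh
# All constants below are assumptions, see the note above.
entries_per_frame=$((4096 / 8))   # 8-byte v1 grant entries in a 4 KiB frame
refs_per_queue=$((256 + 256))     # TX + RX grant references per queue
queues=40                         # assumed: one queue per vcpu

needed=$((queues * refs_per_queue))
avail_default=$((32 * entries_per_frame))   # gnttab_max_frames=32
avail_doubled=$((64 * entries_per_frame))   # gnttab_max_frames=64

echo "grant refs wanted by netfront:  $needed"
echo "grant refs with 32 frames:      $avail_default"
echo "grant refs with 64 frames:      $avail_doubled"
```

Under these assumptions the request exceeds what 32 frames provide and fits within 64.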

Apart from reducing the number of vcpus you can also use the max_queues
options to either netfront (in the guest) or netback (in dom0) to limit the
number of queues, although that was broken until recently (fixed in Linux
4.3, I think) and I'm not sure where it has been backported to.
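
A sketch of where the max_queues option could be set. The value 8 is only illustrative, and the exact spellings (xen_netfront.max_queues on the guest kernel command line, a modprobe.d line for xen-netback in dom0) are assumptions: the first applies if netfront is built into the guest kernel, the second if netback is loaded as a module.

```shell
#!/bin/sh
max_queues=8   # illustrative value, not a recommendation from the thread

# Guest side: append the parameter to "extra" in the domU config.
guest_extra="root=/dev/xvda1 net.ifnames=0 xen_netfront.max_queues=${max_queues}"
echo "extra = \"${guest_extra}\""

# dom0 side: a line for a file such as /etc/modprobe.d/xen-netback.conf
# (hypothetical file name).
netback_opts="options xen-netback max_queues=${max_queues}"
echo "${netback_opts}"
```

Only one of the two sides needs the limit.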

Yes, but it would be a pity not to use all cores :)

I think the resulting crash which you report below has also been fixed in
4.3, did you see this with Linux 4.4.0 as well as 4.1.15 (hopefully not)?

I saw the same crash on 4.4.0. I even tried using 4.4.0 dom0 + 4.4.0 domU, same issue.

Ian.

[0] http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

>
> The domU configuration is:
>
> name = "node"
> kernel = "kernel-4.1.15-gentoo-xen"
> extra = "root=/dev/xvda1 net.ifnames=0"
> memory = 5000
> vcpus = 40
> vif = [ '' ]
> disk = [
> '/dev/vg_data/node_root,raw,xvda1,rw'
> ]
>
> # xl create -c node.cfg
>
> ...
> [    0.407136] io scheduler deadline registered
> [    0.407209] io scheduler cfq registered (default)
> [    0.407840] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    0.408174] Non-volatile memory driver v1.3
> [    0.408318] xen:xen_evtchn: Event-channel device installed
> [    0.408488] [drm] Initialized drm 1.1.0 20060810
> [    0.415157] loop: module loaded
> [    0.444874] blkfront: xvda1: barrier or flush: disabled; persistent
> grants: enabled; indirect descriptors: enabled;
> [    0.505838] tun: Universal TUN/TAP device driver, 1.6
> [    0.505862] tun: (C) 1999-2004 Max Krasnyansky <maxk@xxxxxxxxxxxx>
> [    0.506039] xen_netfront: Initialising Xen virtual ethernet driver
> [    0.521246] blkfront: xvda3: barrier or flush: disabled; persistent
> grants: enabled; indirect descriptors: enabled;
> [    0.533589] xen_netfront: can't alloc rx grant refs
> [    0.533612] net eth0: only created 31 queues
> [    0.534907] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000018
> [    0.534923] IP: [<ffffffff814e4553>] netback_changed+0x7b3/0xce0
> [    0.534941] PGD 0
> [    0.534948] Oops: 0000 [#1] SMP
> [    0.534960] CPU: 4 PID: 176 Comm: xenwatch Not tainted 4.1.15-gentoo
> #2
> [    0.534969] task: ffff88013001ad00 ti: ffff8801301a0000 task.ti:
> ffff8801301a0000
> [    0.534980] RIP: e030:[<ffffffff814e4553>]  [<ffffffff814e4553>]
> netback_changed+0x7b3/0xce0
> [    0.534993] RSP: e02b:ffff8801301a3d88  EFLAGS: 00010202
> [    0.535003] RAX: 0000000000000000 RBX: ffff8800fe8e8000 RCX:
> 0000000000000001
> [    0.535005] RDX: 0000000000000001 RSI: ffff8800fe9440f8 RDI:
> 0000000000003f41
> [    0.535005] RBP: ffff8801301a3e18 R08: ffffc90001680000 R09:
> ffffffff81932aa7
> [    0.535005] R10: ffffea0003fa3a80 R11: ffffea0004bf6000 R12:
> 0000000000044000
> [    0.535005] R13: ffff8800fe9440f8 R14: ffff8800fe8e9000 R15:
> ffff8800fe944000
> [    0.535005] FS:  0000000000000000(0000) GS:ffff880131e80000(0000)
> knlGS:0000000000000000
> [    0.535005] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.535005] CR2: 0000000000000018 CR3: 0000000001a0c000 CR4:
> 0000000000042660
> [    0.535005] Stack:
> [    0.535005]  ffff8801301a3e08 ffff8800fe941e04 ffff8801301bac00
> ffff880000000001
> [    0.535005]  ffff88013033f000 ffff8801301bac00 00000028301a3e14
> ffff880100000020
> [    0.535005]  0000000131e96c00 00000001307b8d60 0000002800000001
> ffff880100003f41
> [    0.535005] Call Trace:
> [    0.535005]  [<ffffffff81458db8>] xenbus_otherend_changed+0x98/0xa0
> [    0.535005]  [<ffffffff81457690>] ? find_watch+0x50/0x50
> [    0.535005]  [<ffffffff8145a3ae>] backend_changed+0xe/0x10
> [    0.535005]  [<ffffffff8145772f>] xenwatch_thread+0x9f/0x140
> [    0.535005]  [<ffffffff81087b70>] ? __wake_up_common+0x90/0x90
> [    0.535005]  [<ffffffff8106b704>] kthread+0xc4/0xe0
> [    0.535005]  [<ffffffff81010000>] ? __xen_send_IPI_mask+0x30/0x50
> [    0.535005]  [<ffffffff8106b640>] ? __kthread_parkme+0x80/0x80
> [    0.535005]  [<ffffffff81649f22>] ret_from_fork+0x42/0x70
> [    0.535005]  [<ffffffff8106b640>] ? __kthread_parkme+0x80/0x80
> [    0.535005] Code: 89 f7 e8 d1 d9 c1 ff 41 8b bd 58 01 00 00 31 f6 e8
> b3 bb f6 ff 31 f6 4c 89 ff e8 b9 d9 c1 ff e9 74 ff ff ff 49 8b 47 20 4c
> 89 ee <48> 8b 78 18 e8 94 1e f7 ff 85 c0 0f 88 d4 fd ff ff 49 8b 47 20
> [    0.535005] RIP  [<ffffffff814e4553>] netback_changed+0x7b3/0xce0
> [    0.535005]  RSP <ffff8801301a3d88>
> [    0.535005] CR2: 0000000000000018
> [    0.535005] ---[ end trace e66fd859d9634ab5 ]---
> [    0.573507] xenwatch (176) used greatest stack depth: 13360 bytes left
> [    1.387133] clocksource tsc: mask: 0xffffffffffffffff max_cycles:
> 0x2ca24e83e81, max_idle_ns: 440795284478 ns
> [    1.538503] i8042: No controller found
> [    1.558800] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
> [    1.558930] rtc_cmos: probe of rtc_cmos failed with error -38
> [    1.559236] device-mapper: ioctl: 4.31.0-ioctl (2015-3-12)
> initialised: dm-devel@xxxxxxxxxx
> [    1.559300] hidraw: raw HID events driver (C) Jiri Kosina
> [    1.559687] Netfilter messages via NETLINK v0.30.
> [    1.559705] nfnl_acct: registering with nfnetlink.
> [    1.559738] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
> [    1.560111] ctnetlink v0.93: registering with nfnetlink.
> [    1.560469] xt_time: kernel timezone is -0000
> [    1.560495] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
> [    1.560524] IPVS: Connection hash table configured (size=4096,
> memory=64Kbytes)
> [    1.560585] IPVS: Creating netns size=2144 id=0
> [    1.560612] IPVS: ipvs loaded.
> [    1.560627] IPVS: [rr] scheduler registered.
> [    1.560638] IPVS: [wrr] scheduler registered.
> [    1.560651] IPVS: [lc] scheduler registered.
> [    1.560662] IPVS: [wlc] scheduler registered.
> [    1.560673] IPVS: [fo] scheduler registered.
> [    1.560690] IPVS: [lblc] scheduler registered.
> [    1.560705] IPVS: [lblcr] scheduler registered.
> [    1.560717] IPVS: [dh] scheduler registered.
> [    1.560728] IPVS: [sh] scheduler registered.
> [    1.560740] IPVS: [sed] scheduler registered.
> [    1.560751] IPVS: [nq] scheduler registered.
> [    1.560765] IPVS: [sip] pe registered.
> [    1.560909] ip_tables: (C) 2000-2006 Netfilter Core Team
> [    1.561060] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
> [    1.561085] arp_tables: (C) 2002 David S. Miller
> [    1.561132] Initializing XFRM netlink socket
> [    1.561151] NET: Registered protocol family 17
> [    1.561484] registered taskstats version 1
> [    1.562651] Btrfs loaded
> [    1.563121] Key type encrypted registered
> [    6.663125] xenbus_probe_frontend: Waiting for devices to initialise:
> 25s...20s...15s...
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxx
> http://lists.xen.org/xen-users
