
[Xen-users] Lockup/High ksoftirqd when rate-limiting is enabled


  • To: xen-users@xxxxxxxxxxxxx
  • From: Jean-Louis Dupond <jean-louis@xxxxxxxxx>
  • Date: Tue, 20 Jun 2017 09:11:28 +0200
  • Delivery-date: Tue, 20 Jun 2017 07:12:44 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

Hi,

We are using Xen 4.4.4-23.el6 with kernel 3.18.44-20.el6.x86_64.
Recently we have been running into problems when rate limiting is enabled.

When we enable rate limiting in Xen and a domU then generates a lot of outbound traffic, we see a high ksoftirqd load.
In some cases the system locks up completely.
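For context, we enable rate limiting per vif in the domU config via the rate= setting; a minimal illustrative example (the MAC, bridge name, and values below are made up, not our production config):

```
vif = [ 'mac=00:16:3e:aa:bb:cc, bridge=xenbr0, rate=100Mb/s@20ms' ]
```

The optional @interval part is the credit replenishment interval used by netback's rate limiter; shorter intervals smooth the traffic but mean the timer fires more often.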

The lockup gives the following stack trace:
Jun 4 11:07:56 xensrv1 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
Jun 4 11:07:56 xensrv1 kernel: Modules linked in: fuse tun cls_fw sch_htb iptable_mangle ip6table_mangle sch_tbf nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_multiport 8021q garp xt_mark ip6_tables xt_physdev br_netfilter dm_zero xfs ipt_REJECT nf_reject_ipv4 dm_cache_mq dm_cache dm_bio_prison
Jun 4 11:07:56 xensrv1 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
Jun 4 11:07:56 xensrv1 kernel: Modules linked in: fuse tun cls_fw sch_htb iptable_mangle ip6table_mangle sch_tbf nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_multiport 8021q garp xt_mark ip6_tables xt_physdev br_netfilter dm_zero xfs ipt_REJECT nf_reject_ipv4 dm_cache_mq dm_cache dm_bio_prison dm_persistent_data libcrc32c ext2 mbcache arptable_filter arp_tables xt_CT nf_conntrack iptable_raw iptable_filter ip_tables nbd(O) xen_gntalloc rdma_ucm(O) ib_ucm(O) rdma_cm(O) iw_cm(O) configfs ib_ipoib(O) ib_cm(O) ib_uverbs(O) ib_umad(O) mlx5_ib(O) mlx5_core(O) mlx4_en(O) vxlan udp_tunnel ip6_udp_tunnel mlx4_ib(O) ib_sa(O) ib_mad(O) ib_core(O) ib_addr(O) ib_netlink(O) mlx4_core(O) mlx_compat(O) xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd dm_snapshot dm_bufio dm_mirror_sync(O) dm_mirror dm_region_hash dm_log nfsv3 nfs_acl nfs fscache lockd sunrpc grace bridge ipv6 stp llc sg iTCO_wdt iTCO_vendor_support sd_mod mxm_wmi dcdbas pcspkr dm_mod ixgbe mdio sb_edac edac_core mgag200
Jun 4 11:07:56 xensrv1 kernel: ttm drm_kms_helper shpchp lpc_ich 8250_fintek ipmi_devintf ipmi_si ipmi_msghandler mei_me mei ahci libahci igb dca ptp pps_core megaraid_sas wmi acpi_power_meter hwmon xen_pciback cramfs
Jun 4 11:07:56 xensrv1 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 3.18.44-20.el6.x86_64 #1
Jun 4 11:07:56 xensrv1 kernel: Hardware name: Dell Inc. PowerEdge R730xd/xxxx, BIOS 2.1.6 05/19/2016
Jun 4 11:07:56 xensrv1 kernel: task: ffff880275f6e010 ti: ffff880275fd0000 task.ti: ffff880275fd0000
Jun 4 11:07:56 xensrv1 kernel: RIP: e030:[<ffffffff8100bf38>] [<ffffffff8100bf38>] xen_restore_fl_direct+0x18/0x1b
Jun 4 11:07:56 xensrv1 kernel: RSP: e02b:ffff88027aa23e30 EFLAGS: 00000297
Jun 4 11:07:56 xensrv1 kernel: RAX: 0000000000000008 RBX: 0000000000000200 RCX: 0000000000000003
Jun 4 11:07:56 xensrv1 kernel: RDX: ffff88027aa33f50 RSI: ffffc90013f88000 RDI: 0000000000000200
Jun 4 11:07:56 xensrv1 kernel: RBP: ffff88027aa23e48 R08: ffff88027aa33340 R09: ffff8802758d8a00
Jun 4 11:07:56 xensrv1 kernel: R10: ffff880283400c48 R11: 0000000000000000 R12: 0000000000000040
Jun 4 11:07:56 xensrv1 kernel: R13: ffffc90013f50000 R14: 0000000000000040 R15: 000000000000012b
Jun 4 11:07:56 xensrv1 kernel: FS: 0000000000000000(0000) GS:ffff88027aa20000(0000) knlGS:ffff88027aa20000
Jun 4 11:07:56 xensrv1 kernel: CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
Jun 4 11:07:56 xensrv1 kernel: CR2: 00007fad4acc6b08 CR3: 000000024e0a1000 CR4: 0000000000042660
Jun 4 11:07:56 xensrv1 kernel: Stack:
Jun 4 11:07:56 xensrv1 kernel: ffffffff815a1139 ffff88027aa23e58 ffffc90013f50028 ffff88027aa23e58
Jun 4 11:07:56 xensrv1 kernel: ffffffffa036fc81 ffff88027aa23e98 ffffffffa03733cd ffff88027aa23e98
Jun 4 11:07:56 xensrv1 kernel: ffffffff00000000 ffff880251e25050 ffffc90013f50028 0000000000000000
Jun 4 11:07:56 xensrv1 kernel: Call Trace:
Jun 4 11:07:56 xensrv1 kernel: <IRQ> [<ffffffff815a1139>] ? __napi_schedule+0x59/0x60
Jun 4 11:07:56 xensrv1 kernel: [<ffffffffa036fc81>] xenvif_napi_schedule_or_enable_events+0x81/0x90 [xen_netback]
Jun 4 11:07:56 xensrv1 kernel: [<ffffffffa03733cd>] xenvif_poll+0x4d/0x68 [xen_netback]
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff815a8b32>] net_rx_action+0x112/0x2c0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff81077d4c>] __do_softirq+0xfc/0x2f0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8107804d>] irq_exit+0xbd/0xd0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff813b668c>] xen_evtchn_do_upcall+0x3c/0x50
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8167c49e>] xen_do_hypervisor_callback+0x1e/0x40
Jun 4 11:07:56 xensrv1 kernel: <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8100b700>] ? xen_safe_halt+0x10/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101fd44>] ? default_idle+0x24/0xf0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101f34f>] ? arch_cpu_idle+0xf/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b37f6>] ? cpuidle_idle_call+0xd6/0x1d0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810926c2>] ? __atomic_notifier_call_chain+0x12/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3a25>] ? cpu_idle_loop+0x135/0x200
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3b0b>] ? cpu_startup_entry+0x1b/0x70
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3b50>] ? cpu_startup_entry+0x60/0x70
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101261a>] ? cpu_bringup_and_idle+0x2a/0x40
Jun 4 11:07:56 xensrv1 kernel: Code: 44 00 00 65 f6 04 25 c1 a0 00 00 ff 0f 94 c4 00 e4 c3 90 66 f7 c7 00 02 65 0f 94 04 25 c1 a0 00 00 65 66 83 3c 25 c0 a0 00 00 01 <75> 05 e8 01 00 00 00 c3 50 51 52 56 57 41 50 41 51 41 52 41 53

Sometimes these lockups last for minutes, and then the system recovers.

It's clear we need to find a solution for this :)
And it seems we're not the only ones affected: https://lists.centos.org/pipermail/centos-virt/2016-March/005014.html

There was also another thread where a patch was proposed (https://www.spinics.net/lists/netdev/msg282765.html), but I don't see any follow-up on it.

Any advice?

Thanks!
Jean-Louis Dupond

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users

 

