Network stalls on domU under Xen-4.14.x
I have recently been seeing total network stalls on some domUs. dmesg on the domU shows a number of lines like:

Feb 15 11:12:38 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:12:38 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:12:38 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:12:39 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:12:39 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:12:40 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:12:42 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:12:45 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:12:52 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 11:13:05 gt kernel: net eth0: rx->offset: 0, size: -1

On occasion, with the longer stalls (~5 minutes), I get:

Feb 15 09:29:04 gt kernel: net_ratelimit: 5 callbacks suppressed

I have tried this on Xen 4.14.0, 4.14.1 and 4.14.2-pre, with various guest kernels ranging from linux-4.19.170 to the early 5.10.x kernels. Newer 5.10 kernels give me a different error, to do with interrupts: it seems interrupt vectors point to La-La-Land, or else they are routed to the wrong CPU. I'm fairly certain I did not have this issue running Xen-4.14-staging with the earliest linux-5.10.x, but that had other issues. File-system corruption cost me a week around Christmas with the whole system down :-( . It allowed me to learn how to use bacula from a grml rescue CD without a catalog database :-) .

The stalls happen under load (network or CPU, I don't know which matters more). I can reliably reproduce them if I run a lot of compilations and network fetches in the domU while simultaneously launching Firefox and Thunderbird. My home directory is mounted over NFS from the dom0, so there is a lot of traffic when Thunderbird and Firefox launch.

On occasion the stalls are caught by the kernel and I get a stack trace, but I guess those are consequences of the network stall, incidental to the real issue.
Like:

Feb 15 09:09:38 gt kernel: status: r
Feb 15 09:09:38 gt kernel: net_ratelimit: 5 callbacks suppressed
Feb 15 09:09:38 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 09:09:38 gt root[45567]: ACPI event unhandled: jack/lineout LINEOUT unplug
Feb 15 09:09:38 gt root[45570]: ACPI event unhandled: jack/videoout VIDEOOUT unplug
Feb 15 09:09:44 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 09:09:57 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 09:10:01 gt CROND[45682]: (root) CMD (/usr/lib/sa/sa1 1 1)
Feb 15 09:10:23 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 09:11:17 gt kernel: net eth0: rx->offset: 0, size: -1
Feb 15 09:11:58 gt kernel: INFO: task IndexedDB #3:45442 blocked for more than 122 seconds.
Feb 15 09:11:58 gt kernel: Not tainted 5.4.80-gentoo-r1-x86_64 #1
Feb 15 09:11:58 gt kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 15 09:11:58 gt kernel: IndexedDB #3 D 0 45442 3451 0x00000000
Feb 15 09:11:58 gt kernel: Call Trace:
Feb 15 09:11:58 gt kernel: __schedule+0x2a3/0x7a0
Feb 15 09:11:58 gt kernel: ? nfs_pageio_complete+0xa8/0xf0
Feb 15 09:11:58 gt kernel: schedule+0x34/0xa0
Feb 15 09:11:58 gt kernel: io_schedule+0x3c/0x60
Feb 15 09:11:58 gt kernel: wait_on_page_bit_common+0x125/0x330
Feb 15 09:11:58 gt kernel: ? trace_event_raw_event_file_check_and_advance_wb_err+0xf0/0xf0
Feb 15 09:11:58 gt kernel: __filemap_fdatawait_range+0x7b/0xe0
Feb 15 09:11:58 gt kernel: file_write_and_wait_range+0x67/0x90
Feb 15 09:11:58 gt kernel: nfs_file_fsync+0x83/0x190
Feb 15 09:11:58 gt kernel: __x64_sys_fsync+0x2f/0x60
Feb 15 09:11:58 gt kernel: do_syscall_64+0x51/0x130
Feb 15 09:11:58 gt kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 15 09:11:58 gt kernel: RIP: 0033:0x7f4db9580e1b
Feb 15 09:11:58 gt kernel: Code: Bad RIP value.
Feb 15 09:11:58 gt kernel: RSP: 002b:00007f4d9b4b4d50 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Feb 15 09:11:58 gt kernel: RAX: ffffffffffffffda RBX: 00007f4d9f2abd28 RCX: 00007f4db9580e1b
Feb 15 09:11:58 gt kernel: RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000072
Feb 15 09:11:58 gt kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 00007f4d9b4b4d70
Feb 15 09:11:58 gt kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000001f5
Feb 15 09:11:59 gt kernel: R13: 00007f4d9f2abc70 R14: 0000000000000000 R15: 00007f4da63774e0

---------

My xl info just now:

xl info
host                   : gentoo
release                : 5.4.97-gentoo-x86_64
version                : #1 SMP Wed Feb 10 16:43:41 CET 2021
machine                : x86_64
nr_cpus                : 12
max_cpu_id             : 11
nr_nodes               : 2
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 2399.981
hw_caps                : bfebfbff:77fef3ff:2c100800:00000021:00000001:000037ab:00000000:00000100
virt_caps              : pv hvm hvm_directio pv_directio hap shadow iommu_hap_pt_share
total_memory           : 130953
free_memory            : 1551
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 14
xen_extra              : .2-pre
xen_version            : 4.14.2-pre
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : xen.cfg xen-marker-51 console_timestamps=date iommu=1 com1=115200,8n1 console=com1 conswitch=lx cpufreq=xen:performance,verbose smt=0 maxcpus=12 core_parking=power nmi=dom0 gnttab_max_frames=512 gnttab_max_maptrack_frames=1024 vcpu_migration_delay=2000 tickle_one_idle_cpu=1 spec-ctrl=no-xen sched=credit2 timer_slop=5000 max_cstate=2 dom0_mem=16G,max:16G dom0_max_vcpus=8 ept=exec_sp=1
cc_compiler            : gcc (Gentoo 9.3.0-r2 p4) 9.3.0
cc_compile_by          : hakon
cc_compile_domain      : alstadheim.priv.no
cc_compile_date        : Sat Feb 13 22:07:40 CET 2021
build_id               : d3fb26987b749da48c2549b12ba9ea4a
xend_config_format     : 4
0:root@gentoo xen-consoles #

P.S.: I know I should do something about my DMARC set-up, so that I can have a separate, unprotected "From:" address for posting to mailing lists. Pointers to a how-to appreciated.

---
Håkon