
Re: [BUG] Core scheduling patches causing deadlock in some situations



----- On 29 May 2020 at 14:44, Jürgen Groß jgross@xxxxxxxx wrote:

> On 29.05.20 14:30, Michał Leszczyński wrote:
>> Hello,
>> 
>> I'm running DRAKVUF on a Dell Inc. PowerEdge R640/08HT8T server with an
>> Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz.
>> After upgrading from Xen RELEASE 4.12 to 4.13, we noticed stability
>> problems involving freezes of Dom0 (Debian Buster):
>> 
>> ---
>> 
>> maj 27 23:17:02 debian kernel: rcu: INFO: rcu_sched self-detected stall on CPU
>> maj 27 23:17:02 debian kernel: rcu: 0-....: (5250 ticks this GP) idle=cee/1/0x4000000000000002 softirq=11964/11964 fqs=2515
>> maj 27 23:17:02 debian kernel: rcu: (t=5251 jiffies g=27237 q=799)
>> maj 27 23:17:02 debian kernel: NMI backtrace for cpu 0
>> maj 27 23:17:02 debian kernel: CPU: 0 PID: 643 Comm: z_rd_int_1 Tainted: P OE 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
>> maj 27 23:17:02 debian kernel: Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.1.8 04/30/2019
>> maj 27 23:17:02 debian kernel: Call Trace:
>> maj 27 23:17:02 debian kernel: <IRQ>
>> maj 27 23:17:02 debian kernel: dump_stack+0x5c/0x80
>> maj 27 23:17:02 debian kernel: nmi_cpu_backtrace.cold.4+0x13/0x50
>> maj 27 23:17:02 debian kernel: ? lapic_can_unplug_cpu.cold.29+0x3b/0x3b
>> maj 27 23:17:02 debian kernel: nmi_trigger_cpumask_backtrace+0xf9/0xfb
>> maj 27 23:17:02 debian kernel: rcu_dump_cpu_stacks+0x9b/0xcb
>> maj 27 23:17:02 debian kernel: rcu_check_callbacks.cold.81+0x1db/0x335
>> maj 27 23:17:02 debian kernel: ? tick_sched_do_timer+0x60/0x60
>> maj 27 23:17:02 debian kernel: update_process_times+0x28/0x60
>> maj 27 23:17:02 debian kernel: tick_sched_handle+0x22/0x60
>> 
>> ---
>> 
>> This usually results in the machine becoming completely unresponsive and
>> performing an automated reboot after some time.
>> 
>> I've bisected the commits between 4.12 and 4.13, and this appears to be
>> the patch which introduced the bug:
>> https://github.com/xen-project/xen/commit/7c7b407e77724f37c4b448930777a59a479feb21
>> 
>> Enclosed you can find the `xl dmesg` log (attachment: dmesg.txt) from a
>> fresh boot of the machine on which the bug was reproduced.
>> 
>> I'm also attaching the `xl info` output from this machine:
>> 
>> ---
>> 
>> release : 4.19.0-6-amd64
>> version : #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
>> machine : x86_64
>> nr_cpus : 14
>> max_cpu_id : 223
>> nr_nodes : 1
>> cores_per_socket : 14
>> threads_per_core : 1
>> cpu_mhz : 2593.930
>> hw_caps : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
>> virt_caps : pv hvm hvm_directio pv_directio hap shadow
>> total_memory : 130541
>> free_memory : 63591
>> sharing_freed_memory : 0
>> sharing_used_memory : 0
>> outstanding_claims : 0
>> free_cpus : 0
>> xen_major : 4
>> xen_minor : 13
>> xen_extra : -unstable
>> xen_version : 4.13-unstable
>> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
>> xen_scheduler : credit2
>> xen_pagesize : 4096
>> platform_params : virt_start=0xffff800000000000
>> xen_changeset : Wed Oct 2 09:27:27 2019 +0200 git:7c7b407e77-dirty
> 
> What is your original Xen base? This output was clearly obtained at the
> end of the bisect process.
> 
> There have been quite a few corrections since the release of Xen 4.13, so
> please make sure you are running the most recent version (4.13.1).
> 
> 
> Juergen

Sure, we have tested both RELEASE 4.13 and RELEASE 4.13.1. Unfortunately,
these corrections didn't help and the bug is still reproducible.
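
As a sanity check, the booted hypervisor build can be confirmed from the
same `xl info` fields quoted above, e.g.:

---

xl info | grep -E 'xen_version|xen_changeset'

---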

From our testing it turns out that:

Known working revision: 997d6248a9ae932d0dbaac8d8755c2b15fec25dc
Broken revision: 6278553325a9f76d37811923221b21db3882e017
First bad commit: 7c7b407e77724f37c4b448930777a59a479feb21
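
For reference, the bisect was driven in the usual way (a rough sketch; the
build and install steps for each revision are omitted):

---

git bisect start
git bisect bad  6278553325a9f76d37811923221b21db3882e017
git bisect good 997d6248a9ae932d0dbaac8d8755c2b15fec25dc
# build and boot the revision git checks out, try to reproduce the Dom0
# freeze, then mark the result:
#   git bisect good    # freeze did not occur
#   git bisect bad     # freeze occurred
# repeat until git prints the first bad commit

---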


Best regards,
Michał Leszczyński
CERT Polska


