
Re: [BUG] Core scheduling patches causing deadlock in some situations



----- On 29 May 2020 at 14:44, Jürgen Groß jgross@xxxxxxxx wrote:

> On 29.05.20 14:30, Michał Leszczyński wrote:
>> Hello,
>> 
>> I'm running DRAKVUF on a Dell Inc. PowerEdge R640/08HT8T server with an
>> Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz.
>> After upgrading from Xen RELEASE 4.12 to 4.13, we noticed stability
>> problems involving freezes of Dom0 (Debian Buster):
>> 
>> ---
>> 
>> maj 27 23:17:02 debian kernel: rcu: INFO: rcu_sched self-detected stall on CPU
>> maj 27 23:17:02 debian kernel: rcu: 0-....: (5250 ticks this GP) idle=cee/1/0x4000000000000002 softirq=11964/11964 fqs=2515
>> maj 27 23:17:02 debian kernel: rcu: (t=5251 jiffies g=27237 q=799)
>> maj 27 23:17:02 debian kernel: NMI backtrace for cpu 0
>> maj 27 23:17:02 debian kernel: CPU: 0 PID: 643 Comm: z_rd_int_1 Tainted: P OE 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
>> maj 27 23:17:02 debian kernel: Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.1.8 04/30/2019
>> maj 27 23:17:02 debian kernel: Call Trace:
>> maj 27 23:17:02 debian kernel: <IRQ>
>> maj 27 23:17:02 debian kernel: dump_stack+0x5c/0x80
>> maj 27 23:17:02 debian kernel: nmi_cpu_backtrace.cold.4+0x13/0x50
>> maj 27 23:17:02 debian kernel: ? lapic_can_unplug_cpu.cold.29+0x3b/0x3b
>> maj 27 23:17:02 debian kernel: nmi_trigger_cpumask_backtrace+0xf9/0xfb
>> maj 27 23:17:02 debian kernel: rcu_dump_cpu_stacks+0x9b/0xcb
>> maj 27 23:17:02 debian kernel: rcu_check_callbacks.cold.81+0x1db/0x335
>> maj 27 23:17:02 debian kernel: ? tick_sched_do_timer+0x60/0x60
>> maj 27 23:17:02 debian kernel: update_process_times+0x28/0x60
>> maj 27 23:17:02 debian kernel: tick_sched_handle+0x22/0x60
>> 
>> ---
>> 
>> This usually results in the machine becoming completely unresponsive and
>> performing an automated reboot after some time.
>> 
>> I've bisected the commits between 4.12 and 4.13, and this appears to be
>> the patch which introduced the bug:
>> https://github.com/xen-project/xen/commit/7c7b407e77724f37c4b448930777a59a479feb21
>> 
>> Enclosed you can find the `xl dmesg` log (attachment: dmesg.txt) from a
>> fresh boot of the machine on which the bug was reproduced.
>> 
>> I'm also attaching the `xl info` output from this machine:
>> 
>> ---
>> 
>> release : 4.19.0-6-amd64
>> version : #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
>> machine : x86_64
>> nr_cpus : 14
>> max_cpu_id : 223
>> nr_nodes : 1
>> cores_per_socket : 14
>> threads_per_core : 1
>> cpu_mhz : 2593.930
>> hw_caps : bfebfbff:77fef3ff:2c100800:00000121:0000000f:d19ffffb:00000008:00000100
>> virt_caps : pv hvm hvm_directio pv_directio hap shadow
>> total_memory : 130541
>> free_memory : 63591
>> sharing_freed_memory : 0
>> sharing_used_memory : 0
>> outstanding_claims : 0
>> free_cpus : 0
>> xen_major : 4
>> xen_minor : 13
>> xen_extra : -unstable
>> xen_version : 4.13-unstable
>> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
>> xen_scheduler : credit2
>> xen_pagesize : 4096
>> platform_params : virt_start=0xffff800000000000
>> xen_changeset : Wed Oct 2 09:27:27 2019 +0200 git:7c7b407e77-dirty
> 
> What is your original Xen base? This output was clearly obtained at the
> end of the bisect process.
> 
> There have been quite a few corrections since the release of Xen 4.13, so
> please make sure you are running the most recent version (4.13.1).
> 
> 
> Juergen

Sure, we have tested both RELEASE 4.13 and RELEASE 4.13.1. Unfortunately,
these corrections didn't help and the bug is still reproducible.
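
As a sanity check, the booted hypervisor build can be confirmed from the
same `xl info` fields quoted above, e.g.:

---

xl info | grep -E 'xen_version|xen_changeset'

---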

From our testing it turns out that:

Known working revision: 997d6248a9ae932d0dbaac8d8755c2b15fec25dc
Broken revision: 6278553325a9f76d37811923221b21db3882e017
First bad commit: 7c7b407e77724f37c4b448930777a59a479feb21
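
For reference, the bisect was driven in the usual way (a rough sketch; the
build and install steps for each revision are omitted):

---

git bisect start
git bisect bad  6278553325a9f76d37811923221b21db3882e017
git bisect good 997d6248a9ae932d0dbaac8d8755c2b15fec25dc
# build and boot the revision git checks out, try to reproduce the Dom0
# freeze, then mark the result:
#   git bisect good    # freeze did not occur
#   git bisect bad     # freeze occurred
# repeat until git prints the first bad commit

---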


Best regards,
Michał Leszczyński
CERT Polska


