
[Xen-devel] [BUG] On bdw-ep, failed to offline/online socket1 cpu on Dom0



Bug detailed description:
1: After offlining all the socket1 CPUs, the network hangs.
2: When onlining all the socket1 CPUs again, Xen shows "(XEN) Panic on CPU 44:".
3: On Haswell-EP and Haswell-EX, CPU offline/online works successfully.

Environment :
HW: Broadwell-EP
Xen: Xen 4.7.0 RC3
Dom0: Linux 4.6.0

Reproduce steps:
----------------
1: Run cpu_on_offline.sh to offline all the socket1 CPUs (cpu44--cpu87):
cat cpu_on_offline.sh
#!/bin/bash
set +x
# Count the CPUs whose third (socket) column in the xenpm topology output matches 0.
per_socket_cpu_num=$(xenpm get-cpu-topology | awk '{if($3~/0/) print$1}' | wc -l)
echo "per_socket_cpu_num is $per_socket_cpu_num"
# The loop covers cpu44..cpu87, i.e. the socket1 CPUs on this machine.
for ((i=44;i<88;i++));do
    ./cpu_offline $i
    #./cpu_online $i
    if [ $? -ne 0 ];then
        echo "CHECK_CAT_CHR:: offline/online cpus in socket1 failed"
    fi
done
Note: cpu_online and cpu_offline are compiled from cpu_online.c and cpu_offline.c; both files are attached.
2: Modify cpu_on_offline.sh to online all the socket1 CPUs (cpu44--cpu87).
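
For illustration, the two steps could share one helper instead of editing the script between runs. This is only a sketch: the function name toggle_cpus is hypothetical, and it assumes that cpu_offline and cpu_online take the CPU number as their only argument and return non-zero on failure, as the usage in cpu_on_offline.sh suggests.

```shell
#!/bin/bash
# toggle_cpus TOOL FIRST LAST
# Hypothetical helper: run TOOL once for every CPU in the inclusive range
# FIRST..LAST, report each failure on stderr, and print the failure count.
# Assumption: TOOL takes the CPU number as its only argument and exits
# non-zero on failure.
toggle_cpus() {
    local tool=$1 cpu=$2 last=$3 failed=0
    while [ "$cpu" -le "$last" ]; do
        if ! "$tool" "$cpu"; then
            echo "CHECK_CAT_CHR:: $tool failed for cpu $cpu" >&2
            failed=$((failed + 1))
        fi
        cpu=$((cpu + 1))
    done
    echo "$failed"
    # Return success only if every call succeeded.
    [ "$failed" -eq 0 ]
}
```

Step 1 would then be "toggle_cpus ./cpu_offline 44 87" and step 2 "toggle_cpus ./cpu_online 44 87".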

Current result:
----------------
For step 1: the CPUs are offlined successfully, but the network hangs.
For step 2: BDW-EP panics and reboots.

Basic root-causing log:
----------------------
The console log about offline cpus:
(XEN) Broke affinity for irq 110
(XEN) CMCI: threshold 0x2 too large for CPU54 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU54 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU54 bank 19, using 0x1
(XEN) Broke affinity for irq 136
(XEN) Broke affinity for irq 156
(XEN) Broke affinity for irq 83
(XEN) Broke affinity for irq 112
(XEN) Broke affinity for irq 117
(XEN) Broke affinity for irq 135
(XEN) Broke affinity for irq 131
(XEN) Broke affinity for irq 92
(XEN) Broke affinity for irq 156
(XEN) Broke affinity for irq 88
(XEN) CMCI: threshold 0x2 too large for CPU55 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU55 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU55 bank 19, using 0x1
(XEN) Broke affinity for irq 101
(XEN) CMCI: threshold 0x2 too large for CPU60 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU60 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU60 bank 19, using 0x1
(XEN) Broke affinity for irq 119
(XEN) Broke affinity for irq 83
(XEN) Broke affinity for irq 101
(XEN) Broke affinity for irq 104
(XEN) Broke affinity for irq 123
(XEN) Broke affinity for irq 144
(XEN) Broke affinity for irq 155
(XEN) Broke affinity for irq 93
(XEN) Broke affinity for irq 122
(XEN) Broke affinity for irq 132
(XEN) Broke affinity for irq 147
(XEN) Broke affinity for irq 152
(XEN) CMCI: threshold 0x2 too large for CPU63 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU63 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU63 bank 19, using 0x1
(XEN) Broke affinity for irq 101
(XEN) Broke affinity for irq 106
(XEN) Broke affinity for irq 119
(XEN) Broke affinity for irq 123
(XEN) Broke affinity for irq 131
(XEN) Broke affinity for irq 132
(XEN) Broke affinity for irq 135
(XEN) Broke affinity for irq 136
(XEN) CMCI: threshold 0x2 too large for CPU65 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU65 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU65 bank 19, using 0x1
(XEN) Broke affinity for irq 109
(XEN) Broke affinity for irq 152
(XEN) Broke affinity for irq 155
(XEN) Broke affinity for irq 93
(XEN) Broke affinity for irq 122
(XEN) CMCI: threshold 0x2 too large for CPU68 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU68 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU68 bank 19, using 0x1
(XEN) Broke affinity for irq 84
(XEN) CMCI: threshold 0x2 too large for CPU69 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU69 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU69 bank 19, using 0x1
(XEN) Broke affinity for irq 141
(XEN) CMCI: threshold 0x2 too large for CPU70 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU70 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU70 bank 19, using 0x1
(XEN) Broke affinity for irq 134
(XEN) Broke affinity for irq 152
(XEN) Broke affinity for irq 155
(XEN) Broke affinity for irq 161
(XEN) CMCI: threshold 0x2 too large for CPU71 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU71 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU71 bank 19, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU73 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU73 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU73 bank 19, using 0x1
(XEN) Broke affinity for irq 93
(XEN) Broke affinity for irq 156
(XEN) Broke affinity for irq 149
(XEN) CMCI: threshold 0x2 too large for CPU74 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU74 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU74 bank 19, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU76 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU76 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU76 bank 19, using 0x1
(XEN) Broke affinity for irq 99
(XEN) Broke affinity for irq 151
(XEN) Broke affinity for irq 154
(XEN) Broke affinity for irq 157
(XEN) Broke affinity for irq 160
(XEN) Broke affinity for irq 163
(XEN) Broke affinity for irq 213
(XEN) CMCI: threshold 0x2 too large for CPU77 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU77 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU77 bank 19, using 0x1
(XEN) Broke affinity for irq 98
(XEN) CMCI: threshold 0x2 too large for CPU83 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU83 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU83 bank 19, using 0x1
(XEN) Broke affinity for irq 103
(XEN) Broke affinity for irq 88
(XEN) Broke affinity for irq 108
(XEN) Broke affinity for irq 148
(XEN) Broke affinity for irq 158
(XEN) Broke affinity for irq 161
(XEN) CMCI: threshold 0x2 too large for CPU85 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU85 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU85 bank 19, using 0x1
(XEN) Broke affinity for irq 85
(XEN) Broke affinity for irq 114
(XEN) Broke affinity for irq 148
(XEN) CMCI: threshold 0x2 too large for CPU87 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU87 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU87 bank 19, using 0x1
[ 1658.802833] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1660.806998] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1666.819135] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1682.851840] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1702.892595] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1706.900696] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1712.913001] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1720.929314] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1728.945621] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1742.974173] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1757.002703] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1763.014936] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1769.027099] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1779.047473] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected
[ 1789.067856] ixgbe 0000:03:00.1 eth1: 1 Spoofed packets detected

The console log about online cpus:
[root@vt-sa3 ~]# minicom bdw-ep1
Welcome to minicom 2.6.2
OPTIONS: I18n
Compiled on Dec 28 2013, 13:58:29.
Port /dev/ttyUSB15
Press CTRL-A Z for help on special keys
(XEN) CMCI: threshold 0x2 too large for CPU44 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU44 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU44 bank 19, using 0x1
(XEN) ---[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]---
(XEN) CPU: 44
(XEN) RIP: e008:[<ffff82d08018e5c5>] psr.c#psr_cpu_init+0x11c/0x2e5
(XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
(XEN) rax: ffff83043c436fe0 rbx: 0000000000000001 rcx: 0000000000000014
(XEN) rdx: 00000000000fffff rsi: 0000000000000001 rdi: ffff83043c4358a0
(XEN) rbp: ffff83087b00fe40 rsp: ffff83087b00fe30 r8: 0000000000000004
(XEN) r9: 0000000000000000 r10: 0000ffff0000ffff r11: 00ff00ff00ff00ff
(XEN) r12: ffff82d0802a7cc0 r13: 0000000000000000 r14: 0000000000000003
(XEN) r15: 000000000000002c cr0: 000000008005003b cr4: 00000000003526e0
(XEN) cr3: 0000000078aa5000 cr2: ffff82d0802ea94c
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff83087b00fe30:
(XEN) 0000000000000040 ffff82d0802a7cc8 ffff83087b00fe60 ffff82d08018e7da
(XEN) ffff82d0802a7cc8 ffff82d0802a7cc0 ffff83087b00feb0 ffff82d08011c18b
(XEN) 0000000000000000 ffff82d0802a74a8 ffff82d0803423c8 0000000000000000
(XEN) 0000000000000000 0000000000000009 000000000000002c 000000000000002c
(XEN) ffff83087b00fec0 ffff82d0801012ec ffff83087b00ff10 ffff82d080190f63
(XEN) c2c2c2c2c2c2c2c2 0000002cc2c2c2c2 c2c2c2c2c2c2c2c2 0000000000000001
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
(XEN) c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
(XEN) c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
(XEN) c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
(XEN) c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
(XEN) c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
(XEN) c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c20000002c ffff83007b7f7000
(XEN) 00000033bc8cdb00 c2c2c2c2c2c2c2c2
(XEN) Xen call trace:
(XEN) [<ffff82d08018e5c5>] psr.c#psr_cpu_init+0x11c/0x2e5
(XEN) [<ffff82d08018e7da>] psr.c#cpu_callback+0x4c/0x10c
(XEN) [<ffff82d08011c18b>] notifier_call_chain+0x6b/0x90
(XEN) [<ffff82d0801012ec>] notify_cpu_starting+0x1c/0x26
(XEN) [<ffff82d080190f63>] start_secondary+0x22c/0x27a
(XEN)
(XEN) Pagetable walk from ffff82d0802ea94c:
(XEN) L4[0x105] = 0000000078aa4063 ffffffffffffffff
(XEN) L3[0x142] = 0000000078aa1063 ffffffffffffffff
(XEN) L2[0x001] = 000000087b0f5063 ffffffffffffffff
(XEN) L1[0x0ea] = 8000000078aea262 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 44:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff82d0802ea94c
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
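
As a cross-check of the pagetable walk in the log above, the indices can be recomputed from the faulting linear address. This is a minimal sketch, assuming standard x86-64 4-level paging with 4 KiB pages (9 index bits per level, 12 offset bits); the function name pt_indices is hypothetical.

```shell
#!/bin/bash
# pt_indices HEXADDR
# Hypothetical helper: split an x86-64 linear address into its 4-level
# page-table indices and page offset. Assumes 4-level paging, 4 KiB pages.
pt_indices() {
    local hex low addr
    hex=${1#0x}
    # Keep only the low 48 bits (last 12 hex digits); the upper 16 bits are
    # sign extension and would overflow signed 64-bit shell arithmetic.
    low=$(printf '%s' "$hex" | tail -c 12)
    addr=$((0x$low))
    printf 'L4[0x%03x] L3[0x%03x] L2[0x%03x] L1[0x%03x] offset 0x%03x\n' \
        $(( (addr >> 39) & 0x1ff )) \
        $(( (addr >> 30) & 0x1ff )) \
        $(( (addr >> 21) & 0x1ff )) \
        $(( (addr >> 12) & 0x1ff )) \
        $(( addr & 0xfff ))
}
```

Running "pt_indices ffff82d0802ea94c" prints "L4[0x105] L3[0x142] L2[0x001] L1[0x0ea] offset 0x94c", which matches the indices Xen printed.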


Regards,
Pengtao

Attachment: 0001-x86-psr-make-opt_psr-persistent.patch
Description: 0001-x86-psr-make-opt_psr-persistent.patch

Attachment: dmesg_bf925a9f.log
Description: dmesg_bf925a9f.log

Attachment: xl_dmesg_bf925a9f.log
Description: xl_dmesg_bf925a9f.log

Attachment: cpu_offline.c
Description: cpu_offline.c

Attachment: cpu_online.c
Description: cpu_online.c


 

