
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

On 01/15/2016 12:05 PM, Andrew Cooper wrote:
On 15/01/16 10:58, Håkon Alstadheim wrote:
This is just a preliminary report, mostly just for the record.

I will report again if this keeps happening after 4.7 is out, or upon
request. Anyone working on this, please mail me and request more
information. I have available logs from dom0 boot (I dump dmesg and xl
dmesg to disk after every boot, and log dom0 serial console to disk).
I will send boot logs if requested. I will turn on maximum verbosity
and provide all output. My serial console is very slow, so I cannot
keep running at max verbosity all the time.

At the end of this mail there is "xl info" and output from dom0 serial

vendor_id    : GenuineIntel
cpu family    : 6
model        : 63
model name    : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

# smbios-sys-info
Libsmbios version:      2.2.28
Product Name:           Z10PE-D8 WS
Vendor:                 ASUSTeK COMPUTER INC.
BIOS Version:           3101

Dom0 OS:
Linux gentoo 4.1.12-gentoo #1 SMP Sat Jan 2 09:36:31 CET 2016 x86_64
Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux.
Kernel is gentoo-sources, with the experimental USE flag. CPU type set to
Haswell. Issue also happened without experimental.
# cat /proc/cmdline
placeholder root=LABEL=ssdroot ro
console=vga domodules domdadm dolvm intel_iommu=on earlyprintk=xen

The system is mostly built with stable packages, xen and xen-tools
keyworded to ~amd64.

I have been experiencing issues with domains with passed-through PCIe
devices since I first installed Xen, back at version 4.5.x. I'm now
at 4.6.0 with Gentoo patches. Crashes SEEM mostly related to this PCI
passthrough and interrupts (USB cards, sound cards).

Recently the system has been more stable, whether it is because I pass
through as few things as possible, or because of improvements in Xen I
do not know. I have also taken to building with debug, which leads to
more abrupt but less mysterious failures. Earlier (w/o debug and under
Xen 4.5) stuff would just gradually stop working and end up in a total
hang of everything. So, hey, things are improving :-b

This isn't the first time we have seen this on Haswell processors. Do
you have microcode loading set up?

Not entirely sure, to be honest. Is microcode    : 0x31 the newest?

I AM running the very latest BIOS from Asus, but I am not confident I know how to set up microcode loading, so I have not had it in place.

Trying now.

Downloading microcode.dat from Intel
Installing iucode_tool, which in its --help states:
  -w, --write-to=file        Write selected microcodes to a file in binary
                             format.  The binary format is suitable to be
                             uploaded to the kernel

Ran "iucode_tool microcode.dat -w microcode.bin"
# ls -l micro*
-rwxr-xr-x 1 root root  693248 Jan 15 12:40 microcode.bin
-rwxr-xr-x 1 root root 2081807 Nov  6 04:04 microcode.dat
placed microcode.bin in /boot/microcode.bin

Booted with:
xen_commandline : ssd-xen-debug-marker console_timestamps=date loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4 dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1 core_parking=power ucode=microcode.bin

# cat /proc/cpuinfo | grep micro
says: microcode    : 0x31

This is no change from previous boot.
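
For comparing the reported revision between boots, here is a minimal sketch (an illustration only, not something that was run for this report). It collects the "microcode" values from saved /proc/cpuinfo text. The function name microcode_revisions is hypothetical. Note that in dom0 this only covers dom0's vCPUs (4 here, per dom0_max_vcpus=4), not every physical core.

```python
import re

def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions (as ints) found in cpuinfo text."""
    pattern = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE)
    return {int(match.group(1), 16) for match in pattern.finditer(cpuinfo_text)}

def revision_changed(before_text, after_text):
    """True if the set of reported revisions differs between two saved dumps."""
    return microcode_revisions(before_text) != microcode_revisions(after_text)
```

Feeding it the saved contents of /proc/cpuinfo from two boots (file names are up to you) gives a quick yes/no on whether anything changed; a set with more than one element would mean the CPUs disagree.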
Now: How do I know whether 0x31 is the newest?
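
A rough sketch of one way to see what the downloaded file offers: walk the update headers in the binary file and list the revision for each processor signature. It relies on the documented Intel update header layout (48 bytes; a data size of 0 means a fixed 2048-byte update). Assumptions: the names list_updates, cpu_signature and newest_revision are hypothetical, extended signature tables are ignored, and the stepping value has to be read from /proc/cpuinfo (it is not shown above). This only tells you what the file contains, not whether Intel has released something newer.

```python
import struct

# Intel microcode update header: nine little-endian 32-bit fields
# followed by 12 reserved bytes (48 bytes in total).
HEADER = struct.Struct("<9I12x")

def list_updates(blob):
    """Return (signature, processor_flags, revision) for each update in blob."""
    updates = []
    offset = 0
    while offset + HEADER.size <= len(blob):
        (header_ver, revision, _date, signature, _checksum,
         _loader_rev, flags, data_size, total_size) = HEADER.unpack_from(blob, offset)
        if header_ver != 1:
            break  # not a header this sketch understands; stop walking
        if data_size == 0:
            total_size = 2048  # legacy fixed layout: 48 + 2000 bytes
        if total_size < HEADER.size:
            break  # malformed size; avoid looping forever
        updates.append((signature, flags, revision))
        offset += total_size
    return updates

def cpu_signature(family, model, stepping):
    """Build the CPUID signature from the cpuinfo family/model/stepping values."""
    base_family = min(family, 0xF)
    ext_family = family - base_family
    return ((ext_family << 20) | ((model >> 4) << 16) | (base_family << 8)
            | ((model & 0xF) << 4) | stepping)

def newest_revision(blob, signature):
    """Highest revision in blob for the given signature, or None if absent."""
    revisions = [rev for sig, _flags, rev in list_updates(blob) if sig == signature]
    return max(revisions) if revisions else None
```

For the CPU above (family 6, model 63) the signature is 0x306F0 plus the stepping. Reading /boot/microcode.bin in binary mode and passing the bytes to newest_revision() with that signature would give a number to compare against 0x31. The processor flags (platform ID mask) are returned but not checked here, so a match on signature alone is not a guarantee that the update applies to this socket.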

Grepping the console output reveals no reference to ucode or microcode other than the Xen command line.
