
[Xen-devel] Re: Unstable + 3.0.0 testing, several issues



Hello Konrad, list,
    The PCIe serial card arrived today, so I managed to get some more
information about a few issues I'm facing:

    0. Unrelated: I'm unable to get GRUB to appear on the serial
console, although Xen logging works. The driver reports:
        [   25.320408] 0000:0e:00.0: ttyF0 at I/O 0xdc00 (irq = 19) is a saturn
    and I have "serial --port=0xdc00 --speed=115200" in my grub.cfg.
However, GRUB reports that it is an unknown serial device. The device
is a Moschip MSC9912.

    1. The serial card only works in polling mode. If Xen is booted
with "com1=115200,8n1,0xdc00,19" instead of
"com1=115200,8n1,0xdc00,0", Xen ends up printing "(XEN) do_IRQ:
7.241 No irq handler for vector (irq -1)" indefinitely.
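
    To make the difference between the two command lines explicit,
here is a minimal sketch in Python. It assumes only the four-field form
used in this mail (baud, data/parity/stop, I/O base, IRQ); the real Xen
option accepts more variants than this handles. The names Com1Option,
parse_com_option and uses_polling are hypothetical helpers, not part of
any Xen tool.

```python
from typing import NamedTuple


class Com1Option(NamedTuple):
    baud: int
    dps: str       # data bits / parity / stop bits, e.g. "8n1"
    io_base: int   # I/O port base of the UART
    irq: int       # 0 is the value that gave polling mode in this report


def parse_com_option(value: str) -> Com1Option:
    """Split '<baud>,<DPS>,<io-base>,<irq>' into its four fields."""
    baud, dps, io_base, irq = value.split(",")
    # int(..., 0) accepts both "0xdc00" and plain decimal.
    return Com1Option(int(baud), dps, int(io_base, 0), int(irq))


def uses_polling(opt: Com1Option) -> bool:
    """True when the IRQ field is 0, as in the working command line."""
    return opt.irq == 0


for arg in ("115200,8n1,0xdc00,19", "115200,8n1,0xdc00,0"):
    opt = parse_com_option(arg)
    mode = "polling" if uses_polling(opt) else f"IRQ {opt.irq}"
    print(f"com1={arg}: I/O base {opt.io_base:#x}, {mode}")
```

    The only field that differs between the working and the failing
boot is the last one, so the problem looks tied to interrupt delivery
rather than to the port address or line settings.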

    2. Regarding the issue with the system rebooting when a Windows 7
HVM (with a videocard in _PCI_ NOT VGA passthrough) reboots, here is
the Xen panic that is logged:

(XEN) irq.c:1686: dom1: forcing unbind of pirq 16
(XEN) irq.c:1686: dom1: forcing unbind of pirq 19
(XEN) irq.c:1686: dom1: forcing unbind of pirq 20
(XEN) irq.c:1686: dom1: forcing unbind of pirq 53
(XEN) irq.c:1686: dom1: forcing unbind of pirq 54
(XEN) irq.c:1686: dom1: forcing unbind of pirq 55
(XEN) Assertion 'entry->next->prev == entry' failed at
/home/xieliwei/xendev/xen-unstable.hg/xen/include/:172
(XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c480125d70>] set_timer+0x189/0x216
(XEN) RFLAGS: 0000000000010082   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff82c4802d8600   rcx: ffff82c4802d8608
(XEN) rdx: ffff8302d13065e0   rsi: 00000131556f0763   rdi: ffff82c4802d8600
(XEN) rbp: ffff82c48029fe30   rsp: ffff82c48029fdf0   r8:  0000000001c9c380
(XEN) r9:  0000000000000000   r10: 0000013153ef6a63   r11: 00ff00ff00ff00ff
(XEN) r12: ffff82c4802d8780   r13: 0000000000000000   r14: ffff82c4802d8780
(XEN) r15: 00000131556f0763   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000009f4ad000   cr2: ffff88001f787d68
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029fdf0:
(XEN)    ffff82c4802d8628 0000000000000086 0000000600000000 ffff83009f76e000
(XEN)    ffff83009f768000 0000013153a543e3 ffff82c4802d85e0 0000000001c9c380
(XEN)    ffff82c48029feb0 ffff82c48011f6f8 ffff82c4802d8600 0000000000000002
(XEN)    ffff82c4802d85e0 00000000ffffffff ffff83009f768000 0000000001c9c380
(XEN)    ffff82c48029ff00 00ff00ff00ff00ff 0000013153ef6a63 ffff82c4802b8880
(XEN)    ffff82c4802b8880 ffff82c48029ff18 ffffffffffffffff 0000000000000002
(XEN)    ffff82c48029fee0 ffff82c4801227be ffff82c48029ff18 ffff82c48029ff18
(XEN)    00000000ffffffff ffff82c4802d85e0 ffff82c48029fef0 ffff82c48012283d
(XEN)    ffff82c48029ff10 ffff82c4801535cc ffff83009f76e000 ffff83009f768000
(XEN)    ffff82c48029fdc8 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000002 ffff88002f0abfd8 0000000000000246
(XEN)    00000001000498f4 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffffffff810013aa 0000000000000000 00000000deadbeef 00000000deadbeef
(XEN)    0000010000000000 ffffffff810013aa 000000000000e033 0000000000000246
(XEN)    ffff88002f0abee0 000000000000e02b 000000000000beef 000000000000beef
(XEN)    000000000000beef 000000000000beef 0000000000000000 ffff83009f76e000
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c480125d70>] set_timer+0x189/0x216
(XEN)    [<ffff82c48011f6f8>] schedule+0x122/0x5d3
(XEN)    [<ffff82c4801227be>] __do_softirq+0x7e/0x89
(XEN)    [<ffff82c48012283d>] do_softirq+0x26/0x28
(XEN)    [<ffff82c4801535cc>] idle_loop+0x55/0x5b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'entry->next->prev == entry' failed at
/home/xieliwei/xendev/xen-unstable.hg/xen/include/:172
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

    However, with the latest xen unstable and Konrad's testing kernel,
this no longer happens. Instead, after a DomU reboot, Windows fails
to boot because the videocard is unable to initialise, and a BSOD with
code 0x00000116 (generated by the Nvidia driver) occurs. Note that
rebooting with a PCI passed-through videocard did work at some point
long ago (I can't remember which version).

    3. Somehow, asking Windows to shut down causes qemu to issue a
reboot instead of a shutdown:

pt_iomem_map: e_phys=ffffffff maddr=a0100000 type=0 len=16384 index=0
first_map=0
pt_iomem_map: e_phys=ffffffff maddr=a0105000 type=0 len=4096 index=0 first_map=0
pt_iomem_map: e_phys=ffffffff maddr=a0104000 type=0 len=4096 index=0 first_map=0
reset requested in cpu_handle_ioreq.
Issued domain 2 reboot

    4. The ATI VGA (not PCI) passthrough patch isn't working with my
HD6850 any more. Not sure when that started since I stopped using it
after finding out that normal PCI passthrough works just as well. The
qemu log ends with:

ati_gfx_init: ATI GFX Guest Info:
       pio_index=0x00000004,       guest_pio_bar=0x0000c100
       mmio_bar1_index=0x00000000, guest_mmio_bar1=0xe0000000
       mmio_bar2_index=0x00000002, guest_mmio_bar2=0xf1020000

    and seems to be stuck after that.

    5. With today's xen+kernel, this appears during boot:

[   40.457597] XENBUS: Unable to read cpu state
[   40.457698] XENBUS: Unable to read cpu state
[   40.457818] XENBUS: Unable to read cpu state
[   40.457916] XENBUS: Unable to read cpu state
[   40.458025] XENBUS: Unable to read cpu state
[   40.458128] XENBUS: Unable to read cpu state
[   40.458232] XENBUS: Unable to read cpu state
[   40.458345] XENBUS: Unable to read cpu state
[   41.069583] ------------[ cut here ]------------
[   41.069590] WARNING: at fs/proc/base.c:1123 oom_adjust_write+0x2be/0x2e0()
[   41.069593] Hardware name:
[   41.069595] sshd (2811): /proc/2811/oom_adj is deprecated, please
use /proc/2811/oom_score_adj instead.
[   41.069597] Modules linked in: cpufreq_powersave cpufreq_stats
cpufreq_conservative acpi_cpufreq mperf binfmt_misc fuse nfsd exportfs
nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp ext3 jbd loop
firewire_sbp2 firewire_core crc_itu_t cxgb3 mdio mii parport_serial
psmouse parport_pc parport i2c_i801 i2c_core serio_raw pcspkr evdev
mxm_wmi wmi button processor thermal_sys ext4 mbcache jbd2 crc16
dm_mod sg sr_mod cdrom sd_mod crc_t10dif ahci libahci mvsas libsas
sky2 libata scsi_transport_sas scsi_mod [last unloaded:
scsi_wait_scan]
[   41.069656] Pid: 2811, comm: sshd Not tainted 3.0.0-xen-amd64+ #1
[   41.069658] Call Trace:
[   41.069664]  [<ffffffff8105cffb>] ? warn_slowpath_common+0x7b/0xc0
[   41.069667]  [<ffffffff8105d0f5>] ? warn_slowpath_fmt+0x45/0x50
[   41.069672]  [<ffffffff81071841>] ? __lock_task_sighand+0x61/0xb0
[   41.069674]  [<ffffffff8119771e>] ? oom_adjust_write+0x2be/0x2e0
[   41.069679]  [<ffffffff8113b5ce>] ? vfs_write+0xae/0x180
[   41.069682]  [<ffffffff8113b8f7>] ? sys_write+0x47/0x90
[   41.069686]  [<ffffffff814175a5>] ? page_fault+0x25/0x30
[   41.069690]  [<ffffffff8141dc92>] ? system_call_fastpath+0x16/0x1b
[   41.069693] ---[ end trace 99020d81b67fb2a0 ]---

    6. I have an old Ciprico software RAID card that presents itself
as five PCIe switches and four SATA controllers:

05:00.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI
Express Switch (rev 0d)
06:02.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI
Express Switch (rev 0d)
06:03.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI
Express Switch (rev 0d)
06:04.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI
Express Switch (rev 0d)
06:05.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI
Express Switch (rev 0d)
07:00.0 SCSI storage controller: Marvell Technology Group Ltd.
88SE6440 SAS/SATA PCIe controller (rev 02)
08:00.0 SCSI storage controller: Marvell Technology Group Ltd.
88SE6440 SAS/SATA PCIe controller (rev 02)
09:00.0 SCSI storage controller: Marvell Technology Group Ltd.
88SE6440 SAS/SATA PCIe controller (rev 02)
0a:00.0 SCSI storage controller: Marvell Technology Group Ltd.
88SE6440 SAS/SATA PCIe controller (rev 02)

    I tried passing through everything, including the switches, but
attempting to start the VM throws the following:

libxl: error: libxl_pci.c:742:libxl__device_pci_reset: The kernel
doesn't support reset from sysfs for PCI device 0000:05:00.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:05:00.0
libxl: error: libxl_pci.c:742:libxl__device_pci_reset: The kernel
doesn't support reset from sysfs for PCI device 0000:06:02.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:06:02.0
libxl: error: libxl_pci.c:742:libxl__device_pci_reset: The kernel
doesn't support reset from sysfs for PCI device 0000:06:03.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:06:03.0
libxl: error: libxl_pci.c:742:libxl__device_pci_reset: The kernel
doesn't support reset from sysfs for PCI device 0000:06:04.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:06:04.0
libxl: error: libxl_pci.c:742:libxl__device_pci_reset: The kernel
doesn't support reset from sysfs for PCI device 0000:06:05.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:06:05.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:07:00.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:08:00.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:09:00.0
libxl: error: libxl_device.c:608:libxl__wait_for_device_model: Device
Model not ready
libxl: error: libxl_pci.c:632:do_pci_add: qemu refused to add device:
0000:0a:00.0

    and the VM does nothing:

Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1024     8     r-----      22.4
openfiler                                     2  2043     1     ------       0.0
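
    The "reset from sysfs" errors above can be cross-checked from dom0.
The sketch below (Python) tests whether each device exposes a "reset"
attribute under the standard Linux sysfs layout
/sys/bus/pci/devices/<BDF>/reset. That this is the file libxl looks for
is my assumption from the error text, and has_sysfs_reset and report
are hypothetical helper names.

```python
import os

# BDFs of the switches and SATA controllers from the lspci listing above.
DEVICES = [
    "0000:05:00.0", "0000:06:02.0", "0000:06:03.0", "0000:06:04.0",
    "0000:06:05.0", "0000:07:00.0", "0000:08:00.0", "0000:09:00.0",
    "0000:0a:00.0",
]


def has_sysfs_reset(bdf, sysfs_root="/sys"):
    """Return True if the device has a 'reset' attribute in sysfs."""
    path = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "reset")
    return os.path.exists(path)


def report(devices, sysfs_root="/sys"):
    """Map each BDF to whether a sysfs reset attribute is present."""
    return {bdf: has_sysfs_reset(bdf, sysfs_root) for bdf in devices}


if __name__ == "__main__":
    for bdf, ok in report(DEVICES).items():
        print(f"{bdf}: reset attribute {'present' if ok else 'missing'}")
```

    Note that in the log only the five bridges (05:00.0 and 06:02.0 to
06:05.0) trigger the reset error; the four SATA controllers fail only
with "qemu refused to add device".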

    Passing through only the SATA controllers, however, causes the
system to lock up while booting openfiler:

(XEN) domctl.c:1056:d0 ioport_map:remove f_gport=c100 f_mport=7c00 np=80
(XEN) domctl.c:1032:d0 ioport_map:add f_gport=c100 f_mport=7c00 np=80
(XEN) domctl.c:1056:d0 ioport_map:remove f_gport=c180 f_mport=8c00 np=80
(XEN) domctl.c:1032:d0 ioport_map:add f_gport=c180 f_mport=8c00 np=80
(XEN) domctl.c:1056:d0 ioport_map:remove f_gport=c200 f_mport=9c00 np=80
(XEN) domctl.c:1032:d0 ioport_map:add f_gport=c200 f_mport=9c00 np=80
(XEN) domctl.c:1056:d0 ioport_map:remove f_gport=c280 f_mport=ac00 np=80
(XEN) domctl.c:1032:d0 ioport_map:add f_gport=c280 f_mport=ac00 np=80
(XEN) irq.c:264: Dom1 PCI link 0 changed 5 -> 0
(XEN) irq.c:264: Dom1 PCI link 1 changed 10 -> 0
(XEN) irq.c:264: Dom1 PCI link 2 changed 11 -> 0
(XEN) irq.c:264: Dom1 PCI link 3 changed 5 -> 0
[System locks up after this]

    The last line printed by openfiler is "Starting udev:".

    The effects last through soft reboots. On the first reboot, the
dom0 kernel panics:

(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
input to Xen)
(XEN) Freed 248kB init memory.
mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
(XEN) mm.c:4225:d0 Bad page 0000000000440d46: ed=ffff83045cc64000(0),
sd=ffff83045cc64000, caf=8000080000000001, taf=7400000000000000
[    6.918340] Kernel panic - not syncing: DMA(-12): Failed to
exchange pages allocated for DMA with Xen! We either don't have the
permission or you do not have enoughfree memory under 4GB!
[    6.918342]
[    6.936685] Pid: 0, comm: swapper Not tainted 3.0.0-rc7-1.9zion-xen-amd64+ #1
[    6.943925] Call Trace:
[    6.946402]  [<ffffffff81404825>] ? panic+0x9f/0x1a0
[    6.951433]  [<ffffffff818ca26d>] ? xen_swiotlb_init+0xf7/0x12f
[    6.957436]  [<ffffffff818aaea2>] ? pci_swiotlb_detect_4gb+0x27/0x27
[    6.963881]  [<ffffffff814177a7>] ? bad_to_user+0x6b1/0x6b1
[    6.969529]  [<ffffffff8189e569>] ? pci_xen_swiotlb_init+0x14/0x27
[    6.975797]  [<ffffffff818aae5e>] ? add_pcspkr+0x37/0x37
[    6.981189]  [<ffffffff818a10bb>] ? pci_iommu_alloc+0x52/0x67
[    6.987015]  [<ffffffff818aea9b>] ? mem_init+0x14/0xe5
[    6.992225]  [<ffffffff8189a9b9>] ? start_kernel+0x1be/0x3c3
[    6.997964]  [<ffffffff8189c7b2>] ? xen_start_kernel+0x5b1/0x5b7
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

    Subsequent reboots succeed, but the system quickly degrades and
crashes:

[ 1942.442574] INFO: rcu_preempt_state detected stalls on CPUs/tasks:
{ 0 4 5} (detected by 3, t=19276 jiffies)

and

Jul 31 23:12:02 localhost kernel: [  610.801687] as[13327]: segfault
at ad938bef ip 00002b81ad72e519 sp 00007fffd87f0438 error 6 in
ld-2.13.so[2b81ad717000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.644683] sh[23035]: segfault
at 1f273bef ip 00002b001f069519 sp 00007fff149996c8 error 6 in
ld-2.13.so[2b001f052000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.645221] uname[23036]:
segfault at 25e2cbef ip 00002b5025c22519 sp 00007fffdf186db8 error 6
in ld-2.13.so[2b5025c0b000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.645700] sh[23037]: segfault
at f4264bef ip 00002ac3f405a519 sp 00007ffffdf21598 error 6 in
ld-2.13.so[2ac3f4043000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.646167] which[23038]:
segfault at 615c5bef ip 00002afc613bb519 sp 00007fff4002f2f8 error 6
in ld-2.13.so[2afc613a4000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.646692] sh[23039]: segfault
at d426fbef ip 00002b7bd4065519 sp 00007ffffe3bbd48 error 6 in
ld-2.13.so[2b7bd404e000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.647171] sh[23040]: segfault
at e651cbef ip 00002b8be6312519 sp 00007fff4f4ae6e8 error 6 in
ld-2.13.so[2b8be62fb000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.647649] sh[23041]: segfault
at 9f9f3bef ip 00002aea9f7e9519 sp 00007fff173aa558 error 6 in
ld-2.13.so[2aea9f7d2000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.651179] make[23042]: segfault
at 4fc83bef ip 00002b524fa79519 sp 00007fff36460688 error 6 in
ld-2.13.so[2b524fa62000+1f000]
Jul 31 23:14:59 localhost kernel: [  788.157562] make[23043]: segfault
at f81fbbef ip 00007feff7ff1519 sp 00007fff71f105e8 error 6 in
ld-2.13.so[7feff7fda000+1f000]
Jul 31 23:15:01 localhost kernel: [  790.085750] make[23044]: segfault
at 55b95bef ip 00007f365598b519 sp 00007fff786b2d78 error 6 in
ld-2.13.so[7f3655974000+1f000]
Jul 31 23:15:02 localhost kernel: [  790.829308] make[23045]: segfault
at bf1ccbef ip 00007f2bbefc2519 sp 00007fffca21e978 error 6 in
ld-2.13.so[7f2bbefab000+1f000]
Jul 31 23:15:03 localhost kernel: [  791.429724] make[23046]: segfault
at c419ebef ip 00007fb3c3f94519 sp 00007fff2cc560a8 error 6 in
ld-2.13.so[7fb3c3f7d000+1f000]
Jul 31 23:15:03 localhost kernel: [  791.964308] make[23047]: segfault
at df823bef ip 00007f29df619519 sp 00007fffb16edb78 error 6 in
ld-2.13.so[7f29df602000+1f000]

    Only a hard (power off/on) reboot will normalise the system.
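
    To check whether these segfaults share a common faulting
instruction, the instruction pointer can be reduced to an offset inside
ld-2.13.so (ip minus the mapping base printed in brackets). A minimal
sketch in Python follows, using three of the lines above as sample
input; SEGFAULT_RE and ip_offsets are hypothetical names, and the
regular expression assumes the message layout shown in this log.

```python
import re
from collections import Counter

# Matches "comm[pid]: segfault at ADDR ip IP sp SP error N in OBJ[BASE+SIZE]".
# \s+ is used between tokens so mail-wrapped lines still match.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]:\s+segfault\s+at\s+(?P<addr>[0-9a-f]+)\s+"
    r"ip\s+(?P<ip>[0-9a-f]+)\s+sp\s+(?P<sp>[0-9a-f]+)\s+error\s+(?P<err>\d+)\s+"
    r"in\s+(?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# Three entries copied from the log above, wrapped as in the mail.
LOG = """\
Jul 31 23:12:02 localhost kernel: [  610.801687] as[13327]: segfault
at ad938bef ip 00002b81ad72e519 sp 00007fffd87f0438 error 6 in
ld-2.13.so[2b81ad717000+1f000]
Jul 31 23:14:52 localhost kernel: [  780.645221] uname[23036]:
segfault at 25e2cbef ip 00002b5025c22519 sp 00007fffdf186db8 error 6
in ld-2.13.so[2b5025c0b000+1f000]
Jul 31 23:15:03 localhost kernel: [  791.964308] make[23047]: segfault
at df823bef ip 00007f29df619519 sp 00007fffb16edb78 error 6 in
ld-2.13.so[7f29df602000+1f000]
"""


def ip_offsets(log):
    """Count (object, ip - mapping base) pairs over all segfault lines."""
    return Counter(
        (m["obj"], int(m["ip"], 16) - int(m["base"], 16))
        for m in SEGFAULT_RE.finditer(log)
    )


for (obj, offset), count in ip_offsets(LOG).items():
    print(f"{obj}+{offset:#x}: {count} fault(s)")
```

    For the three sample entries the offset comes out the same each
time (0x17519 into ld-2.13.so), so different processes are faulting at
the same instruction. That alone does not identify the cause.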

    Apologies for the long email! Or would the list prefer that I
split each issue off into its own email?

Xen:
# hg log|head
changeset:   23756:0f36c2eec2e1
tag:         tip
user:        Keir Fraser <keir@xxxxxxx>
date:        Thu Jul 28 15:40:54 2011 +0100
summary:     hvmloader: Enable SCI in QEMU has it disabled.

Kernel:
#git log
commit e37c6e0fac4fc41d988d03253d1cc0b44d1663fb
Merge: d2c97b2 95b6886
Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date:   Thu Jul 28 09:06:53 2011 -0400

    Merge remote-tracking branch 'linus/master' into testing

    * linus/master: (221 commits)
      Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors
      signals: sys_ssetmask/sys_rt_sigsuspend should use set_current_blocked()
      sparc: rename atomic_add_unless
      proc: make struct proc_dir_entry::name a terminal array rather
than a pointer
      Btrfs: use the commit_root for reading free_space_inode crcs
      Btrfs: reduce extent_state lock contention for metadata
      Btrfs: remove lockdep magic from btrfs_next_leaf
      Btrfs: make a lockdep class for each root
      Btrfs: switch the btrfs tree locks to reader/writer
      Btrfs: fix deadlock when throttling transactions
      Btrfs: stop using highmem for extent_buffers
      Btrfs: fix BUG_ON() caused by ENOSPC when relocating space
      Btrfs: tag pages for writeback in sync
      Btrfs: fix enospc problems with delalloc
      Btrfs: don't flush delalloc arbitrarily
      Btrfs: use find_or_create_page instead of grab_cache_page
      Btrfs: use a worker thread to do caching
      staging: brcm80211: Fix double include introduced by bad merge
      microblaze: Do not show error message for 32 interrupt lines
      xfs: optimize the negative xattr caching
      ...

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

