[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: PCI BAR register space written with garbage in HVM guest.


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Dan Gora <dan.gora@xxxxxxxxx>
  • Date: Tue, 16 Mar 2010 21:14:36 -0300
  • Delivery-date: Tue, 16 Mar 2010 17:15:49 -0700
  • Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; b=xNTgVOIXdMRUZEHRdquv1MgzuTjQd4HDReVio3sgUPN5LKtqEnz3D4OJdubONF4TM3 VbiNM+AF2KA1PM9E5uDR3k2xglZi9gmYaGoQzutc27fnKlvUpqi6RNCd3T38WhpDcZ3m TGOgMAcKLfHS5NXw5FaWjRdI1Aks2fJN7r5B8=
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi all,

So I have definately tracked the writes as coming from qemu-dm, but
I'm still having trouble with finding out what is causing it.  I can
now see the writes occuring down in the kernel (I only had
pci_read_bus_XX traced, I didn't have pci_read_user_XX traced before..
sigh...) and I can also see the problem occuring with strace as well,
but I still cannot find where in qemu corresponds to this behavior...

So here is what I see:

Recall that from Xen HV I can see the bad writes occuring:

> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080000 offset=0x0
> bytes=4 value=0xffffffff
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080004 offset=0x0
> bytes=4 value=0x1600ffff
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080008 offset=0x0
> bytes=4 value=0x64d5323e
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008000c offset=0x0
> bytes=4 value=0x450008

I also traced the dom0 kernel better and can now also see these same
writes occuring:

Mar 16 21:35:49 sunblade1 kernel: pci_conf1_write: writing 8:0 reg 0x0
len=4 value=0xffffffff
Mar 16 21:35:49 sunblade1 kernel: Pid: 5904, comm: qemu-dm Tainted: G
        2.6.27.19-5-xen.dan #7
Mar 16 21:35:49 sunblade1 kernel:
Mar 16 21:35:49 sunblade1 kernel: Call Trace:
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8020c55a>] dump_trace+0xba/0x379
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8020c867>]
show_trace_log_lvl+0x4e/0x69
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8020d5d8>] show_trace+0x15/0x19
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8048d873>] dump_stack+0x77/0x80
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff80407a92>]
pci_conf1_write+0x86/0x113
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8040940f>] raw_pci_write+0x24/0x3d
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff80409482>] pci_write+0x2c/0x2e
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8035ec0f>]
pci_user_write_config_dword+0x62/0xba
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff80364e49>]
pci_write_config+0x110/0x192
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff802ff3eb>] write+0xf5/0x12f
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff802a9bea>] vfs_write+0xb3/0x15c
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff802a9d61>] sys_write+0x4c/0x75
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8020b7f2>] tracesys+0xaf/0xb4
Mar 16 21:35:49 sunblade1 kernel:  [<00007f05d53f97fb>] 0x7f05d53f97fb
Mar 16 21:35:49 sunblade1 kernel:
Mar 16 21:35:49 sunblade1 kernel: pci_conf1_write: writing 8:0 reg 0x4
len=4 value=0x1600ffff
Mar 16 21:35:49 sunblade1 kernel: Pid: 5904, comm: qemu-dm Tainted: G
        2.6.27.19-5-xen.dan #7
Mar 16 21:35:49 sunblade1 kernel:
Mar 16 21:35:49 sunblade1 kernel: Call Trace:
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8020c55a>] dump_trace+0xba/0x379
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8020c867>]
show_trace_log_lvl+0x4e/0x69
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8020d5d8>] show_trace+0x15/0x19
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8048d873>] dump_stack+0x77/0x80
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff80407a92>]
pci_conf1_write+0x86/0x113
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8040940f>] raw_pci_write+0x24/0x3d
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff80409482>] pci_write+0x2c/0x2e
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8035ec0f>]
pci_user_write_config_dword+0x62/0xba
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff80364e49>]
pci_write_config+0x110/0x192
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff802ff3eb>] write+0xf5/0x12f
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff802a9bea>] vfs_write+0xb3/0x15c
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff802a9d61>] sys_write+0x4c/0x75
Mar 16 21:35:49 sunblade1 kernel:  [<ffffffff8020b7f2>] tracesys+0xaf/0xb4
Mar 16 21:35:50 sunblade1 kernel:  [<00007f05d53f97fb>] 0x7f05d53f97fb
Mar 16 21:35:50 sunblade1 kernel:


I also traced qemu-dm with strace and found the culprit:
5904  21:35:47 [    7f05d3b68b2b]
open("/sys/bus/pci/devices/0000:08:00.0/config", O_RDWR) = 6
5904  21:35:47 [    7f05d53fa3c8] pwrite(6, "\6\1", 2, 4) = 2
5904  21:35:47 [    7f05d3b6eb77] ioctl(16, EVIOCGKEYCODE, 0x7fffdde98890) = 0
5904  21:35:47 [    7f05d53f987b] read(4, 0x7fffdde98870, 16) = -1
EAGAIN (Resource temporarily unavailable)
5904  21:35:47 [    7fffddf437dc] clock_gettime(CLOCK_MONOTONIC,
{1724, 868201462}) = 0
5904  21:35:47 [    7fffddf437dc] clock_gettime(CLOCK_MONOTONIC,
{1724, 868259075}) = 0

<snip>

5904  21:35:48 [    7f05d53f987b] read(16, "o\0\0\0", 4) = 4
5904  21:35:48 [    7f05d53f97fb] write(16, "o\0\0\0", 4) = 4
5904  21:35:48 [    7f05d53f97fb] write(6,
"\377\377\377\377\377\377\0\26>2\325d\10\0E\0\2@\0\354\0\0@\21w\302\0\0\0\0\377\377"...,
590) = 256

Notice the length of 590 bytes!  The EIP also matches what the kernel
thinks was the syscall which caused the write.

Anyone have any idea where this could be happening in qemu?  I'm
pretty sure that it's not originating with the domU because I traced
the domU kenrel's pci config access routines pci_conf1_write/read the
same way as the dom0's and there are no printfs in the domU kernel
when this occurs.

Again, thanks for all the help,
dan

On Mon, Mar 15, 2010 at 10:09 PM, Dan Gora <dan.gora@xxxxxxxxx> wrote:
> Hi All,
>
> I'm having a problem where if I pass through two instances of my
> device to a HVM domU, one of the board instances is having it's PCI
> BAR registers overwritten with garbage by some unknown actor 30
> seconds to a minute after I load my driver.  I cannnot for the life of
> me find what might possibly be overwriting the BAR registers.
>
> I've added a debugging printf to XEN in
> xen/arch/x86/pci.c:pci_conf_write() and I can see the entire PCI BAR
> address space being overwritten with garbage:
>
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080000 offset=0x0
> bytes=4 value=0xffffffff
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080004 offset=0x0
> bytes=4 value=0x1600ffff
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080008 offset=0x0
> bytes=4 value=0x64d5323e
> (XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008000c offset=0x0
> bytes=4 value=0x450008

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.