
Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough


  • To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
  • From: Dante Cinco <dantecinco@xxxxxxxxx>
  • Date: Thu, 11 Nov 2010 17:02:55 -0800
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 11 Nov 2010 17:04:12 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Here's another data point: with iommu=1,passthrough,no-intremap,verbose
on the Xen command line, iommu=soft in the pvops domU command line also
results in an NMI (see below). Replacing iommu=soft with swiotlb=force
in the pvops domU works reliably, but with the same I/O performance
degradation. It seems that regardless of whether the IOMMU is enabled
or disabled in the hypervisor, swiotlb=force is necessary in the pvops
domU.

(XEN)
(XEN) NMI - I/O ERROR
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015c006>] do_IRQ+0x375/0x59c
(XEN) RFLAGS: 0000000000000002   CONTEXT: hypervisor
(XEN) rax: ffff83011dae4460   rbx: ffff8301616a6990   rcx: 000000000000010c
(XEN) rdx: 000000000000010c   rsi: 0000000000000086   rdi: 0000000000000001
(XEN) rbp: ffff82c480287e28   rsp: ffff82c480287db8   r8:  000000000000007a
(XEN) r9:  ffff8300df4d4060   r10: ffff83019fffac88   r11: 000001958595f304
(XEN) r12: ffff83011dae2000   r13: 0000000000000000   r14: 000000000000007f
(XEN) r15: ffff83019fe02200   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 00000001261ff000   cr2: 0000000000783000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480287db8:
(XEN)    0000000000000043 0000000000000043 ffff83019fe02234 0000000000000000
(XEN)    000000000000010c ffff830000000000 ffff82c4802c2400 0000000000000002
(XEN)    ffff82c480287e10 ffff82c480287f18 ffff82c48024f6c0 ffff82c480287f18
(XEN)    ffff82c4802c2300 0000000000000002 00007d3b7fd781a7 ffff82c480154ee6
(XEN)    0000000000000002 ffff82c4802c2300 ffff82c480287f18 ffff82c48024f6c0
(XEN)    ffff82c480287ee0 ffff82c480287f18 000001958595f304 ffff83019fffac88
(XEN)    ffff8300df4d4060 ffff83019fffa9f0 ffff82c4802c23a0 0000000000000000
(XEN)    0000000000000000 ffff82c4802c2e80 0000000000000000 0000007a00000000
(XEN)    ffff82c48014e3c3 000000000000e008 0000000000000246 ffff82c480287ee0
(XEN)    000000000000e010 ffff82c480287f10 ffff82c480150664 0000000000000000
(XEN)    ffff8300df2fc000 ffff8300df4d4000 00000000ffffffff ffff82c480287db8
(XEN)    0000000000000000 ffffffffffffffff ffffffff81787160 ffffffff81669fd8
(XEN)    ffffffff81669ed0 ffffffff81668000 0000000000000246 ffff8800067c0200
(XEN)    0000019575abe291 0000000000000000 0000000000000000 ffffffff810093aa
(XEN)    0000000400000000 00000000deadbeef 00000000deadbeef 0000010000000000
(XEN)    ffffffff810093aa 000000000000e033 0000000000000246 ffffffff81669eb8
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8300df2fc000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c48015c006>] do_IRQ+0x375/0x59c
(XEN)    [<ffff82c480154ee6>] common_interrupt+0x26/0x30
(XEN)    [<ffff82c48014e3c3>] default_idle+0x82/0x87
(XEN)    [<ffff82c480150664>] idle_loop+0x5a/0x68
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL TRAP: vector = 2 (nmi)
(XEN) [error_code=0000] , IN INTERRUPT CONTEXT
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

- Dante


On Thu, Nov 11, 2010 at 2:32 PM, Dante Cinco <dantecinco@xxxxxxxxx> wrote:
> With iommu=off,verbose on the Xen command line, the pvops domU works
> only with swiotlb=force, and with the same performance degradation.
> Without swiotlb=force there's no NMI, but DMA does not work (see Ray
> Lin's reply on Thu 11/11/2010 11:42 AM).
>
> The XenPCIpassthrough wiki
> (http://wiki.xensource.com/xenwiki/XenPCIpassthrough) talks about
> setting iommu=pv in order to use the hardware IOMMU (VT-d) passthrough
> for PV guests, but I didn't see any difference compared to my original
> setting (iommu=1,passthrough,no-intremap). Is iommu=pv still required
> for this particular pvops domU kernel (xen-pcifront-0.8.2), and if it
> is, what should I be looking for in the Xen log (xm dmesg) to verify
> its efficacy?
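>
> (So far I've just been pulling the IOMMU-related lines out of the
> hypervisor log with something along the lines of
>
>     xm dmesg | grep -i 'vt-d\|iommu'
>
> and eyeballing what comes back.)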
>
> With my original setting (iommu=1,passthrough,no-intremap), here's what I see:
>
> (XEN) [VT-D]dmar.c:702: Host address width 39
> (XEN) [VT-D]dmar.c:717: found ACPI_DMAR_DRHD:
> (XEN) [VT-D]dmar.c:413:   dmaru->address = e7ffe000
> (XEN) [VT-D]iommu.c:1136: drhd->address = e7ffe000 iommu->reg = ffff82c3fff57000
> (XEN) [VT-D]iommu.c:1138: cap = c90780106f0462 ecap = f0207e
> (XEN) [VT-D]dmar.c:356:   IOAPIC: 0:1e.1
> (XEN) [VT-D]dmar.c:356:   IOAPIC: 0:13.0
> (XEN) [VT-D]dmar.c:427:   flags: INCLUDE_ALL
> (XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
> (XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.7
> (XEN) [VT-D]dmar.c:594:   RMRR region: base_addr df7fc000 end_address df7fdfff
> (XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
> (XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.0
> (XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.1
> (XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.2
> (XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.3
> (XEN) [VT-D]dmar.c:341:   endpoint: 2:0.0
> (XEN) [VT-D]dmar.c:341:   endpoint: 2:0.2
> (XEN) [VT-D]dmar.c:341:   endpoint: 2:0.4
> (XEN) [VT-D]dmar.c:594:   RMRR region: base_addr df7f5000 end_address df7fafff
> (XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
> (XEN) [VT-D]dmar.c:341:   endpoint: 5:0.0
> (XEN) [VT-D]dmar.c:341:   endpoint: 2:0.0
> (XEN) [VT-D]dmar.c:341:   endpoint: 2:0.2
> (XEN) [VT-D]dmar.c:594:   RMRR region: base_addr df63e000 end_address df63ffff
> (XEN) [VT-D]dmar.c:727: found ACPI_DMAR_ATSR:
> (XEN) [VT-D]dmar.c:622:   atsru->all_ports: 0
> (XEN) [VT-D]dmar.c:327:   bridge: 0:a.0  start = 0 sec = 7  sub = 7
> (XEN) [VT-D]dmar.c:327:   bridge: 0:9.0  start = 0 sec = 8  sub = a
> (XEN) [VT-D]dmar.c:327:   bridge: 0:8.0  start = 0 sec = b  sub = d
> (XEN) [VT-D]dmar.c:327:   bridge: 0:7.0  start = 0 sec = e  sub = 10
> (XEN) [VT-D]dmar.c:327:   bridge: 0:6.0  start = 0 sec = 18  sub = 1a
> (XEN) [VT-D]dmar.c:327:   bridge: 0:5.0  start = 0 sec = 15  sub = 17
> (XEN) [VT-D]dmar.c:327:   bridge: 0:4.0  start = 0 sec = 14  sub = 14
> (XEN) [VT-D]dmar.c:327:   bridge: 0:3.0  start = 0 sec = 11  sub = 13
> (XEN) [VT-D]dmar.c:327:   bridge: 0:2.0  start = 0 sec = 6  sub = 6
> (XEN) [VT-D]dmar.c:327:   bridge: 0:1.0  start = 0 sec = 5  sub = 5
> (XEN) Intel VT-d Snoop Control not enabled.
> (XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
> (XEN) Intel VT-d Queued Invalidation enabled.
> (XEN) Intel VT-d Interrupt Remapping not enabled.
> (XEN) I/O virtualisation enabled
> (XEN)  - Dom0 mode: Relaxed
> (XEN) Enabled directed EOI with ioapic_ack_old on!
> (XEN) [VT-D]iommu.c:743: iommu_enable_translation: iommu->reg = ffff82c3fff57000
>
> domU bringup (each passed-through device is unmapped from dom0's VT-d
> context and mapped into the domU's):
>
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.3
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.3
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.2
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.2
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.1
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.1
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.0
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.0
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.3
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.3
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.2
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.2
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.1
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.1
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.0
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.0
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 15:0.0
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 15:0.0
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 15:0.1
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 15:0.1
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 18:0.0
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 18:0.0
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 18:0.1
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 18:0.1
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = b:0.0
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = b:0.0
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = b:0.1
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = b:0.1
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = e:0.0
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = e:0.0
> (XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = e:0.1
> (XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = e:0.1
> mapping kernel into physical memory
> about to get started...
>
> - Dante
>
> On Thu, Nov 11, 2010 at 11:03 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
>> On Thu, Nov 11, 2010 at 10:31:48AM -0800, Dante Cinco wrote:
>>> Konrad,
>>>
>>> Without swiotlb=force, I don't see "PCI-DMA: Using software bounce
>>> buffering for IO" in /var/log/kern.log.
>>>
>>> With iommu=soft and without swiotlb=force, I see the "software bounce
>>> buffering" in /var/log/kern.log and an NMI (see below) when I load the
>>> kernel module drivers. I made sure the NMI is reproducible and not a
>>> one-time event.
>>
>> What is the kernel module doing to cause this? DMA?
>>
>> So doing 64-bit DMA causes an NMI. Do you have the hypervisor's IOMMU
>> (VT-d) enabled or disabled (iommu=off,verbose)? If you turn it off, does
>> this work?
>>>
>>> /var/log/kern.log (iommu=soft):
>>> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
>>> Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000
>>> software IO TLB at phys 0x5800000 - 0x9800000
>>>
>>> (XEN)
>>> (XEN)
>>> (XEN) NMI - I/O ERROR
>>> (XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82c4801701b2>] smp_send_event_check_mask+0x1/0x10
>>> (XEN) RFLAGS: 0000000000000012   CONTEXT: hypervisor
>>> (XEN) rax: 0000000000000080   rbx: ffff82c480287c48   rcx: 0000000000000000
>>> (XEN) rdx: 0000000000000080   rsi: 0000000000000080   rdi: ffff82c480287c48
>>> (XEN) rbp: ffff82c480287c78   rsp: ffff82c480287c38   r8:  0000000000000000
>>> (XEN) r9:  0000000000000037   r10: 0000ffff0000ffff   r11: 00ff00ff00ff00ff
>>> (XEN) r12: ffff82c48029f080   r13: 0000000000000001   r14: 0000000000000008
>>> (XEN) r15: ffff82c4802b0c20   cr0: 000000008005003b   cr4: 00000000000026f0
>>> (XEN) cr3: 00000001250a9000   cr2: 00007f6165ae9428
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>>> (XEN) Xen stack trace from rsp=ffff82c480287c38:
>>> (XEN)    ffff82c480287c78 ffff82c48012001f 0000000000000100 0000000000000000
>>> (XEN)    ffff82c480287ca8 ffff83011dadd8b0 ffff83019fffa9d0 ffff82c4802c2300
>>> (XEN)    ffff82c480287cc8 ffff82c480117d0d ffff82c48029f080 0000000000000001
>>> (XEN)    0000000000000100 0000000000000000 0000000000000002 ffff8300df606000
>>> (XEN)    000000411de66867 ffff82c4802c2300 ffff82c480287d28 ffff82c48011f299
>>> (XEN)    0000000000000100 0000000000000086 ffff83019e3fa000 ffff83011dadd8b0
>>> (XEN)    ffff83019fffa9d0 ffff8300df606000 0000000000000000 0000000000000000
>>> (XEN)    000000000000007f ffff83019fe02200 ffff82c480287d38 ffff82c48011f6ea
>>> (XEN)    ffff82c480287d58 ffff82c48014e4c1 ffff83011dae2000 0000000000000066
>>> (XEN)    ffff82c480287d68 ffff82c48014e54d ffff82c480287d98 ffff82c480105d59
>>> (XEN)    ffff82c480287da8 ffff8301616a6990 ffff83011dae2000 0000000000000000
>>> (XEN)    ffff82c480287da8 ffff82c480105f81 ffff82c480287e28 ffff82c48015c043
>>> (XEN)    0000000000000043 0000000000000043 ffff83019fe02234 0000000000000000
>>> (XEN)    000000000000010c 0000000000000000 0000000000000000 0000000000000002
>>> (XEN)    ffff82c480287e10 ffff82c480287f18 ffff82c48024f6c0 ffff82c480287f18
>>> (XEN)    ffff82c4802c2300 0000000000000002 00007d3b7fd781a7 ffff82c480154ee6
>>> (XEN)    0000000000000002 ffff82c4802c2300 ffff82c480287f18 ffff82c48024f6c0
>>> (XEN)    ffff82c480287ee0 ffff82c480287f18 00ff00ff00ff00ff 0000ffff0000ffff
>>> (XEN)    0000000000000000 0000000000000000 ffff82c4802c23a0 0000000000000000
>>> (XEN)    0000000000000000 ffff82c4802c2e80 0000000000000000 0000007a00000000
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82c4801701b2>] smp_send_event_check_mask+0x1/0x10
>>> (XEN)    [<ffff82c480117d0d>] csched_vcpu_wake+0x2e1/0x302
>>> (XEN)    [<ffff82c48011f299>] vcpu_wake+0x243/0x43e
>>> (XEN)    [<ffff82c48011f6ea>] vcpu_unblock+0x4a/0x4c
>>> (XEN)    [<ffff82c48014e4c1>] vcpu_kick+0x21/0x7f
>>> (XEN)    [<ffff82c48014e54d>] vcpu_mark_events_pending+0x2e/0x32
>>> (XEN)    [<ffff82c480105d59>] evtchn_set_pending+0xbf/0x190
>>> (XEN)    [<ffff82c480105f81>] send_guest_pirq+0x54/0x56
>>> (XEN)    [<ffff82c48015c043>] do_IRQ+0x3b2/0x59c
>>> (XEN)    [<ffff82c480154ee6>] common_interrupt+0x26/0x30
>>> (XEN)    [<ffff82c48014e3c3>] default_idle+0x82/0x87
>>> (XEN)    [<ffff82c480150664>] idle_loop+0x5a/0x68
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) FATAL TRAP: vector = 2 (nmi)
>>> (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
>>> (XEN) ****************************************
>>> (XEN)
>>> (XEN) Reboot in five seconds...
>>>
>>> Dante
>>>
>>>
>>> On Thu, Nov 11, 2010 at 8:04 AM, Konrad Rzeszutek Wilk
>>> <konrad.wilk@xxxxxxxxxx> wrote:
>>> > On Wed, Nov 10, 2010 at 05:16:14PM -0800, Dante Cinco wrote:
>>> >> We have Fibre Channel HBA devices that we PCI passthrough to our pvops
>>> >> domU kernel. Without swiotlb=force in the domU's kernel command line,
>>> >> both domU and dom0 lock up after loading the kernel module drivers for
>>> >> the HBA devices. With swiotlb=force, the domU and dom0 are stable
>>> >
>>> > Whoa. That is not good - what happens if you just pass in iommu=soft?
>>> > Does the "PCI-DMA: Using..." message show up if you don't pass in any
>>> > of those parameters?
>>> > (I don't think it does, but just doing 'iommu=soft' should enable it).
>>> >
>>> >
>>> >> after loading the kernel module drivers but the I/O performance is at
>>> >> least an order of magnitude worse than what we were seeing with the
>>> >> HVM kernel. I see the following in /var/log/kern.log in the pvops
>>> >> domU:
>>> >>
>>> >> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
>>> >> Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000
>>> >> software IO TLB at phys 0x5800000 - 0x9800000
>>> >>
>>> >> Is swiotlb=force responsible for the I/O performance degradation? I
>>> >> don't understand what swiotlb=force does, so I would appreciate an
>>> >> explanation or a pointer.
>>> >
>>> > So, you should only need to use 'iommu=soft'. It will enable the Linux
>>> > kernel IOMMU to translate the pseudo-PFNs to the real machine frame
>>> > numbers (bus addresses).
>>> >
>>> > If your card is 64-bit, then that is all it would do. If, however, your
>>> > card is 32-bit and you are DMA-ing data from above the 32-bit limit, it
>>> > would copy the user-space page to memory below 4GB, DMA that, and when
>>> > done, copy it back to where the user-space page is. This is called
>>> > bounce-buffering, and it is why you would use a mix of pci_map_page and
>>> > pci_dma_sync_single_for_[cpu|device] calls in your driver.
>>> >
>>> > However, I think your cards are 64-bit, so you don't need this
>>> > bounce-buffering. But if you say 'swiotlb=force' it will force _all_
>>> > DMAs to go through the bounce-buffer.
>>> >
>>> > So, try just 'iommu=soft' and see what happens.
>>> >
>>
>
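
P.S. For anyone following along: the bounce-buffering Konrad describes
above maps onto the streaming-DMA calls a driver makes. A minimal sketch
against the 2.6-era PCI DMA API (the device, page, and length handling
here are placeholders, not our actual HBA driver):

    #include <linux/pci.h>

    /* One receive-style DMA.  With swiotlb=force, pci_map_page() always
     * goes through the bounce buffer; the sync/unmap calls then copy
     * the data back out of it. */
    static int do_one_dma(struct pci_dev *pdev, struct page *page, size_t len)
    {
            dma_addr_t bus;

            bus = pci_map_page(pdev, page, 0, len, PCI_DMA_FROMDEVICE);
            if (pci_dma_mapping_error(pdev, bus))
                    return -EIO;

            /* ... hand 'bus' to the device and wait for completion ... */

            /* Make the DMA'd data visible to the CPU (this is where a
             * bounce buffer, if one was used, gets copied back). */
            pci_dma_sync_single_for_cpu(pdev, bus, len, PCI_DMA_FROMDEVICE);

            /* ... consume the data ... */

            pci_unmap_page(pdev, bus, len, PCI_DMA_FROMDEVICE);
            return 0;
    }

Under swiotlb=force every such mapping bounces, which is the extra copy
behind the performance degradation discussed above.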

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

