
Re: [Xen-devel] RE: Intel GPU pass-through with > 3G



I have also noticed this issue (9ms IOMMU flush), although not during
domain creation. The path in which I observed it is page remapping when
using map_grant_ref. I haven't tested a DomU with over 3G of memory,
however; the delay may also be present in that case on my platform.

I have done some work to try to add an 'order' parameter to iommu_map_page,
but it isn't stable yet; if this is the only way to get around the slow
flush, I will look at finishing it.
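
To illustrate what an 'order' parameter could buy, here is a rough sketch
(not the unfinished patch mentioned above). It only counts how many calls
would be needed if a page range were split into naturally aligned
power-of-two chunks. The names largest_order and count_order_chunks and the
max_order cap are hypothetical, invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not Xen code: largest order (<= max_order) such that
 * a chunk of 2^order pages starting at pfn is naturally aligned and fits
 * within the nr pages that remain.
 */
static unsigned int largest_order(uint64_t pfn, uint64_t nr,
                                  unsigned int max_order)
{
    unsigned int order = 0;

    assert(max_order < 63); /* keep the shifts below well defined */

    while ( order < max_order &&
            (pfn & ((1ULL << (order + 1)) - 1)) == 0 &&
            (1ULL << (order + 1)) <= nr )
        order++;

    return order;
}

/*
 * Count how many map calls (and so how many flushes, if each call flushes
 * once) are needed to cover nr pages starting at pfn.
 */
static uint64_t count_order_chunks(uint64_t pfn, uint64_t nr,
                                   unsigned int max_order)
{
    uint64_t chunks = 0;

    while ( nr != 0 )
    {
        unsigned int order = largest_order(pfn, nr, max_order);

        pfn += 1ULL << order;
        nr  -= 1ULL << order;
        chunks++;
    }

    return chunks;
}
```

Assuming 4K pages, a 256M region is 65536 pages. In this model
count_order_chunks(0, 65536, 0) gives 65536 calls, while
count_order_chunks(0, 65536, 9) gives 128.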

Would it be possible to add a flag to delay IOMMU flushing until after a
batch update is finished? A single flush at the end, even if expensive,
would be faster than 10ms per page on mappings of a significant size. This
is also likely to be a less intrusive patch.
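
A minimal sketch of the deferred-flush idea follows. It is a toy model that
only counts flushes, not Xen code, and every name in it (struct iommu_model,
model_map_page, model_batch_begin, model_batch_end, model_map_range,
flush_cost_ms) is hypothetical. It assumes 4K pages and the roughly 10ms
flush reported in the quoted message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an IOMMU that only tracks how many flushes were issued. */
struct iommu_model {
    uint64_t flush_count;   /* IOTLB flushes issued so far */
    uint64_t mapped_pages;  /* pages mapped so far */
    bool     defer_flush;   /* hypothetical flag: skip the per-page flush */
    bool     flush_pending; /* a mapping changed since the last flush */
};

static void model_flush(struct iommu_model *m)
{
    m->flush_count++;
    m->flush_pending = false;
}

/* Map one page; flush immediately unless a batch is in progress. */
static void model_map_page(struct iommu_model *m)
{
    m->mapped_pages++;
    if ( m->defer_flush )
        m->flush_pending = true;
    else
        model_flush(m);
}

static void model_batch_begin(struct iommu_model *m)
{
    m->defer_flush = true;
}

/* End the batch and issue a single flush if anything changed. */
static void model_batch_end(struct iommu_model *m)
{
    m->defer_flush = false;
    if ( m->flush_pending )
        model_flush(m);
}

/* Map nr_pages one at a time and return the number of flushes issued. */
static uint64_t model_map_range(uint64_t nr_pages, bool batched)
{
    struct iommu_model m = {0};
    uint64_t i;

    if ( batched )
        model_batch_begin(&m);
    for ( i = 0; i < nr_pages; i++ )
        model_map_page(&m);
    if ( batched )
        model_batch_end(&m);

    assert(m.mapped_pages == nr_pages);
    return m.flush_count;
}

/* Total time spent flushing, in milliseconds. */
static uint64_t flush_cost_ms(uint64_t flushes, uint64_t ms_per_flush)
{
    return flushes * ms_per_flush;
}
```

For a 256M BAR (65536 pages of 4K), model_map_range(65536, false) issues
65536 flushes, and flush_cost_ms(65536, 10) is 655360 ms, or roughly 11
minutes, which is consistent with the domain start time reported below.
With batching, model_map_range(65536, true) issues a single flush.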

In case you're interested, my platform is a Dell Optiplex 755, 4G RAM:

# lspci
00:00.0 Host bridge: Intel Corporation 82Q35 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82Q35 Express PCI Express Root Port (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
00:02.1 Display controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IO (ICH9DO) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)

-- 
Daniel De Graaf
National Security Agency

On 11/10/2010 07:04 PM, Kay, Allen M wrote:
> Jean,
> 
> Do you see any boot time difference between passing through integrated 
> graphics for the very first time and the subsequent times?  Which platform 
> are you using?
> 
> Allen
> 
> -----Original Message-----
> From: Jean Guyader [mailto:jean.guyader@xxxxxxxxx] 
> Sent: Wednesday, November 10, 2010 1:50 PM
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Cc: Kay, Allen M
> Subject: Intel GPU pass-through with > 3G
> 
> Hello,
> 
> I'm passing through a graphics card to a guest that has more than 3G of
> RAM (4G to be precise in my case).
> 
> What happens is that VM creation gets stuck, so I put some tracing in
> the Xen code to see what was taking the time. I discovered that the
> guest was stuck in hvmloader inside this loop:
> 
>     while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
>     {
>         struct xen_add_to_physmap xatp;
>         if ( hvm_info->high_mem_pgend == 0 )
>             hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
>         xatp.domid = DOMID_SELF;
>         xatp.space = XENMAPSPACE_gmfn;
>         xatp.idx   = --hvm_info->low_mem_pgend;
>         xatp.gpfn  = hvm_info->high_mem_pgend++;
>         if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
>             BUG();
>     }
> 
> This loop relocates RAM to the top to leave some space for the PCI BARs.
> It iterates over each page, so in my case it's quite a big loop because
> the GPU has a BAR of 256M.
> 
> The interesting part is that add_to_physmap takes most of the time. I
> believe most of that is the IOMMU IOTLB flush that comes with
> iommu_map_page or iommu_unmap_page, which are called when we manipulate
> the p2m table.
> 
> In my case the IOMMU flush takes a very long time (because of the Intel
> GPU?), about 10 milliseconds. So if I'm patient enough my domain will
> start, after about 10 minutes.
> 
> One way forward would be to create a range interface to iommu_map_page
> and iommu_unmap_page, since IOMMU flushes are so expensive. Some work
> would then need to be done to add a range interface to all the functions
> between add_to_physmap and p2m_set_entry, which would be a big patch. I
> hope there is another way out of this problem.
> 
> Thanks,
> Jean
