
Re: IGD pass-through failures since 4.10.


  • To: "Dr. Greg" <greg@xxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 14 Feb 2022 09:56:34 +0100
  • Cc: xen-devel@xxxxxxxxxxxxx
  • Delivery-date: Mon, 14 Feb 2022 08:56:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.02.2022 07:00, Dr. Greg wrote:
> Good morning, I hope the week is starting well for everyone.
> 
> We've made extensive use of PCI based graphics pass through for many
> years, since around Xen 4.2.  In fact, we maintained a set of patches
> for ATI cards against qemu-traditional that have seen a lot of
> downloads from our FTP site.
> 
> We ended up switching to IGD based graphics a couple of years ago and
> built a stack on top of Xen 4.10 using qemu-traditional.  That
> coincided with our transition from Windows 7 to Windows 10.
> 
> We've never enjoyed anywhere near the stability with IGD/Windows-10
> that we had with the ATI/Windows-7 desktops, ie. we see fairly
> frequent crashes, lockups, reduced performance etc.  The ATI/Windows-7
> desktops were almost astonishingly reliable, ie. hundreds of
> consecutive Windows VM boot/passthrough cycles.
> 
> In order to try and address this issue we set out to upgrade our
> workstation infrastructure.  Unfortunately we haven't found anything
> that has worked post 4.10.
> 
> To be precise, 4.11 with qemu-traditional works, but upon exit from
> the virtual machine to which the graphics adapter and USB controller
> are passed through, neither the USB controller nor the graphics
> controller can be re-initialized and re-attached to the Dom0
> instance.
> 
> It appears to be a problem with mapping interrupts back to dom0 given
> that we see the following:
> 
> Feb 10 08:16:05 hostname kernel: xhci_hcd 0000:00:14.0: xen map irq failed -19 for 32752 domain
> 
> Feb 10 08:16:05 hostname kernel: i915 0000:00:02.0: xen map irq failed -19 for 32752 domain
> 
> Feb 10 08:16:12 hostname kernel: xhci_hcd 0000:00:14.0: Error while assigning device slot ID

Just on this one aspect: It depends a lot on what precisely you've used as
4.10 before. Was this the plain 4.10.4 release, or did you track the
stable branch, accumulating security fixes? In the former case I would
suspect device quarantining to be getting in your way, in which case
it would be relevant to know what exactly "re-attach to the Dom0" means
in your case.
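
For reference, the two numeric fields in the quoted "xen map irq failed"
lines can be decoded mechanically. Below is a minimal Python sketch,
assuming the message layout is exactly as quoted above and that it is run
on Linux, where errno 19 is ENODEV. The DOMID_SELF value 0x7FF0 comes from
Xen's public headers; the helper name decode_map_irq_failure is
hypothetical and exists only for this illustration.

```python
import errno
import re

# DOMID_SELF as defined in Xen's public headers (0x7FF0 == 32752).
DOMID_SELF = 0x7FF0

# Matches "<driver> <segment:bus:dev.fn>: xen map irq failed <ret> for <domid> domain".
# The layout is assumed from the log lines quoted in this thread.
LINE_RE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"xen map irq failed (?P<ret>-?\d+) for (?P<domid>\d+) domain"
)


def decode_map_irq_failure(line):
    """Return the decoded fields of one log line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    ret = int(match.group("ret"))
    domid = int(match.group("domid"))
    return {
        "driver": match.group("driver"),
        "bdf": match.group("bdf"),
        # The kernel logs a negative errno; map its magnitude to a symbolic name.
        "errno": errno.errorcode.get(-ret, "unknown"),
        "domain": "DOMID_SELF" if domid == DOMID_SELF else str(domid),
    }


if __name__ == "__main__":
    sample = ("Feb 10 08:16:05 hostname kernel: i915 0000:00:02.0: "
              "xen map irq failed -19 for 32752 domain")
    print(decode_map_irq_failure(sample))
```

Read this way, both quoted failures are ENODEV results for requests issued
with DOMID_SELF, i.e. by Dom0 on its own behalf.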

Which brings me to this more general remark: What you describe sounds
like a number of possibly independent problems. I'm afraid it'll be
difficult for anyone to help without you drilling further down into
what lower level operations are actually causing trouble. It also feels
as if things may have ended up working for you on 4.10 just by chance.

I'm sorry that I'm not really of any help here,
Jan

> At which point the monitor has green and black bars on it and the USB
> controller doesn't function.
> 
> Upstream QEMU doesn't work at all: the qemu-system-i386 process fails,
> xl catches the failure and tries to re-start the domain, and the domain
> remains dead to the world and has to be destroyed.
> 
> We revved up to the most current 4.14.x release, but that acts exactly
> the same way that 4.11.x does.  We've built up the most recent 4.15.x
> release, so that we would be testing the most current release that
> still supports qemu-traditional, but haven't been able to get the
> testing done yet.  Given our current experiences, I would be surprised
> if it would work.
> 
> We've tentatively tracked the poor Windows 10 performance down to the
> hypervisor emitting hundreds of thousands of IOMMU/DMA violations.  We
> made those go away by disabling the IGD IOMMU but that doesn't fix the
> problem with upstream QEMU being able to boot the Windows instance,
> nor does it fix the problem with remapping the device interrupts back
> to Dom0 on domain exit.
> 
> The 4.10 based stack had been running with 16 GIG of memory in the
> DomU Windows instances.  Based on some online comments, we tested
> guests with 4 GIG of RAM but that doesn't impact the issues we are
> seeing.
> 
> We've tested with the most recent 5.4 and 5.10 Linux kernels but the
> Dom0 kernel version doesn't seem to have any impact on the issues we
> are seeing.
> 
> We'd be interested in any comments/suggestions the group may have.  We
> have the in-house skills to do fairly significant investigations and
> would like to improve the performance of IGD pass-through for other
> users of what is fairly useful and ubiquitous (IGD) technology.
> 
> Have a good day.
> 
> Dr. Greg
> 
> As always,
> Dr. Greg Wettstein, Ph.D, Worker      Autonomously self-defensive
> Enjellic Systems Development, LLC     IOT platforms and edge devices.
> 4206 N. 19th Ave.
> Fargo, ND  58102
> PH: 701-281-1686                      EMAIL: dg@xxxxxxxxxxxx
> ------------------------------------------------------------------------------
> "My thoughts on the composition and effectiveness of the advisory
>  committee?
> 
>  I think they are destined to accomplish about the same thing as what
>  you would get from locking 9 chimpanzees in a room with an armed
>  thermonuclear weapon and a can opener with orders to disarm it."
>                                 -- Dr. Greg Wettstein
>                                    Resurrection
> 




 

