Re: [help] Xen 4.14.5 on Devuan 4.0 Chimaera, regression from Xen 4.0.1


  • To: Denis <tachyon_gun@xxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 13 Mar 2023 10:36:37 +0100
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Mon, 13 Mar 2023 09:37:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10.03.2023 21:50, Denis wrote:
> On 10.03.2023 09:51, Jan Beulich wrote:
>> On 09.03.2023 21:37, Andrew Cooper wrote:
>>> On 09/03/2023 7:34 pm, tachyon_gun@xxxxxx wrote:
>>>> A short snippet of what I see when invoking "xl dmesg":
>>>>  
>>>> (XEN) No southbridge IO-APIC found in IVRS table
>>>> (XEN) AMD-Vi: Error initialization
>>>> (XEN) I/O virtualisation disabled 
>>>>  
>>>> What I would like to see (taken from Xen 4.0.1 running on Debian
>>>> Squeeze, in use since 2011):
>>>>  
>>>> (XEN) IOAPIC[0]: apic_id 8, version 33, address 0xfec00000, GSI 0-23
>>>> (XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
>>>> (XEN) Using scheduler: SMP Credit Scheduler (credit)
>>>> (XEN) Detected 2611.936 MHz processor.
>>>> (XEN) Initing memory sharing.
>>>> (XEN) HVM: ASIDs enabled.
>>>> (XEN) HVM: SVM enabled
>>>> (XEN) HVM: Hardware Assisted Paging detected.
>>>> (XEN) AMD-Vi: IOMMU 0 Enabled.
>>>> (XEN) I/O virtualisation enabled
>>>>  
>>>> My question would be if this is "normal" behaviour due to older hardware
>>>> being used with newer versions of Xen (compared to the old 4.0.1) or if
>>>> this is a bug.
>>>> If the latter, has this been addressed already in a newer version (4.14+)?
>>
>> No, the code there is still the same. The commit introducing the check
>> (06bbcaf48d09 ["AMD IOMMU: fail if there is no southbridge IO-APIC"])
>> specifically provided for a workaround: "iommu=no-intremap" on the Xen
>> command line. Could you give this a try? (As per below this could be
>> what we want to do "automatically" in such a situation, i.e. without
>> the need for a command line option. But you then still would face a
>> perceived regression of interrupt remapping being disabled on such a
>> system.)
>>
>> The other possible workaround, "iommu=no-amd-iommu-perdev-intremap",
>> is something I rather wouldn't want to recommend, but you may still
>> want to give it a try.
>  
> Thanks for your reply.
> 
> I added the lines you suggested and it seems that "AMD-Vi: IOMMU 0" and
> "I/O virtualisation" are enabled again.

Good - that'll have to do as a workaround for the time being.

> There are only minor differences in the "xl dmesg" output.
> In the one with "iommu=no-amd-iommu-perdev-intremap", 
> the line "No southbridge IO-APIC found in IVRS table" is listed.

That's as expected - the message is issued as a non-error one in this
case.
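
For anyone comparing several "xl dmesg" captures, here is a minimal Python
sketch that reports the two things discussed above: whether I/O
virtualisation ended up enabled, and whether the "No southbridge IO-APIC"
line was logged. The marker strings are copied from the excerpts quoted in
this thread; the function name and the shape of the returned dict are
illustrative assumptions, not part of Xen or xl.

```python
# Marker strings copied verbatim from the "xl dmesg" excerpts in this thread.
NO_SB_IOAPIC = "No southbridge IO-APIC found in IVRS table"
IOMMU_ON = "I/O virtualisation enabled"
IOMMU_OFF = "I/O virtualisation disabled"


def summarize_xl_dmesg(text):
    """Summarize the IOMMU-related lines of an "xl dmesg" capture.

    Hypothetical helper: only looks at lines starting with "(XEN)" and
    reports the last I/O virtualisation state seen, plus whether the
    southbridge IO-APIC message appeared at all.
    """
    state = "unknown"
    saw_no_southbridge = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("(XEN)"):
            continue  # ignore anything that is not hypervisor output
        if NO_SB_IOAPIC in line:
            saw_no_southbridge = True
        if IOMMU_ON in line:
            state = "enabled"
        elif IOMMU_OFF in line:
            state = "disabled"
    return {
        "io_virtualisation": state,
        "no_southbridge_ioapic": saw_no_southbridge,
    }
```

Feeding it the contents of a saved capture (for example the output of
"xl dmesg" redirected to a file) gives a quick way to confirm that a boot
with one of the workaround options shows the message but still ends up
with I/O virtualisation enabled.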

> Though I have yet to test an HVM domU with passthrough.
> 
> I'll attach the two "xl dmesg" files and a third one from the old version of 
> Xen.
> 
>>>> I'll attach some log files (hypervisor.log, dom0.log, xl_info.log,
>>>> lspci_vvv.log, acpi.dmp, ivrs.dat, ivrs.dsl).
>>>>  
>>>> Thank you for your time.
>>>
>>> Let me braindump the investigation so far before I forget it.
>>>
>>> Xen requires that there is an IVRS special-device record describing an
>>> IO-APIC 00:14.0.  This check failing is the source of the "No
>>> southbridge" message, and the cause of the IOMMU(s) being turned off.
>>>
>>> The MADT and IVRS tables agree that there is one IO-APIC in the system,
>>> but that's the northbridge IO-APIC, not the southbridge.
>>>
>>> The block diagram for the southbridge does have a PIC/IO-APIC as part of
>>> the PCI bridge, so honestly I was expecting the MADT to describe 2
>>> IO-APICs.  But OTOH, I could see this legitimately not existing in
>>> configurations where the PCI bridge isn't in use.
>>>
>>> `xl dmesg` does have a few unknown irqs, so there might be something
>>> down in the southbridge really generating interrupts.  Or there might be
>>> an IRQ misconfiguration elsewhere, and this is just a red herring.
>>>
>>> However, a consequence of the northbridge and southbridge being separate
>>> chips means that all southbridge IO is fully encapsulated by the IOMMU
>>> in the northbridge.
>>>
>>> So irrespective of whether there is an IO-APIC operating properly in the
>>> southbridge, and whether or not it's properly described, I think Xen's
>>> insistence that there must be an IVRS special-device entry for it is bogus.
>>>
>>>
>>> Furthermore, Xen's decisions are monumentally stupid.  It takes a
>>> specifically safe (IOMMU-wise) system, and because it can't figure out a
>>> partial aspect of interrupt handling in the southbridge, decides that the
>>> system can't be safe (a false conclusion) and turns the IOMMU off fully
>>> to compensate, which makes the system concretely less safe.
> 
> Also, thank you Andrew for bringing this in.
> 
>> So this touches once again the area of the fuzzy split between the IOMMU
>> being disabled as a whole (meaning DMA+interrupt remapping off) vs only
>> one of the two being off (where presently we are unable to turn off just
>> DMA remapping). Indeed the original Linux commit, which our change was
>> inspired by, results in merely interrupt remapping getting turned off
>> (afaict), and that hasn't changed. (Would be nice to have this confirmed
>> for the system in question, i.e. without Xen underneath Linux.) It would
>> certainly be possible for us to do so too - it might be a one line change:
>  
> Could you elaborate on that one?

I guess I'd need to know what you're missing; the entire paragraph was
intended more for Andrew and Roger (and others who are interested on the
"development" side) rather than you. Specifically ...

> Should I test something else?

... there was no request for any further testing here, for the moment.

Jan



 

