Re: [PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing
On 20.10.2021 22:01, Andrew Cooper wrote:
> On 20/10/2021 11:36, Jan Beulich wrote:
>> x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
>> mode (physical vs clustered) depends on iommu_intremap, that variable
>> needs to be set to off as soon as we know we can't / won't enable the
>> IOMMU, i.e. in particular when
>> - parsing of the respective ACPI tables failed,
>> - "iommu=off" is in effect, but not "iommu=no-intremap".
>> Move the turning off of iommu_intremap from AMD specific code into
>> acpi_iommu_init(), accompanying it by clearing of iommu_enable.
>>
>> Take the opportunity and also skip ACPI table parsing altogether when
>> "iommu=off" is in effect anyway.
>>
>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> I've deliberately not added a Fixes: tag here, as I'm of the opinion
>> that d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier") only
>> uncovered a pre-existing anomaly.
>
> I agree it uncovered a pre-existing issue, but that doesn't mean a Fixes
> tag should be omitted. That change very concretely regressed booting on
> some systems in their pre-existing configuration.
>
> The commit message needs to spell out a link to d8bd82327b0f, but it's
> fine to say "that commit broke it by violating an unexpected ordering
> dependency, but isn't really the source of the bug".
>
>> This particularly applies to the "iommu=off" aspect.
>
> There should be at least two Fixes tags, but I suspect trying to trace
> the history of this mess is not a productive use of time.
>
>> (This now allows me to remove an item from my TODO
>> list: I was meaning to figure out why one of my systems wouldn't come
>> up properly with "iommu=off", and I had never thought of combining this
>> with "no-intremap".)
>>
>> Arguably "iommu=off" should turn off subordinate features in common
>> IOMMU code, but doing so in parse_iommu_param() would be wrong (as
>> there might be multiple "iommu=" to parse). This could be placed in
>> iommu_supports_x2apic(), but see the next item.
>
> I don't think we make any claim or implication that passing the same
> option twice works. The problem here is the nested structure of
> options, and the variable doing double duty.
>
>>
>> While the change here deals with apic_x2apic_probe() as called from
>> x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
>> similarly affected. That call occurs before acpi_boot_init(), which is
>> what calls acpi_iommu_init(). The ordering in setup.c is in part
>> relatively fragile, which is why for the moment I'm still hesitant to
>> move the generic_apic_probe() call down. Plus I don't have easy access
>> to a suitable system to test this case. Thoughts?
>
> I've written these thoughts before, but IOMMU handling is a catastrophic
> mess. It needs burning to the ground and redoing from scratch.
>
> We currently have two ways of turning on the IOMMU, depending on what
> firmware does, and plenty of ways of crashing Xen with cmdline options
> which should work, and the legacy xAPIC startup routine is after
> interrupts have been enabled, leading to an attempt to rewrite
> interrupts in place to remap them. This in particular has led to XSAs
> due to trusting registers which can't be trusted, and the rewrite is
> impossible to do safely.
>
> The correct order is:
> 1) Parse DMAR/IVRS (but do not configure anything), MADT, current APIC
> setting and cmdline arguments
> 2) Figure out whether we want to be in xAPIC or x2APIC mode, and whether
> we need intremap. Change the LAPIC setting
> 3) Configure the IOMMUs.
> In particular, their interrupt needs to be
> after step 2

Leaving aside check_x2apic_preenabled(), all of this is already the case
afaict, almost at least: We do acpi_boot_init(), later x2apic_bsp_setup(),
and yet later iommu_setup(). The only issue might be inside
x2apic_bsp_setup(), where we call iommu_enable_x2apic() before switching
to x2APIC mode. Yet we avoid setting up IOMMU interrupts during this early
stage. Hence I think, as expressed, that the question really is whether we
can safely defer check_x2apic_preenabled() by a little bit.

> 4) Start up Xen's general IRQ infrastructure.
>
> It's a fair chunk of work, but it will vastly simplify the boot logic
> and let us delete the infinite flushing loops out of the IOMMU logic,
> and we don't need any logic which has to second guess itself based on
> what happened earlier on boot.
>
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -41,6 +41,23 @@ enum iommu_intremap __read_mostly iommu_
>>  bool __read_mostly iommu_intpost;
>>  #endif
>>
>> +void __init acpi_iommu_init(void)
>> +{
>> +    if ( iommu_enable )
>> +    {
>> +        int ret = acpi_dmar_init();
>> +
>> +        if ( ret == -ENODEV )
>> +            ret = acpi_ivrs_init();
>> +
>> +        if ( ret )
>> +            iommu_enable = false;
>> +    }
>> +
>> +    if ( !iommu_enable )
>> +        iommu_intremap = iommu_intremap_off;
>> +}
>
> This does fix my issue, so preferably with the Fixes tag reinstated,
>
> Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Tested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Thanks, but I think there will need to be a v2, as per below (plus
possibly the dealing with check_x2apic_preenabled()).

> However, I don't think skipping parsing is a sensible move. Intremap is
> utterly mandatory if during boot, we discover that our APIC ID is >254,
> and iommu=no-intremap must be ignored in this case, or if the MADT says
> we have CPUs beyond that limit and the user hasn't specified nr_cpus=1
> or equivalent.

Reading this made me realize that the change breaks other behavior. The
conditional really needs to be iommu_enable || iommu_intremap - at least
AMD code added in support for x2APIC already treats the latter to not be
a sub-option of the former (iov_supports_xt(), acpi_ivrs_init()), and
e.g. intel_iommu_supports_eim() also checks the latter alone.

Overriding "iommu=no-intremap" in case it's unavoidable could then be a
later change, not further affecting the function here.

Jan
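
For illustration only, here is a minimal sketch of how the adjusted conditional
(iommu_enable || iommu_intremap) might look, assuming that a table parsing
failure clears both settings. This is one possible reading of the remark
above, not the actual v2. The globals are local stand-ins for Xen's variables
of the same names, and the two parsers are hypothetical stubs whose results
are injected through dmar_result / ivrs_result so the logic can be exercised
outside of Xen.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for Xen's globals; "off" must stay first so it evaluates false. */
enum iommu_intremap {
    iommu_intremap_off,
    iommu_intremap_restricted,
    iommu_intremap_full,
};
static bool iommu_enable = true;
static enum iommu_intremap iommu_intremap = iommu_intremap_full;

/* Hypothetical stubs replacing the real DMAR / IVRS table parsers. */
static int dmar_result = -ENODEV;
static int ivrs_result = -ENODEV;
static int acpi_dmar_init(void) { return dmar_result; }
static int acpi_ivrs_init(void) { return ivrs_result; }

static void acpi_iommu_init(void)
{
    /* Assumption: parse the tables if either feature is still wanted. */
    if ( iommu_enable || iommu_intremap )
    {
        int ret = acpi_dmar_init();

        /* No DMAR table found: fall back to looking for IVRS. */
        if ( ret == -ENODEV )
            ret = acpi_ivrs_init();

        /* Without usable tables neither feature can be used. */
        if ( ret )
        {
            iommu_enable = false;
            iommu_intremap = iommu_intremap_off;
        }
    }
}
```

With this shape, "iommu=off" alone no longer skips table parsing as long as
interrupt remapping is still requested, while "iommu=off" combined with
"iommu=no-intremap" skips it entirely.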