[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] x2APIC: translate IO-APIC entries when enabling the IOMMU



On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote:
> On 10.10.2019 17:19, Roger Pau Monné  wrote:
> > On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
> >> On 10.10.2019 15:12, Roger Pau Monné  wrote:
> >>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
> >>>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
> >>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> >>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
> >>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
> >>>>>>
> >>>>>> Perhaps "unmasked" instead of "the"?
> >>>>>>
> >>>>>>> IO-APIC entries also need to be translated to the new format and added
> >>>>>>> to the interrupt remapping table.
> >>>>>>>
> >>>>>>> This prevents IOMMU interrupt remapping faults when booting on
> >>>>>>> hardware that has unmasked IO-APIC pins.
> >>>>>>
> >>>>>> But in the end it only papers over whatever the spurious interrupts
> >>>>>> result form, doesn't it? Which isn't to say this isn't an
> >>>>>> improvement. Calling out the ExtInt case here may be worthwhile as
> >>>>>> well, as would be pointing out that this case still won't work on
> >>>>>> AMD IOMMUs.
> >>>>>
> >>>>> But the fix for the ExtINT AMD issue should be done in
> >>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
> >>>>> ExtINT delivery mode, not to this part of the code. I will look
> >>>>> into it, but I think it's kind of tangential to the issue here.
> >>>>
> >>>> I'm not talking of you working on fixing this right away. I'm merely
> >>>> asking that you mention here (a) the ExtInt special case and (b)
> >>>> that this special case will (continue to) not work in the AMD case.
> >>>>
> >>>>>>> --- a/xen/arch/x86/apic.c
> >>>>>>> +++ b/xen/arch/x86/apic.c
> >>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
> >>>>>>>      iommu_enable_x2apic();
> >>>>>>>      __enable_x2apic();
> >>>>>>>  
> >>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
> >>>>>>>      unmask_8259A();
> >>>>>>>  
> >>>>>>>  out:
> >>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
> >>>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
> >>>>>>>  
> >>>>>>>  restore_out:
> >>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>> +    /*
> >>>>>>> +     * NB: do not use raw mode when restoring entries if the iommu 
> >>>>>>> has been
> >>>>>>> +     * enabled during the process, because the entries need to be 
> >>>>>>> translated
> >>>>>>> +     * and added to the remapping table in that case.
> >>>>>>> +     */
> >>>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
> >>>>>>
> >>>>>> How is this different in the resume_x2apic() case? The IOMMU gets
> >>>>>> enabled in the course of that as well. I.e. I'd expect you want
> >>>>>> to pass "false" there, not "true".
> >>>>>
> >>>>> In the resume_x2apic case interrupt remapping should already be
> >>>>> enabled or not, but that function is not going to enable interrupt
> >>>>> remapping if it wasn't enabled before, hence the IO-APIC entries
> >>>>> should already be using the interrupt remapping table and no
> >>>>> translation is needed.
> >>>>
> >>>> Who / what would have enabled the IOMMU in the resume case?
> >>>
> >>> I don't think the question is who enables interrupt remapping in the
> >>> resume case (which is resume_x2apic when calling iommu_enable_x2apic
> >>> AFAICT), the point here is that on resume the entries in the IO-APIC
> >>> will already match the state of interrupt remapping, so they shouldn't
> >>> be translated. If interrupt remapping was off before suspend it will
> >>> still be off after resume, and there won't be any translation needed.
> >>> The same is true if interrupt remapping is on before suspend.
> >>
> >> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
> >> not prior to suspend.
> > 
> > Oh, so maybe that's a misunderstanding on my side. I don't seem to be
> > able to find a statement about the contents of the IO-APIC registers
> > (and more specifically the entries) when getting back from
> > suspension. Are all entries cleared and masked?
> > 
> > Are the values previous to suspension stored?
> 
> See ioapic_suspend() / ioapic_resume(): Looks like there's some
> redundancy here - I don't think it makes sense for the LAPIC
> code to fiddle with all the RTEs if subsequently (I assume;
> didn't check) they'll all be overwritten anyway. I would seem
> more logical to me if they'd just all get masked for IOMMU
> enabling, deferring to ioapic_resume() for everything else.

Yes, it seems like resume_x2apic shouldn't need to play with the
entries at all TBH, just enabling interrupt remapping should be enough
given that the entries are already save and restored by
ioapic_{suspend/resume}.

Would you agree to leave that as-is in this patch, ie: always pass
true to restore_IO_APIC_setup in resume_x2apic?

> >>>>>> Also how would "x2apic_enabled" reflect the transition? It may
> >>>>>> have been "true" already before entering the function here.
> >>>>>
> >>>>> If x2apic_enabled == true at the point where restore_IO_APIC_setup
> >>>>> is called interrupt remapping must be enabled, because AFAICT at this
> >>>>> point it's not possible to have x2apic_enabled == true and interrupt
> >>>>> remapping disabled.
> >>>>>
> >>>>> The issue I can see here is what happens if interrupt remapping was
> >>>>> already enabled by the hardware, and entries in the IO-APIC are
> >>>>> already using the remapping table. I would have to look into how to
> >>>>> detect that case properly, but I think the proposed change is an
> >>>>> improvement overall.
> >>>>
> >>>> Firmware handing off with the IOMMU (and hence interrupt remapping)
> >>>> already enabled is a case which - afaict - we can't currently cope
> >>>> with. Firmware handing off in x2APIC enabled state is typically
> >>>> with the IOMMU (and hence interrupt remapping) still disabled. This
> >>>> is not a forbidden mode, it's just that in such a configuration
> >>>> interrupts can't be delivered to certain CPUs.
> >>>>
> >>>> In any event you need to properly distinguish x2APIC enabled state
> >>>> (or the transition thereof) from IOMMU / interrupt remapping
> >>>> enabled state (or the transition thereof). I.e. you want to avoid
> >>>> raw mode restore if interrupt remapping state transitioned from
> >>>> off to on in the process.
> >>>
> >>> Right, and that's why the call to restore_IO_APIC_setup in
> >>> x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If
> >>> interrupt remapping has been enabled by the call to
> >>> iommu_enable_x2apic x2apic_enabled must be true, and hence the entries
> >>> in the IO-APIC need to be translated to use the interrupt remapping
> >>> table. There's no path that can lead to restore_IO_APIC_setup with
> >>> interrupt remapping enabled and x2APIC mode disabled (or with x2APIC
> >>> enabled and interrupt remapping disabled).
> >>>
> >>> Hence if interrupt remapping is off before calling x2apic_bsp_setup
> >>> (which is what Xen expects to function properly) and x2apic_enabled ==
> >>> true when calling restore_IO_APIC_setup it means interrupt remapping
> >>> got enabled, and the IO-APIC entries need translation.
> >>
> >> But the code in question sits on a shared success/error path, and
> >> in the error case it matters whether x2apic_enabled was true already
> >> on entry.
> > 
> > But it's simply not possible to reach the call to
> > restore_IO_APIC_setup with x2apic_enabled == true and interrupt
> > remapping disabled, regardless of the initial value of
> > x2apic_enabled.
> > 
> > All the paths that could lead to this scenario are short-circuited
> > above with a panic.
> 
> Hmm, true. Nevertheless it would feel better if the conditionals
> were using what actually matters, rather than something derived.

Would you be OK to using something like v1:

https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html

See the usage of iommu_enabled in x2apic_bsp_setup. I think using
x2apic_enabled is fine given the logic in the function, and the fact
that x2APIC mandates interrupt remapping, but I could live with that
extra local variable.

> >> I realize that io_apic_write() would suitably avoid going
> >> the remapping path, but I think it would be more clear if the
> >> distinction was already made properly at the call site.
> > 
> > I'm afraid I'm slightly loss, do you mean to replace the
> > ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?
> 
> No, because of ...
> 
> > That would be the same as always passing raw == false AFAICT.
> 
> ... this. I'm asking to pass in the argument for "raw" based
> on what you want, without relying on io_apic_write()'s
> behavior. That's more for code clarity than actual correctness,
> since - as said - io_apic_write() would invoke __io_apic_write()
> anyway.

Sorry, but I'm afraid I still don't fully understand what you mean
with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn
uses __io_apic_write or io_apic_write. I could get rid of the usage of
ioapic_write_entry and just call __io_apic_write or io_apic_write (or
even directly call iommu_update_ire_from_apic), but I'm not sure
what's the benefit of any of this, it's just open-coding logic that's
done in the helpers.

Would you rather mean to rename the parameter of restore_IO_APIC_setup
from 'raw' to 'translate' or some such in order to clearly denote
there's a transformation done before writing the entry?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.