[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] x2APIC: translate IO-APIC entries when enabling the IOMMU



On 11.10.2019 11:28, Roger Pau Monné  wrote:
> On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote:
>> On 10.10.2019 17:19, Roger Pau Monné  wrote:
>>> On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
>>>> On 10.10.2019 15:12, Roger Pau Monné  wrote:
>>>>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
>>>>>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
>>>>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
>>>>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
>>>>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
>>>>>>>>
>>>>>>>> Perhaps "unmasked" instead of "the"?
>>>>>>>>
>>>>>>>>> IO-APIC entries also need to be translated to the new format and added
>>>>>>>>> to the interrupt remapping table.
>>>>>>>>>
>>>>>>>>> This prevents IOMMU interrupt remapping faults when booting on
>>>>>>>>> hardware that has unmasked IO-APIC pins.
>>>>>>>>
>>>>>>>> But in the end it only papers over whatever the spurious interrupts
>>>>>>>> result form, doesn't it? Which isn't to say this isn't an
>>>>>>>> improvement. Calling out the ExtInt case here may be worthwhile as
>>>>>>>> well, as would be pointing out that this case still won't work on
>>>>>>>> AMD IOMMUs.
>>>>>>>
>>>>>>> But the fix for the ExtINT AMD issue should be done in
>>>>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
>>>>>>> ExtINT delivery mode, not to this part of the code. I will look
>>>>>>> into it, but I think it's kind of tangential to the issue here.
>>>>>>
>>>>>> I'm not talking of you working on fixing this right away. I'm merely
>>>>>> asking that you mention here (a) the ExtInt special case and (b)
>>>>>> that this special case will (continue to) not work in the AMD case.
>>>>>>
>>>>>>>>> --- a/xen/arch/x86/apic.c
>>>>>>>>> +++ b/xen/arch/x86/apic.c
>>>>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
>>>>>>>>>      iommu_enable_x2apic();
>>>>>>>>>      __enable_x2apic();
>>>>>>>>>  
>>>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
>>>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
>>>>>>>>>      unmask_8259A();
>>>>>>>>>  
>>>>>>>>>  out:
>>>>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
>>>>>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
>>>>>>>>>  
>>>>>>>>>  restore_out:
>>>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
>>>>>>>>> +    /*
>>>>>>>>> +     * NB: do not use raw mode when restoring entries if the iommu 
>>>>>>>>> has been
>>>>>>>>> +     * enabled during the process, because the entries need to be 
>>>>>>>>> translated
>>>>>>>>> +     * and added to the remapping table in that case.
>>>>>>>>> +     */
>>>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
>>>>>>>>
>>>>>>>> How is this different in the resume_x2apic() case? The IOMMU gets
>>>>>>>> enabled in the course of that as well. I.e. I'd expect you want
>>>>>>>> to pass "false" there, not "true".
>>>>>>>
>>>>>>> In the resume_x2apic case interrupt remapping should already be
>>>>>>> enabled or not, but that function is not going to enable interrupt
>>>>>>> remapping if it wasn't enabled before, hence the IO-APIC entries
>>>>>>> should already be using the interrupt remapping table and no
>>>>>>> translation is needed.
>>>>>>
>>>>>> Who / what would have enabled the IOMMU in the resume case?
>>>>>
>>>>> I don't think the question is who enables interrupt remapping in the
>>>>> resume case (which is resume_x2apic when calling iommu_enable_x2apic
>>>>> AFAICT), the point here is that on resume the entries in the IO-APIC
>>>>> will already match the state of interrupt remapping, so they shouldn't
>>>>> be translated. If interrupt remapping was off before suspend it will
>>>>> still be off after resume, and there won't be any translation needed.
>>>>> The same is true if interrupt remapping is on before suspend.
>>>>
>>>> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
>>>> not prior to suspend.
>>>
>>> Oh, so maybe that's a misunderstanding on my side. I don't seem to be
>>> able to find a statement about the contents of the IO-APIC registers
>>> (and more specifically the entries) when getting back from
>>> suspension. Are all entries cleared and masked?
>>>
>>> Are the values previous to suspension stored?
>>
>> See ioapic_suspend() / ioapic_resume(): Looks like there's some
>> redundancy here - I don't think it makes sense for the LAPIC
>> code to fiddle with all the RTEs if subsequently (I assume;
>> didn't check) they'll all be overwritten anyway. I would seem
>> more logical to me if they'd just all get masked for IOMMU
>> enabling, deferring to ioapic_resume() for everything else.
> 
> Yes, it seems like resume_x2apic shouldn't need to play with the
> entries at all TBH, just enabling interrupt remapping should be enough
> given that the entries are already save and restored by
> ioapic_{suspend/resume}.
> 
> Would you agree to leave that as-is in this patch, ie: always pass
> true to restore_IO_APIC_setup in resume_x2apic?

Properly explained in the description, that's an option. Even better
would of course be to do away with the unnecessary save/restore in
a prereq patch, at which point the question disappears as to what to
pass to the function at that point.

>>> But it's simply not possible to reach the call to
>>> restore_IO_APIC_setup with x2apic_enabled == true and interrupt
>>> remapping disabled, regardless of the initial value of
>>> x2apic_enabled.
>>>
>>> All the paths that could lead to this scenario are short-circuited
>>> above with a panic.
>>
>> Hmm, true. Nevertheless it would feel better if the conditionals
>> were using what actually matters, rather than something derived.
> 
> Would you be OK to using something like v1:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html
> 
> See the usage of iommu_enabled in x2apic_bsp_setup. I think using
> x2apic_enabled is fine given the logic in the function, and the fact
> that x2APIC mandates interrupt remapping, but I could live with that
> extra local variable.

Conceptually this looks better to me. I'd prefer to avoid shadowing
the global "iommu_enabled" though, and set the local variable to
"true" only one we actually enable the IOMMU. Also strictly speaking
you care about the enabling of interrupt remapping, not that of the
IOMMU (which implicitly means DMA remapping). I wonder whether the
code couldn't easily be made cope with IOMMU already being enabled,
assuming that this would lead to {iommu,intremap}_enabled to already
be true on entry. I.e. you'd request a "raw" restore unless
intremap_enabled changed from false to true.

>>>> I realize that io_apic_write() would suitably avoid going
>>>> the remapping path, but I think it would be more clear if the
>>>> distinction was already made properly at the call site.
>>>
>>> I'm afraid I'm slightly loss, do you mean to replace the
>>> ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?
>>
>> No, because of ...
>>
>>> That would be the same as always passing raw == false AFAICT.
>>
>> ... this. I'm asking to pass in the argument for "raw" based
>> on what you want, without relying on io_apic_write()'s
>> behavior. That's more for code clarity than actual correctness,
>> since - as said - io_apic_write() would invoke __io_apic_write()
>> anyway.
> 
> Sorry, but I'm afraid I still don't fully understand what you mean
> with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn
> uses __io_apic_write or io_apic_write. I could get rid of the usage of
> ioapic_write_entry and just call __io_apic_write or io_apic_write (or
> even directly call iommu_update_ire_from_apic), but I'm not sure
> what's the benefit of any of this, it's just open-coding logic that's
> done in the helpers.
> 
> Would you rather mean to rename the parameter of restore_IO_APIC_setup
> from 'raw' to 'translate' or some such in order to clearly denote
> there's a transformation done before writing the entry?

Whether it's "raw" or "translate" doesn't really matter to me.
What I'm after is for you to not pass raw=true when you really
mean raw=false (or the other way around), relying on the
intremap_enabled check down the call chain in io_apic_write().

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.