[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH for-4.13 2/2] x86/ioapic: don't use raw entry reads/writes in clear_IO_APIC_pin



On Fri, Nov 08, 2019 at 11:16:26AM +0100, Jan Beulich wrote:
> On 08.11.2019 10:27, Roger Pau Monné  wrote:
> > On Thu, Nov 07, 2019 at 04:56:09PM +0100, Jan Beulich wrote:
> >> On 07.11.2019 16:46, Roger Pau Monné  wrote:
> >>> On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote:
> >>>> On 07.11.2019 16:06, Roger Pau Monne wrote:
> >>>>> @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, 
> >>>>> unsigned int pin)
> >>>>>       */
> >>>>>      memset(&entry, 0, sizeof(entry));
> >>>>>      entry.mask = 1;
> >>>>> -    __ioapic_write_entry(apic, pin, true, entry);
> >>>>> +    __ioapic_write_entry(apic, pin, false, entry);
> >>>>
> >>>> I may be able to understand why this one can't use raw mode, but as
> >>>> per above a better overall description is needed.
> >>>
> >>> Yes, this is the one that's actually incorrect, but see my reasoning
> >>> below.
> >>>
> >>>>
> >>>>> -    entry = __ioapic_read_entry(apic, pin, true);
> >>>>> +    entry = __ioapic_read_entry(apic, pin, false);
> >>>>>      if (entry.irr)
> >>>>>          printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n",
> >>>>>                 IO_APIC_ID(apic), pin);
> >>>>
> >>>> This read again shouldn't need conversion, as the IRR bit doesn't
> >>>> get touched (I think) by the interrupt remapping code during the
> >>>> translation it does.
> >>>
> >>> TBH, I think raw mode should only be used by the iommu code in order
> >>> to setup the entries to point to the interrupt remapping table,
> >>> everything else shouldn't be using raw mode. While it's true that some
> >>> of the cases here are safe to use raw mode I would discourage it's
> >>> usage as it can lead to issues, and this is not a performance critical
> >>> path anyway.
> >>
> >> You also should take the other possible perspective - not using
> >> raw mode means going through interrupt remapping logic, which
> >> can (needlessly) trigger errors. I think you want to break the
> >> patch into a necessary and an optional part. The optional part
> >> should be discussed separately and deferred until after 4.13.
> > 
> > IMO generic IO-APIC code has not business playing with raw entries
> > when interrupt remapping is enabled, the layout of IO-APIC entries in
> > that case is vendor-specific, and hence the generic IO-APIC code is
> > not able to parse it.
> > 
> > For example the code in clear_IO_APIC_pin modifies the mask or the
> > trigger fields of RAW entries, is there any guarantee that those
> > fields don't have different meanings/layout when interrupt remapping
> > is enabled?
> 
> From an abstract pov there's no such guarantee, but in practice
> the meaning of the fields doesn't change. You make a good point
> though nevertheless: For VT-d the trigger mode fields in RTE and
> IRTE need to match, so the interrupt remapping code needs to see
> the trigger mode change. See below for a possible alternative
> patch.
> 
> > I can split the specific bugfix into a separate patch, but IMO the
> > code in clear_IO_APIC_pin is not safe.
> 
> A change is needed, yes, but in particular because of the use of
> the function from clear_IO_APIC(), in turn called from
> disable_IO_APIC(), yet in turn used e.g. during emergency
> shutdown after a crash, I'd like the function to do as simple
> operations as possible, i.e. specifically avoid going through
> interrupt remapping code (because its data structures may also
> be corrupted at that point) unless really needed (hence the
> alternative patch suggestion below).

Isn't just masking the entries fine when disabling the IO-APIC, or a
full wipe of all the entries is required?

I would be fine with having a maskall_IO_APIC function that reads
entries in raw format, sets the mask bit and writes the entry back in
raw format as long as it's annotated that this is done in order to
limit as much a possible the chances of hitting corrupted data in the
crash case.

> As an aside, iommu_crash_shutdown() - even if actually doing
> something, i.e. disabling interrupt remapping - does _not_
> cause the RTEs to be re-written in non-translated format.

I'm not sure whether that will work, it's possible that some entries
can't be translated because they use x2APIC IDs, and thus don't fit in
a non-translated IO-APIC entry.

> Jan
> 
> --- unstable.orig/xen/arch/x86/io_apic.c
> +++ unstable/xen/arch/x86/io_apic.c
> @@ -519,8 +519,9 @@ static void clear_IO_APIC_pin(unsigned i
>      if (entry.irr) {
>          /* Make sure the trigger mode is set to level. */
>          if (!entry.trigger) {
> +            entry = __ioapic_read_entry(apic, pin, false);
>              entry.trigger = 1;
> -            __ioapic_write_entry(apic, pin, TRUE, entry);
> +            __ioapic_write_entry(apic, pin, false, entry);
>          }
>          __io_apic_eoi(apic, entry.vector, pin);
>      }
> @@ -530,7 +531,7 @@ static void clear_IO_APIC_pin(unsigned i
>       */
>      memset(&entry, 0, sizeof(entry));
>      entry.mask = 1;
> -    __ioapic_write_entry(apic, pin, TRUE, entry);
> +    __ioapic_write_entry(apic, pin, false, entry);
>  
>      entry = __ioapic_read_entry(apic, pin, TRUE);
>      if (entry.irr)

Well, this is certainly better than what's there currently, and should
fix the issue reported by Sergey, albeit I still think checking the
irr or the trigger fields of a raw entry when using interrupt
remapping is not safe future wise.

There's no guarantee that future interrupt remapping implementations
don't clobber the non-translated fields with different ones when using
a remapped IO-APIC entry, and hence while this fixes the current issue
at hand it seems fragile.

Anyway, I think this should be fixed ASAP, so if you are happy with
this version that's fine for me. Do you want me to pick this up and
rebase it on top of my TRUE/FALSE removal patch, or would you rather
send it formally standalone?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.