[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask



On 18.02.2020 17:18, Roger Pau Monné wrote:
> On Tue, Feb 18, 2020 at 04:40:29PM +0100, Jan Beulich wrote:
>> On 18.02.2020 16:34, Andrew Cooper wrote:
>>> On 18/02/2020 14:43, Roger Pau Monné wrote:
>>>> OK, so what about:
>>>>
>>>> if ( in_mc() || in_nmi() )
>>>> {
>>>>     bool x2apic = current_local_apic_mode() == APIC_MODE_X2APIC;
>>>>     unsigned int icr2;
>>>>
>>>>     /*
>>>>      * When in #MC or #MNI context Xen cannot use the per-CPU scratch mask
>>>>      * because we have no way to avoid reentry, so do not use the APIC
>>>>      * shorthand. The only IPI that should be sent from such context
>>>>      * is a #NMI to shutdown the system in case of a crash.
>>>>      */
>>>>     ASSERT(vector == APIC_DM_NMI);
>>>>     if ( !x2apic )
>>>>         icr2 = apic_read(APIC_ICR2);
>>>>     alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>     if ( !x2apic )
>>>>         apic_write(APIC_ICR2, icr2);
>>>>
>>>>     return;
>>>> }
>>>>
>>>> I'm unsure as to whether the assert is actually helpful, but would
>>>> like to settle this before sending a new version.
>>>
>>> I can only repeat my previous email (questions and statements).
>>>
>>> *Any* logic inside "if ( in_mc() || in_nmi() )" can't be tested
>>> usefully, making it problematic as a sanity check.
> 
> Right, so what about keeping the logic in "if ( in_mc() || in_nmi() )"
> using the code as it was previous to introducing the shorthand, ie:
> 
> if ( in_mc() || in_nmi() )
> {
>     alternative_vcall(genapic.send_IPI_mask, mask, vector);
>     return;
> }
> 
> That would be exactly what send_IPI_mask would do prior to the
> introduction of the shorthand (pre 5500d265a2a8f), and I think
> it's a compromise between "don't do anything" and "let's try to handle
> this in a non-broken way".
> 
> Using the shorthand adds more logic, which we would like to avoid in
> such critical crash paths, so let's try to avoid as much as possible
> by just falling back to what was there previously.
> 
>>> (For this version of the code specifically, you absolutely don't want to
>>> be reading MSR_APIC_BASE every time, and when we're on the crash path
>>> sending NMIs, we don't care at all about clobbering ICR2.)
>>>
>>> Doing nothing, is less bad than doing this.  There is no point trying to
>>> cope with a corner case we don't support, and there is nothing you can
>>> do, sanity wise, which doesn't come with a high chance of blowing up
>>> first in a customer environment.
>>>
>>> Literally, do nothing.  It is the least bad option going.
>>
>> I think you're a little too focused on the crash path. Doing nothing
>> here likely means having problems later if we get into here, in a
>> far harder to debug manner. May I suggest we introduce e.g.
>> SYS_STATE_crashed, and bypass any such potentially problematic
>> checks if in this state? Your argument about not being able to test
>> these paths applies to a "don't do anything" approach as well, after
>> all - we won't know if the absence of any extra logic is fine until
>> someone (perhaps even multiple "someone"-s) actually hit that path.
> 
> Introducing such state would be another option (or a further
> improvement), but we still need to handle what happens when
> send_IPI_mask gets called from non-maskable interrupt context, because
> using the per-CPU mask in that context is definitely not safe
> (regardless of whether it's a crash path or not).
> 
> Falling back to not using the shorthand in such contexts seems like a
> good compromise: it's not adding new logic, just restoring the logic
> prior to the introduction of the shorthand.

I'd be okay with this.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.