[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/boot: Make alternative patching NMI-safe



On 05/02/18 14:09, Jan Beulich wrote:
>>>> On 05.02.18 at 11:24, <andrew.cooper3@xxxxxxxxxx> wrote:
>> During patching, there is a very slim risk that an NMI or MCE interrupt in 
>> the
>> middle of altering the code in the NMI/MCE paths, in which case bad things
>> will happen.
>>
>> The NMI risk can be eliminated by running the patching loop in NMI context, 
>> at
>> which point the CPU will defer further NMIs until patching is complete.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> So you continue to think that the risk of hitting an #MC here is
> acceptable, despite there being a solution to the problem. To be
> honest, I find this a little strange. (I do agree that there's no
> good solution to the similar live patching problem.)

The risk is already sufficiently tiny that in 3 years, it hasn't been
observed, nor do I think it is likely to be observed in the future.  At
this point on boot, there is nothing interesting set up, which further
reduces the risk of an MCE.

Furthermore, whether or not Xen survives the MCE (and don't believe I've
ever seen Xen successfully recover from an MCE), the hardware is faulty
and needs replacing (modulo cosmic rays, but the chances of those really
are astronomical).

Irrespective of that, there is no way I'm aware of for generating MCEs
on demand, and therefore, no way of testing the logic.  For that reason
alone, I don't think it is wise to be taking complicated invasive logic.

>
>> @@ -232,13 +254,14 @@ void __init alternative_instructions(void)
>>       */
>>      ASSERT(!local_irq_is_enabled());
>>  
>> -    /* Disable WP to allow application of alternatives to read-only pages. 
>> */
>> -    write_cr0(cr0 & ~X86_CR0_WP);
>> -
>> -    apply_alternatives(__alt_instructions, __alt_instructions_end);
>> +    /*
>> +     * As soon as the callback is set up, the next NMI will trigger 
>> patching,
>> +     * even an NMI ahead of our explicit self-NMI.
>> +     */
>> +    saved_nmi_callback = set_nmi_callback(nmi_apply_alternatives);
>>  
>> -    /* Reinstate WP. */
>> -    write_cr0(cr0);
>> +    /* Send ourselves an NMI to trigger the callback. */
>> +    self_nmi();
>>  
>>      set_nmi_callback(saved_nmi_callback);
>>  }
> Hmm, you restore the old callback and return without having waited
> for the patching to actually occur? Or are you implying that the
> delivery of the NMI is guaranteed to be fully synchronous on all
> hardware? I'm not aware of the LAPIC spec saying anything like this.

I hadn't even considered that.  In practice, NMIs always appear
synchronously following the ICR write.

I expect this probably wasn't the case for off-chip APICs.

In this case, I can move the once boolean to being at file scope and
check for that.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.