[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 4/5] x86: Port the basic alternative mechanism from Linux to Xen




> -----Original Message-----
> From: Andrew Cooper [mailto:amc96@xxxxxxxxxxxxxxxx] On Behalf Of
> Andrew Cooper
> Sent: Thursday, May 29, 2014 10:59 PM
> To: Wu, Feng; xen-devel@xxxxxxxxxxxxx
> Cc: keir.xen@xxxxxxxxx; JBeulich@xxxxxxxx; ian.campbell@xxxxxxxxxx;
> stefano.stabellini@xxxxxxxxxx; tim@xxxxxxx
> Subject: Re: [PATCH v2 4/5] x86: Port the basic alternative mechanism from
> Linux to Xen
> 
> On 29/05/2014 10:28, Wu, Feng wrote:
> >
> > 8 @@ void __init noreturn __start_xen(unsigned long
> >> mbi_p)
> >>>        if ( cpu_has_fsgsbase )
> >>>            set_in_cr4(X86_CR4_FSGSBASE);
> >>>
> >>> +    alternative_instructions();
> >>> +
> >> Given this ordering, it might be cleaner to have an
> >> ASSERT(!local_irq_enabled()) in the top of alternative_instructions(),
> >> and forgo the local_irq_save/restore() in text_poke_early().
> >>
> >> If you can move this higher up before enabling MCEs in CR4, it might be
> >> slightly more resilient.
> >>
> >> ~Andrew
> > MCE bit in CR4 is set in identify_cpu() --> mcheck_init() -->
> set_in_cr4(X86_CR4_MCE), but
> > apply_alternatives() needs boot_cpu_data.x86_capability being ready, since 
> > it
> calls boot_cpu_has().
> > If we put alternative_instructions() before enabling MCEs in CR4, which 
> > place
> do you suggest? Thanks!
> >
> > Thanks,
> > Feng
> 
> One option would be to temporarily disable it in cr4 at the same point
> that NMIs are nopped out, in the same way as temporarily disabling
> CR4.SMAP when building dom0.
> 
> At the end of the day, an MCE will certainly result in a crash, but at
> least it wouldn't be from a weird fault because some of the codepath in
> the MCE handler was midway through being patched.
> 
> ~Andrew

From the original comments below, which is from Linux:

        /*
         * Don't stop machine check exceptions while patching.
         * MCEs only happen when something got corrupted and in this
         * case we must do something about the corruption.
         * Ignoring it is worse than a unlikely patching race.
         * Also machine checks tend to be broadcast and if one CPU
         * goes into machine check the others follow quickly, so we don't
         * expect a machine check to cause undue problems during to code
         * patching.
         */

We can see that the MCE remains enabled here on purpose. It says "Ignoring the 
MCE is worse than a _unlikely_ patching race."
Do you think we still need to disable MCE, or should we follow the current 
solution in Linux?

Thanks,
Feng


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.