[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled



On Tue, Mar 1, 2016 at 4:15 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> Ingo, your feedback appreciated at the end here, regarding quirks.
>
> On Tue, Mar 01, 2016 at 09:00:53AM -0500, Boris Ostrovsky wrote:
>> On 02/29/2016 06:50 PM, Andy Lutomirski wrote:
>> >diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> >index 91ddae732a36..c6ef4da8e4f4 100644
>> >--- a/arch/x86/kernel/cpu/common.c
>> >+++ b/arch/x86/kernel/cpu/common.c
>> >@@ -979,6 +979,31 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>
> Note: Andy's change is on identify_cpu() modification here at the end.
>
>> >  #ifdef CONFIG_NUMA
>> >     numa_add_cpu(smp_processor_id());
>> >  #endif
>> >+
>> >+    /*
>> >+     * ESPFIX is a strange bug.  All real CPUs have it.  Paravirt
>> >+     * systems that run Linux at CPL > 0 may or may not have the
>> >+     * issue, but, even if they have the issue, there's absolutely
>> >+     * nothing we can do about it because we can't use the real IRET
>> >+     * instruction.
>> >+     *
>> >+     * NB: For the time being, only 32-bit kernels support
>> >+     * X86_BUG_ESPFIX as such.  64-bit kernels directly choose
>> >+     * whether to apply espfix using paravirt hooks.  If any
>> >+     * non-paravirt system ever shows up that does *not* have the
>> >+     * ESPFIX issue, we can change this.
>> >+     */
>> >+#ifdef CONFIG_X86_32
>> >+#ifdef CONFIG_PARAVIRT
>> >+    do {
>> >+            extern void native_iret(void);
>> >+            if (pv_cpu_ops.iret == native_iret)
>> >+                    set_cpu_bug(c, X86_BUG_ESPFIX);
>> >+    } while (0);
>> >+#else
>> >+    set_cpu_bug(c, X86_BUG_ESPFIX);
>> >+#endif
>> >+#endif
>> >  }
>> >  /*
>>
>> Alternatively, PV guests can clear X86_BUG_ESPFIX in their init
>> code. E.g in .set_cpu_features op, just like we do for
>> X86_BUG_SYSRET_SS_ATTRS
>
> Andy's proposal works out of identify_cpu() and that covers both boot
> processor and secondary CPUs. The summary is as follows, tracing back in
> time from left to right.
>
>                 --- identify_boot_cpu() --- check_bugs() --- start_kernel()
>                /
> identify_cpu()<
>                \
>                 --- identify_secondary_cpu() --- cpu_up() --- smp_init()
>                     --- kernel_init_freeable() --- kernel_init()
>                         --- rest_init() --- start_kernel()
>
>
> set_cpu_features() is called from both: init_hypervisor_platform()
> during setup_arch() and identify_cpu(). Since it'll be called on
> check_bugs() already on identify_boot_cpu() though I think the
> call from init_hypervisor_platform() seems redundant ?
>
> We assume we just call:
>
> setup_arch() --> init_hypervisor_platform() --> 
> init_hypervisor(&boot_cpu_data)
>
> But the above map on identify_cpu() also shows we call:
>
> start_kernel --> check_bugs() --> identify_boot_cpu() -->
>         identify_cpu() --> init_hypervisor() --> set_cpu_features()
>
>
> void init_hypervisor(struct cpuinfo_x86 *c)
> {
>         if (x86_hyper && x86_hyper->set_cpu_features)
>                 x86_hyper->set_cpu_features(c);
> }
>
> Anyway, since we're consolidating quirks, and since it turns out the other
> quirks are being shifted away from paravirt_enabled() out into another struct
> x86_platform_ops CPU specific quirk, I wonder why not just also replace this
> set_cpu_features() thing as a struct x86_platform_ops quirk CPU callback.
>
>> (although this may require adding struct
>> hypervisor_x86 for lguests. Which I think they should have anyway).
>
> lguest already uses x86_platform, and setting up a per CPU quirk would
> be rather trivial.
>
> CPU feature / CPU quirk...
>
> I've stashed the other quirks into a x86_early_init_platform_quirks(),
> this was to have all quirks handled in one place. We handle differences
> with subarch there. vmware has no subarch though, and it uses its own
> set_cpu_features(). We have a few options I  can think of:
>
>  1) keep this on set_cpu_features() and modify lguest to add a struct 
> hypervisor_x86
>     as boris suggests
>
>  2) move set_cpu_features() as a platform feature / quirk callback and
>     call it on identify_cpu()
>
>  3) Just identify each quirk on struct x86_platform, with a set of defaults
>     set. Then identify_cpu() enables a platform  callback to override
>     defaults, and finally then a shared quirk call is issued to
>     set the different set_cpu_features() or clear them.
>

I think this is severely overcomplicating the issue.

The issue is that IRET is a pile of shit.  It may be quirky, but it
affects *everything*.

On x86_64, the kernel assumes that the "iret" implementation works
around the quirk.  xen_iret doesn't, and that's Xen's problem.

On x86_32, it's inconvenient for native_iret to directly work around
the quirk.  Instead, some other asm code in the exit path sets up the
workaround under the assumption that native_iret is just plain IRET.
It's the responsibility of other IRET implementations to have their
own implementations of the workaround and, again, they don't in
practice and this is Xen and lguest's problem.

So I think the condition we want really is (iret == native_iret), and
that's what my patch does.  So I think we should leave it at that.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.