[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On 09/05/18 16:48, Mirela Simonovic wrote: Hi Julien, Hi Mirela, On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall <julien.grall@xxxxxxx> wrote:Hi Mirela, On 27/04/18 18:12, Mirela Simonovic wrote:On boot, enabling errata workarounds will be triggered by the boot CPU from start_xen(). On CPU hotplug (non-boot scenario) this would not be done. This patch adds the code required to enable errata workarounds for a CPU being hotplugged after the system boots. This is triggered using a notifier. If the CPU fails to enable the errata Xen will panic. This is done because it is assumed that the CPU which is hotplugged after the system/Xen boots, was initially hotplugged during the system/Xen boot. Therefore, enabling errata workarounds should never fail. Signed-off-by: Mirela Simonovic <mirela.simonovic@xxxxxxxxxx> --- CC: Stefano Stabellini <sstabellini@xxxxxxxxxx> CC: Julien Grall <julien.grall@xxxxxxx> --- xen/arch/arm/cpuerrata.c | 35 +++++++++++++++++++++++++++++++++++ xen/arch/arm/cpufeature.c | 23 +++++++++++++++++++++++ xen/include/asm-arm/cpufeature.h | 1 + 3 files changed, 59 insertions(+) diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c index 1baa20654b..4040f781ec 100644 --- a/xen/arch/arm/cpuerrata.c +++ b/xen/arch/arm/cpuerrata.c @@ -5,6 +5,8 @@ #include <xen/spinlock.h> #include <xen/vmap.h> #include <xen/warning.h> +#include <xen/notifier.h> +#include <xen/cpu.h> #include <asm/cpufeature.h> #include <asm/cpuerrata.h> #include <asm/psci.h> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void) enable_cpu_capabilities(arm_errata); } +static int cpu_errata_callback( + struct notifier_block *nfb, unsigned long action, void *hcpu) +{ + switch ( action ) + { + case CPU_STARTING: + enable_nonboot_cpu_caps(arm_errata); + break; + default: + break; + } + + return NOTIFY_DONE; +} + +static struct notifier_block cpu_errata_nfb = { + .notifier_call = cpu_errata_callback, +}; + +static int __init cpu_errata_notifier_init(void) +{ + register_cpu_notifier(&cpu_errata_nfb); + return 0; +} +/* + * Initialization has to be done at init rather than presmp_init phase because + * the callback should execute only after the secondary CPUs are initially + * booted (in hotplug scenarios when the system state is not boot). On boot, + * the enabling of errata workarounds will be triggered by the boot CPU from + * start_xen(). + */ +__initcall(cpu_errata_notifier_init); + /* * Local variables: * mode: C diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c index 525b45e22f..dd30f0d29c 100644 --- a/xen/arch/arm/cpufeature.c +++ b/xen/arch/arm/cpufeature.c @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct arm_cpu_capabilities *caps) } } +/* Run through the enabled capabilities and enable() them on the calling CPU */ +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps) +{ + ASSERT(system_state != SYS_STATE_boot); + + for ( ; caps->matches; caps++ ) + { + if ( !cpus_have_cap(caps->capability) ) + continue; + + if ( caps->enable ) + { + /* + * Since the CPU has enabled errata workarounds on boot, it shouldThis function is not really about errata, it is about capabilities. Errata is just a sub-category of them.I've fixed the comment, thanks.+ * never fail to enable them here. + */ + if ( caps->enable((void *)caps) ) + panic("CPU%u failed to enable capability %u\n", + smp_processor_id(), caps->capability);We should really avoid to use panic(...) if this is something the system can survive. In that specific case, it would only affect the current CPU. So it would be better to return an error and let the caller decide what to do.I need to emphasize two points: 1) I don't see how is this different compared to PSCI CPU OFF where we do panic. Essentially, in both cases the system will not be able to use that CPU and we already agreed that is a good reason to panic. You can't compare PSCI CPU off and the enable callback failing. The *only* reason PSCI CPU off can fail is because the Trusted OS is resident on that CPU. If that ever happen it is a programming error on Xen, and it makes sense to fail because you don't want that CPU to spin in Xen. Enabling a capability can fail because of a failure of allocating memory or mapping (see spectre workaround). It is not a programming error but an expected behavior and it is not a valid reason to assume we want to kill the system. While I agree that big.LITTLE is out of scope of your series, what I ask has nothing to do with big.LITTLE. There are valid reason for the enable callback to fail whether it is the case today or not.As oppose to CPU_OFF which wasn't called on boot so we indeed have no idea whether it will pass on suspend, no matter how unlikely it could fail, in this scenario we are sure that enabling capability should pass because it already passed on boot. So if it doesn't pass, which I consider to be impossible, I believe we should panic. On the other hand, I understand how would this make a difference on big.LITTLE where you try to hotplug a CPU that was never booted. However, that scenario is out of this scope. 2) I still wanted to give a chance to your proposal and just convert panic into stop_cpu+printing error. The system cannot survive if enabling a capability fails. In order to test this I added a capability that will always fail after the boot. This is not realistic in my opinion, but I used it only for testing to check whether the system will survive. Instead of panic I printed an error and stopped the CPU. However, Xen crashed. The boot CPU properly concluded that the erroneous CPU will never become online, but later on credit scheduler's assertion fails. Please provide more details. I believe this is something that a person who adds big.LITTLE support should deal with. If there is a bug in the scheduler it should be fixed rather trying to workaround with a panic in the code. If you provide more details, we might be able to help here. Do we have an agreement to keep panic? I am afraid not, panic (and BUG*) should only be used when there are no way to come back or it is a programming error to end up here. I don't think this is the case with the information I have in hand. The two solutions I find acceptable would be: 1) Log a warning and ignore the error. Likely your CPU will break later on.2) Return an error and let the caller deal with it. The caller might decide to kill the system, but that's not our business. This function should only report an error. Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |