
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

To: Mirela Simonovic <mirela.simonovic@xxxxxxxxxx>
From: Julien Grall <julien.grall@xxxxxxx>
Date: Wed, 9 May 2018 17:32:12 +0100
Cc: "Edgar E. Iglesias" <edgar.iglesias@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Davorin Mista <dm@xxxxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxx>
Delivery-date: Wed, 09 May 2018 16:32:40 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>



On 09/05/18 16:48, Mirela Simonovic wrote:

Hi Julien,


Hi Mirela,

On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall <julien.grall@xxxxxxx> wrote:

Hi Mirela,


On 27/04/18 18:12, Mirela Simonovic wrote:


On boot, enabling errata workarounds will be triggered by the boot CPU
from start_xen(). On CPU hotplug (non-boot scenario) this would not be
done. This patch adds the code required to enable errata workarounds
for a CPU being hotplugged after the system boots. This is triggered
using a notifier. If the CPU fails to enable the errata Xen will panic.
This is done because it is assumed that the CPU which is hotplugged
after the system/Xen boots, was initially hotplugged during the
system/Xen boot. Therefore, enabling errata workarounds should never
fail.

Signed-off-by: Mirela Simonovic <mirela.simonovic@xxxxxxxxxx>

---
CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
CC: Julien Grall <julien.grall@xxxxxxx>
---
   xen/arch/arm/cpuerrata.c         | 35 +++++++++++++++++++++++++++++++++++
   xen/arch/arm/cpufeature.c        | 23 +++++++++++++++++++++++
   xen/include/asm-arm/cpufeature.h |  1 +
   3 files changed, 59 insertions(+)

diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index 1baa20654b..4040f781ec 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -5,6 +5,8 @@
   #include <xen/spinlock.h>
   #include <xen/vmap.h>
   #include <xen/warning.h>
+#include <xen/notifier.h>
+#include <xen/cpu.h>
   #include <asm/cpufeature.h>
   #include <asm/cpuerrata.h>
   #include <asm/psci.h>
@@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
       enable_cpu_capabilities(arm_errata);
   }
+
+static int cpu_errata_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    switch ( action )
+    {
+    case CPU_STARTING:
+        enable_nonboot_cpu_caps(arm_errata);
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_errata_nfb = {
+    .notifier_call = cpu_errata_callback,
+};
+
+static int __init cpu_errata_notifier_init(void)
+{
+    register_cpu_notifier(&cpu_errata_nfb);
+    return 0;
+}
+/*
+ * Initialization has to be done at init rather than presmp_init phase because
+ * the callback should execute only after the secondary CPUs are initially
+ * booted (in hotplug scenarios when the system state is not boot). On boot,
+ * the enabling of errata workarounds will be triggered by the boot CPU from
+ * start_xen().
+ */
+__initcall(cpu_errata_notifier_init);
+
   /*
    * Local variables:
    * mode: C
diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index 525b45e22f..dd30f0d29c 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct arm_cpu_capabilities *caps)
       }
   }
+
+/* Run through the enabled capabilities and enable() them on the calling CPU */
+void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
+{
+    ASSERT(system_state != SYS_STATE_boot);
+
+    for ( ; caps->matches; caps++ )
+    {
+        if ( !cpus_have_cap(caps->capability) )
+            continue;
+
+        if ( caps->enable )
+        {
+            /*
+             * Since the CPU has enabled errata workarounds on boot, it should



This function is not really about errata, it is about capabilities. Errata
is just a sub-category of them.


I've fixed the comment, thanks.

+             * never fail to enable them here.
+             */
+            if ( caps->enable((void *)caps) )
+                panic("CPU%u failed to enable capability %u\n",
+                      smp_processor_id(), caps->capability);



We should really avoid using panic(...) if this is something the system can
survive. In that specific case, it would only affect the current CPU. So it
would be better to return an error and let the caller decide what to do.


I need to emphasize two points:
1) I don't see how this is different from PSCI CPU OFF, where we
do panic. Essentially, in both cases the system will not be able to
use that CPU and we already agreed that is a good reason to panic.

You can't compare PSCI CPU off and the enable callback failing. The *only* reason PSCI CPU off can fail is because the Trusted OS is resident on that CPU. If that ever happens, it is a programming error in Xen, and it makes sense to fail because you don't want that CPU to spin in Xen.

Enabling a capability can fail because of a failure to allocate memory or to create a mapping (see spectre workaround). It is not a programming error but an expected behavior, and it is not a valid reason to assume we want to kill the system.

As opposed to CPU_OFF, which wasn't called on boot so we indeed have no
idea whether it will pass on suspend, no matter how unlikely it is to
fail, in this scenario we are sure that enabling a capability should
pass because it already passed on boot. So if it doesn't pass, which I
consider to be impossible, I believe we should panic.
On the other hand, I understand how this would make a difference on
big.LITTLE where you try to hotplug a CPU that was never booted.
However, that scenario is out of this scope.

While I agree that big.LITTLE is out of scope of your series, what I ask has nothing to do with big.LITTLE. There are valid reasons for the enable callback to fail, whether it is the case today or not.


2) I still wanted to give a chance to your proposal and just convert
panic into stop_cpu+printing error. The system cannot survive if
enabling a capability fails. In order to test this I added a
capability that will always fail after the boot. This is not realistic
in my opinion, but I used it only for testing to check whether the
system will survive. Instead of panic I printed an error and stopped
the CPU. However, Xen crashed. The boot CPU properly concluded that
the erroneous CPU will never become online, but later on credit
scheduler's assertion fails.


Please provide more details.

I believe this is something that a person
who adds big.LITTLE support should deal with.

If there is a bug in the scheduler, it should be fixed rather than trying to work around it with a panic in the code. If you provide more details, we might be able to help here.


Do we have an agreement to keep panic?

I am afraid not. panic (and BUG*) should only be used when there is no way to come back or when it is a programming error to end up here. I don't think this is the case with the information I have in hand.


The two solutions I find acceptable would be:
1) Log a warning and ignore the error. Likely your CPU will break later on.

2) Return an error and let the caller deal with it. The caller might decide to kill the system, but that's not our business. This function should only report an error.
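
To illustrate option 2, here is a minimal standalone sketch, not the code from the patch. The struct cap_sketch type, the `detected` field (standing in for cpus_have_cap()), the sample tables, and the helper names are all hypothetical and exist only so the example compiles outside Xen. The only point it makes is that the enable loop logs the failure and returns the first error, and that the policy decision lives in the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for Xen's struct arm_cpu_capabilities. */
struct cap_sketch {
    unsigned int capability;
    bool (*matches)(const struct cap_sketch *cap); /* NULL ends the table */
    int (*enable)(void *data);                     /* 0 or -errno */
    bool detected;              /* stand-in for cpus_have_cap() */
};

static bool match_any(const struct cap_sketch *cap) { (void)cap; return true; }
static int enable_ok(void *data) { (void)data; return 0; }
/* Simulates an enable() that fails because an allocation failed. */
static int enable_nomem(void *data) { (void)data; return -ENOMEM; }

/* Sample tables (made up for the example). */
static const struct cap_sketch caps_all_ok[] = {
    { .capability = 1, .matches = match_any, .enable = enable_ok, .detected = true },
    { 0 }
};

static const struct cap_sketch caps_one_fails[] = {
    { .capability = 1, .matches = match_any, .enable = enable_ok, .detected = true },
    { .capability = 2, .matches = match_any, .enable = enable_nomem, .detected = true },
    { .capability = 3, .matches = match_any, .enable = enable_ok, .detected = true },
    { 0 }
};

/*
 * Enable every detected capability on the calling CPU. A failure is logged
 * and the remaining capabilities are still attempted; the first error code
 * is returned so the caller can decide what to do with this CPU.
 */
static int enable_nonboot_cpu_caps_sketch(const struct cap_sketch *caps,
                                          unsigned int cpu)
{
    int rc = 0;

    assert(caps != NULL);

    for ( ; caps->matches; caps++ )
    {
        int ret;

        if ( !caps->detected || !caps->enable )
            continue;

        ret = caps->enable((void *)caps);
        if ( ret )
        {
            fprintf(stderr, "CPU%u failed to enable capability %u (err %d)\n",
                    cpu, caps->capability, ret);
            if ( !rc )
                rc = ret;
        }
    }

    return rc;
}

/* Caller-side policy: here the CPU is simply refused if anything failed. */
static bool cpu_may_come_online(const struct cap_sketch *caps, unsigned int cpu)
{
    return enable_nonboot_cpu_caps_sketch(caps, cpu) == 0;
}
```

With these tables, cpu_may_come_online(caps_all_ok, 1) accepts the CPU and cpu_may_come_online(caps_one_fails, 1) refuses it. A caller that prefers option 1 would ignore the return value after the warning has been printed.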


Cheers,

--
Julien Grall
