
Re: [PATCH v2 for 4.23] Add GICv3 SGI boot/self tests in Xen

To: Ayan Kumar Halder <ayan.kumar.halder@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
From: Julien Grall <julien@xxxxxxx>
Date: Fri, 19 Jun 2026 22:12:05 +0100
Cc: Doug Goldstein <cardoe@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
Delivery-date: Fri, 19 Jun 2026 21:12:18 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Ayan,

On 29/05/2026 18:09, Ayan Kumar Halder wrote:

Boot self-tests (also referred to as boot-time tests or power-on
self-tests) are intended to validate internal features of Xen during
bring-up. They are meant to be run in a debug / validation environment;
Xen is not expected to remain functional for production use after the
self-tests have executed.


Looking at the code below, isn't Xen functional even after the self-test?

The purpose of these tests is to catch
hardware configuration issues early and to confirm that the platform
on which Xen has been brought up is sane. The expected flow is:
build Xen with the self-tests enabled, boot it, inspect the results,
and then reboot into the usual production configuration.

Introduce the tests to confirm that:
1. A cpu can send SGI 0 to itself
2. A cpu can send SGI 0 to another specific CPU
3. A cpu can send SGI 0 to all the other CPUs
4. A cpu can send SGI 1 to another CPU

I am not sure what you meant by SGI 0 and SGI 1? Below you seem to use only one SGI (which is neither 0 nor 1) except for one specific test: CPU 0 injecting an SGI to a specific CPU (0).


These tests aim to test Xen has configured the GIC correctly to use SGIs.
Thus, the tests invoke specific APIs of GIC driver.

Also, introduce a config CONFIG_BOOT_SELFTEST which enables these tests.
The option defaults to N; it should be disabled for production builds and
is intended for the validation pipeline and coverage measurement. The
tests run during Xen boot and validate internal interfaces such as Xen's
interface with hardware, firmware and the bootloader.

Also, introduce an integer command line parameter "gic-test". By default, it
is set to 0 which means no tests are enabled.
To run the SGI tests, "gic-test" should be set to 1. In future, if we add
tests for the distributor, ITS, LPI, etc., we can use different numbers.
Thus, each number denotes a functionality of GICv3 which can be tested
independently and within a single boot of Xen.

In this way, we ensure that the tests to validate SGIs do not impact any other
tests.

In order to keep all the boot-time self-tests together in the binary, we
have introduced a separate section "initcallboottest". All the tests are
registered using __initcallboottest. During the bootup of each core, Xen
invokes do_init_boottests() to run these tests. All these tests are
invoked before Xen creates the domains (in case of primary core) or runs
the idle loop (in case of secondary core).

Note: it was suggested that, once the boot self-tests have run, Xen
should call machine_halt() rather than continue booting (since this
build is only intended for validation). This is not wired in here
because the SGIs are sent from the primary and secondary CPUs and
received asynchronously on the target CPUs. There is no definite point
in the boot flow at which Xen can know that every send has been
observed by its receiver, so "after the tests have completed" has no
well-defined moment at which to insert machine_halt().

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@xxxxxxx>
Signed-off-by: Michal Orzel <michal.orzel@xxxxxxx>
---
Link to v1 (RFC):
https://lists.xenproject.org/archives/html/xen-devel/2025-09/msg00956.html

Upstream CI run (xen-project/people/ayankuma/xen fork, one commit on
top of xen-project/xen staging — all Linux builds + tests including
qemu-smoke-boot-selftest-arm64-gcc-debug passed; only the macos jobs
sit pending because the personal fork has no macos runner):
https://gitlab.com/xen-project/people/ayankuma/xen/-/pipelines/2561806695

Changes in v2:
  - Renamed the patch from "xen/arm: Introduce GICV3 Self Tests" to
    "Add GICv3 SGI boot/self tests in Xen", and rewrote the commit
    message to explain the intent of boot self-tests (debug /
    validation builds only, Xen not expected to remain functional
    afterwards).
  - Moved the selftest code out of gic-v3.c into a dedicated file
    xen/arch/arm/gic-test.c, gated by CONFIG_BOOT_SELFTEST
    (Stefano, Grygorii).
  - Introduced a generic boot-self-test framework: new section
    "initcallboottest", registration macro __initcallboottest, and
    do_init_boottests() invoked once per CPU after
    local_irq_enable(), so the test runs on every CPU (boot +
    secondaries) and no longer collides with the IRQ-enable timing
    in gicv3_init() (Julien #1, Julien #3).
  - Added Kconfig option CONFIG_BOOT_SELFTEST in
    xen/arch/arm/Kconfig (arm-only for now; arch-specific because
    the only registered test is GICv3-specific).
  - Reserved a dedicated SGI value GIC_SGI_TEST in enum gic_sgi
    (xen/arch/arm/include/asm/gic.h), so the selftest never
    reuses a functional SGI (Grygorii #3).
  - Added a runtime integer command-line parameter "gic-test" so
    the selftest binary can be shipped but its execution selected
    at boot (gic-test=0 -> no-op; gic-test=1 -> SGI tests). Future
    GICv3 features (distributor, ITS, LPI, ...) can claim further
    values (Grygorii #2, partial).
  - Documented why machine_halt() is not invoked after the tests:
    SGI delivery is asynchronous, so there is no well-defined
    point after which every send has been observed by its
    receiver (Julien #2).
  - Wired the tests into upstream GitLab CI: new build job
    alpine-3.18-gcc-debug-arm64-boot-selftest, new test job
    qemu-smoke-boot-selftest-arm64-gcc-debug, and the runner
    script automation/scripts/qemu-boot-selftest-arm64.sh that
    dumps the QEMU virt DTB, injects
    "gic-test=1 console=dtuart sync_console" into
    /chosen/xen,xen-bootargs via fdtput, boots Xen, and checks
    for each "Sending GIC_SGI_TEST ..." followed by the matching
    "CPU%u: GIC_SGI_TEST received".

  automation/gitlab-ci/build.yaml               |  8 ++
  automation/gitlab-ci/test.yaml                |  8 ++
  .../scripts/qemu-boot-selftest-arm64.sh       | 81 +++++++++++++++++++
  xen/arch/arm/Kconfig                          | 15 ++++
  xen/arch/arm/Makefile                         |  1 +
  xen/arch/arm/gic-test.c                       | 52 ++++++++++++
  xen/arch/arm/gic.c                            |  5 ++
  xen/arch/arm/include/asm/gic.h                |  3 +
  xen/arch/arm/setup.c                          |  2 +
  xen/arch/arm/smpboot.c                        |  2 +
  xen/arch/arm/xen.lds.S                        |  4 +
  xen/common/kernel.c                           | 11 +++
  xen/include/xen/init.h                        |  3 +
  13 files changed, 195 insertions(+)
  create mode 100755 automation/scripts/qemu-boot-selftest-arm64.sh
  create mode 100644 xen/arch/arm/gic-test.c

diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index 7f5b5938e8..8df45caa86 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -439,6 +439,14 @@ alpine-3.18-gcc-debug-arm64:
        CONFIG_UBSAN=y
        CONFIG_UBSAN_FATAL=y

+alpine-3.18-gcc-debug-arm64-boot-selftest:

+  extends: .gcc-arm64-build-debug
+  <<: *build-test
+  variables:
+    CONTAINER: alpine:3.18-arm64v8
+    EXTRA_XEN_CONFIG: |
+      CONFIG_BOOT_SELFTEST=y
+
  alpine-3.18-gcc-arm64-randconfig:
    extends: .gcc-arm64-build
    variables:
diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 8770c523e2..2398c6299a 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -524,6 +524,14 @@ qemu-smoke-dom0less-arm64-gcc-debug-gicv3:
      - *arm64-test-needs
      - alpine-3.18-gcc-debug-arm64

+qemu-smoke-boot-selftest-arm64-gcc-debug:

+  extends: .qemu-arm64
+  script:
+    - ./automation/scripts/qemu-boot-selftest-arm64.sh 2>&1 | tee ${LOGFILE}
+  needs:
+    - *arm64-test-needs
+    - alpine-3.18-gcc-debug-arm64-boot-selftest
+
  qemu-smoke-dom0less-arm64-gcc-debug-staticmem:
    extends: .qemu-arm64
    script:
diff --git a/automation/scripts/qemu-boot-selftest-arm64.sh b/automation/scripts/qemu-boot-selftest-arm64.sh
new file mode 100755
index 0000000000..a37dba3e07
--- /dev/null
+++ b/automation/scripts/qemu-boot-selftest-arm64.sh
@@ -0,0 +1,81 @@
+#!/bin/bash
+
+set -ex -o pipefail
+
+# Boot the prebuilt Xen binary under QEMU with CONFIG_BOOT_SELFTEST=y enabled
+# and gic-test=1 in xen,xen-bootargs, then verify the four GICv3 SGI self-tests
+# pass by inspecting the serial log.
+
+XEN=binaries/xen
+QEMU=./binaries/qemu-system-aarch64
+DTB_RAW=binaries/virt.dtb
+DTB=binaries/virt-bootselftest.dtb
+LOG=smoke.serial
+
+test -x ${QEMU}
+test -f ${XEN}
+
+# Dump the auto-generated DT from the QEMU virt machine, then inject
+# /chosen/xen,xen-bootargs.  The selftest infrastructure invokes
+# do_init_boottests() during early boot; gic-test=1 selects the GICv3 SGI
+# tests.
+# -net none avoids QEMU's default virtio-net-pci, whose efi-virtio.rom
+# is not shipped with the qemu-system-aarch64 artifact used in CI.
+${QEMU} \
+    -machine virt,virtualization=true,gic-version=3,dumpdtb=${DTB_RAW} \
+    -cpu cortex-a57 -m 1024 -smp 2 -display none -net none
+
+cp ${DTB_RAW} ${DTB}
+fdtput -t s ${DTB} /chosen xen,xen-bootargs \
+    "gic-test=1 console=dtuart sync_console"
+
+rm -f ${LOG}
+timeout 60 ${QEMU} \
+    -machine virt,virtualization=true,gic-version=3 \
+    -cpu cortex-a57 -m 1024 -smp 2 \

This means that there is not much difference between "send an SGI to a specific CPU" and "send an SGI to the other CPUs". I think it would be more meaningful to use 3 or more pCPUs.
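To illustrate the point, here is a minimal sketch that computes the set of CPUs each kind of SGI would reach. The helper names `allbutself_mask()` and `one_mask()` are hypothetical and only model target sets as bitmasks (bit N set means CPU N is targeted); they are not Xen or GIC driver functions.

```c
#include <assert.h>

/* Hypothetical model: CPUs reached by "send to all but self". */
static unsigned int allbutself_mask(unsigned int nr_cpus, unsigned int self)
{
    /* Keep the shifts well-defined for a 32-bit mask. */
    assert(nr_cpus > 0 && nr_cpus < 32);
    assert(self < nr_cpus);

    unsigned int online = (1u << nr_cpus) - 1;

    return online & ~(1u << self);
}

/* Hypothetical model: CPUs reached by "send to one specific CPU". */
static unsigned int one_mask(unsigned int target)
{
    assert(target < 32);

    return 1u << target;
}
```

With `-smp 2`, `allbutself_mask(2, 1)` equals `one_mask(0)`, so the two tests exercise the same target set. With three CPUs, `allbutself_mask(3, 2)` covers CPU0 and CPU1 and the two cases diverge.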

+    -serial file:${LOG} \
+    -monitor none -display none -no-reboot -net none \
+    -dtb ${DTB} \
+    -kernel ${XEN} || true
+
+# Each "Sending GIC_SGI_TEST ..." line must be followed by the matching
+# "CPU%u: GIC_SGI_TEST received".
+fail=0
+check_pair() {
+    local send_pat=$1
+    local recv_pat=$2
+    local send_line recv_line
+
+    send_line=$(grep -n -- "${send_pat}" ${LOG} | head -n1 | cut -d: -f1)
+    if [ -z "${send_line}" ]; then
+        echo "MISSING: ${send_pat}"
+        fail=1
+        return
+    fi
+
+    recv_line=$(grep -n -- "${recv_pat}" ${LOG} \
+        | awk -v bl="${send_line}" -F: '$1 > bl {print $1; exit}')
+    if [ -z "${recv_line}" ]; then
+        echo "MISSING (after line ${send_line}): ${recv_pat}"
+        fail=1
+        return
+    fi
+
+    echo "OK: '${send_pat}' -> '${recv_pat}' (lines ${send_line} -> ${recv_line})"
+}
+
+# Boot CPU sends SGI to itself
+check_pair "Sending GIC_SGI_TEST to self CPU0" "CPU0: GIC_SGI_TEST received"
+# Secondary CPU sends SGI to itself
+check_pair "Sending GIC_SGI_TEST to self CPU1" "CPU1: GIC_SGI_TEST received"
+# Secondary CPU sends SGI to primary
+check_pair "Sending GIC_SGI_TEST to CPU0 from CPU1" "CPU0: GIC_SGI_TEST received"
+# Send to all-but-self
+check_pair "Sending GIC_SGI_TEST to all except CPU1" "CPU0: GIC_SGI_TEST received"
+
+if [ ${fail} -ne 0 ]; then
+    echo "FAILED"
+    exit 1
+fi
+
+echo "PASSED"
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 79622b46a1..0e23bbf20b 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -476,6 +476,21 @@ config ARM64_HARDEN_BRANCH_PREDICTOR
  config ARM32_HARDEN_BRANCH_PREDICTOR
      def_bool y if ARM_32 && HARDEN_BRANCH_PREDICTOR

+config BOOT_SELFTEST

+    bool "Enable boot-time self-tests"
+    default n

Above you said this is not meant for production. So I was expecting to see a dependency on CONFIG_DEBUG.

If the intention is to use it in release builds, given the current behavior (e.g. breaking test), I think this should depend on UNSUPPORTED so we don't get security reports because Xen is broken after the boot tests.

+    help
+      This option enables boot-time self-tests that validate Xen's internal

+      interfaces with hardware, firmware and the bootloader. The tests are
+      registered with __initcallboottest and executed by do_init_boottests()
+      during early boot, before domains are created.
+
+      These tests are intended for validation and coverage measurement, not
+      for production builds. With this option enabled, Xen may not be
+      functional after the tests have run.

If we know a test breaks Xen, I think it is best that Xen doesn't continue. Otherwise, it is quite confusing for the user to know what's going on.

My preference is that the boot tests would act like other self-tests and Xen can continue booting normally. So the tests could also be meaningful in non-test setups.

+
+      If unsure, say N.
+
  source "arch/arm/platforms/Kconfig"

source "common/Kconfig"

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 84c4062b30..0090761682 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -24,6 +24,7 @@ obj-y += domctl.o
  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
  obj-y += efi/
  obj-y += gic.o
+obj-$(CONFIG_BOOT_SELFTEST) += gic-test.init.o
  obj-$(CONFIG_GICV2) += gic-v2.o
  obj-$(CONFIG_GICV3) += gic-v3.o
  obj-$(CONFIG_HAS_ITS) += gic-v3-its.o
diff --git a/xen/arch/arm/gic-test.c b/xen/arch/arm/gic-test.c
new file mode 100644
index 0000000000..ca922e5d2a
--- /dev/null
+++ b/xen/arch/arm/gic-test.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/delay.h>
+#include <xen/init.h>
+#include <xen/param.h>
+#include <xen/shutdown.h>


Can you clarify why you need this header?

+#include <asm/gic.h>
+
+/*
+ * gic_test: Specifies the gic test to be executed.
+ * 0 = no tests are executed
+ * 1 = SGI tests are executed
+ */
+static unsigned int __initdata gic_test = 0;
+integer_param("gic-test", gic_test);


Given this is meant to be 0 or 1, why not use "boolean_param"?

Also, new command line options should be documented in the docs. That said, I am not really sure about

+
+/*
+ * CPU0: GIC_SGI_DUMP_STATE to self
+ * CPU{0-N}: GIC_SGI_TEST to self
+ * CPU{1-N}: GIC_SGI_TEST to CPU0
+ * CPU{N}: GIC_SGI_TEST to all but self
+ */
+static int __init gic_self_sgi_test(void)
+{
+    if ( !gic_test )
+        return 0;
+
+    printk("Sending GIC_SGI_TEST to self CPU%u\n", smp_processor_id());
+    send_SGI_self(GIC_SGI_TEST);
+
+    if ( smp_processor_id() == 0 )
+    {
+        printk("Sending GIC_SGI_DUMP_STATE to CPU0\n");
+        smp_send_state_dump(0);

OOI, why is this only called for CPU0? You also don't seem to check that smp_send_state_dump() in the CI test.

+
+        return 0;
+    }
+
+    printk("Sending GIC_SGI_TEST to CPU0 from CPU%u\n", smp_processor_id());
+    send_SGI_one(0, GIC_SGI_TEST);
+
+    /* Execute this test only from the last core */
+    if ( smp_processor_id() == (smp_get_max_cpus() - 1) )

This is relying on how Xen boots CPUs. Would it be better to check the number of online CPUs at the time of the check? (You might need to re-order some code for that.)
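A minimal sketch of such a check, assuming the set of online CPUs is available as a bitmask (bit N set means CPU N is online). The helpers `count_online()` and `broadcast_test_ready()` are hypothetical names for illustration; in Xen the count would presumably come from the existing CPU mask helpers rather than a hand-rolled loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Count the set bits in a mask of online CPUs. */
static unsigned int count_online(unsigned long online_mask)
{
    unsigned int n = 0;

    /* Clear the lowest set bit on each iteration. */
    for ( ; online_mask; online_mask &= online_mask - 1 )
        n++;

    return n;
}

/*
 * Decide whether the "all but self" test can run: true once the expected
 * number of CPUs is online, regardless of their IDs or bring-up order.
 */
static bool broadcast_test_ready(unsigned long online_mask,
                                 unsigned int expected_cpus)
{
    assert(expected_cpus > 0);

    return count_online(online_mask) >= expected_cpus;
}
```

Unlike comparing `smp_processor_id()` against the highest CPU ID, this does not assume that CPUs come up in ID order or that the IDs are contiguous.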

+    {
+        printk("Sending GIC_SGI_TEST to all except CPU%u\n", smp_processor_id());
+        send_SGI_allbutself(GIC_SGI_TEST);
+    }
+
+    return 0;
+
+}
+__initcallboottest(gic_self_sgi_test);
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index ee75258fc3..9736b0c7df 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -324,6 +324,11 @@ static void do_static_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
      case GIC_SGI_CALL_FUNCTION:
          smp_call_function_interrupt();
          break;
+#ifdef CONFIG_BOOT_SELFTEST
+    case GIC_SGI_TEST:
+        printk("CPU%u: GIC_SGI_TEST received\n", smp_processor_id());

To confirm, we will solely rely on logging? IOW, there is no plan to have Xen self-sufficient (e.g. using a global variable).
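For illustration, a self-sufficient variant could record each reception in a per-CPU counter and have the sender poll it with a bounded wait. This is a minimal standalone sketch using C11 atomics; `NR_CPUS_SKETCH`, `sgi_test_received`, `sgi_test_handler()` and `sgi_test_wait()` are hypothetical names, not part of the patch or of Xen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4 /* hypothetical bound, not a Xen constant */

/* One counter per CPU, bumped each time that CPU handles the test SGI. */
static atomic_uint sgi_test_received[NR_CPUS_SKETCH];

/* What the GIC_SGI_TEST case could do in addition to (or instead of) printk. */
static void sgi_test_handler(unsigned int cpu)
{
    assert(cpu < NR_CPUS_SKETCH);
    atomic_fetch_add(&sgi_test_received[cpu], 1);
}

/*
 * Poll until 'cpu' has handled at least 'expected' test SGIs, giving up
 * after 'max_polls' iterations. Returns true if the SGIs were observed.
 */
static bool sgi_test_wait(unsigned int cpu, unsigned int expected,
                          unsigned int max_polls)
{
    assert(cpu < NR_CPUS_SKETCH);

    for ( unsigned int i = 0; i < max_polls; i++ )
        if ( atomic_load(&sgi_test_received[cpu]) >= expected )
            return true;

    return false;
}
```

A bounded wait like this would also give the tests a well-defined completion point, so the result could be reported by Xen itself rather than inferred from the serial log.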

+        break;
+#endif
      default:
          panic("Unhandled SGI %d on CPU%d\n", sgi, smp_processor_id());
          break;
diff --git a/xen/arch/arm/include/asm/gic.h b/xen/arch/arm/include/asm/gic.h
index ff22dea40d..74bdd4ff63 100644
--- a/xen/arch/arm/include/asm/gic.h
+++ b/xen/arch/arm/include/asm/gic.h
@@ -306,6 +306,9 @@ enum gic_sgi {
      GIC_SGI_EVENT_CHECK,
      GIC_SGI_DUMP_STATE,
      GIC_SGI_CALL_FUNCTION,
+#ifdef CONFIG_BOOT_SELFTEST
+    GIC_SGI_TEST,
+#endif
      GIC_SGI_STATIC_MAX,
  };

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c

index 6310a47d68..4e5db93027 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -470,6 +470,8 @@ void asmlinkage __init noreturn start_xen(unsigned long fdt_paddr)
      enable_errata_workarounds();
      enable_cpu_features();

+    do_init_boottests();

+
      /* Create initial domain 0. */
      if ( !is_dom0less_mode() )
          create_dom0();
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 7f3cfa812e..a016ff00f5 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -405,6 +405,8 @@ void asmlinkage noreturn start_secondary(void)

      printk(XENLOG_DEBUG "CPU %u booted.\n", smp_processor_id());
+    do_init_boottests();

+
      startup_cpu_idle_loop();
  }

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S

index 2d5f1c516d..14f64a856c 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -146,6 +146,10 @@ SECTIONS
         *(.initcall1.init)
         __initcall_end = .;

+       __initcall_boot_test_start = .;

+       *(.initcallboottest.init)
+       __initcall_boot_test_end = .;
+
         . = ALIGN(4);
         __alt_instructions = .;
         *(.altinstructions)
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index fb45f81399..2047fe2a3f 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -412,6 +412,7 @@ void add_taint(unsigned int taint)

extern const initcall_t __initcall_start[], __presmp_initcall_end[],

      __initcall_end[];
+extern const initcall_t __initcall_boot_test_start[], __initcall_boot_test_end[];

void __init do_presmp_initcalls(void)

  {
@@ -427,6 +428,16 @@ void __init do_initcalls(void)
          (*call)();
  }

+void __init do_init_boottests(void)

+{
+#ifdef CONFIG_BOOT_SELFTEST

I think it would be worth printing before and after to indicate the begin/end of the selftest.
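A minimal standalone sketch of what the markers could look like, using printf() in place of printk(). The function name `run_boottests()`, the message wording and the failure counting are assumptions for illustration; the loop in the patch ignores the return value of each test.

```c
#include <assert.h>
#include <stdio.h>

/* Same shape as Xen's initcall_t: a function returning int, taking no args. */
typedef int (*initcall_t)(void);

/* Run every test in [start, end) and bracket the run with log markers. */
static int run_boottests(const initcall_t *start, const initcall_t *end)
{
    int failed = 0;

    assert(start <= end);

    printf("Boot self-tests: begin\n");

    for ( const initcall_t *call = start; call < end; call++ )
        if ( (*call)() != 0 )
            failed++;

    printf("Boot self-tests: end (%d failed)\n", failed);

    return failed;
}

/* Two trivial stand-in tests for the usage example. */
static int sketch_pass(void) { return 0; }
static int sketch_fail(void) { return -1; }
```

Called on an array such as `{ sketch_pass, sketch_fail, sketch_pass }`, the function prints the begin marker, runs the three tests, and reports one failure in the end marker, which gives the CI script two fixed lines to anchor on.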

+    const initcall_t *call;
+    for ( call = __initcall_boot_test_start; call < __initcall_boot_test_end;
+          call++ )
+        (*call)();
+#endif
+}
+
  #ifdef CONFIG_HYPFS
  static unsigned int __read_mostly major_version;
  static unsigned int __read_mostly minor_version;
diff --git a/xen/include/xen/init.h b/xen/include/xen/init.h
index 0c921672c1..bd518bcea9 100644
--- a/xen/include/xen/init.h
+++ b/xen/include/xen/init.h
@@ -66,11 +66,14 @@ typedef void (*exitcall_t)(void);
      static const initcall_t __initcall_##fn __init_call("presmp") = (fn)
  #define __initcall(fn) \
      static const initcall_t __initcall_##fn __init_call("1") = (fn)
+#define __initcallboottest(fn) \
+    static const initcall_t __initcall_##fn __init_call("boottest") = (fn)
  #define __exitcall(fn) \
      static exitcall_t __exitcall_##fn __exit_call = fn

void do_presmp_initcalls(void);

  void do_initcalls(void);
+void do_init_boottests(void);

#endif /* __ASSEMBLER__ */


Cheers,


--
Julien Grall
