Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag


  • To: Teddy Astie <teddy.astie@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Alejandro Vallejo <alejandro.garciavallejo@xxxxxxx>
  • Date: Tue, 11 Nov 2025 13:31:11 +0100
  • Cc: Anthony PERARD <anthony.perard@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>, Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 11 Nov 2025 12:31:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi,

On Wed Oct 29, 2025 at 7:26 PM CET, Teddy Astie wrote:
> Introduce a new flag to force the x2APIC enabled and preventing a
> guest from switching back LAPIC to xAPIC mode.

I don't think you can really do this on AMD without advertising it somehow.

And there's no architectural way to do so.

>
> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
> architectural MSR of Intel specification.

Yes, I can see this being usable and a good idea on Intel hardware.

>
> Signed-off-by: Teddy Astie <teddy.astie@xxxxxxxxxx>
> ---
> This feature can be useful for various reasons, starting with SEV as
> it is complicated (especially with SEV-ES) to handle MMIO, and legacy
> xAPIC is one thing that needs MMIO intercepts (and Linux uses it during
> boot unless x2APIC is initially enabled, even if it switches to
> x2apic afterward). It could also be interesting to reduce the attack
> surface of the hypervisor (by only exposing x2apic to the guest).

On AMD (again, AFAIK) you do have to implement xAPIC support to provide a true
AMD-like system. Anything else would be a Xen-specific extension.

The intended way to avoid trap-and-emulate for xAPIC accesses is to bite the
bullet and implement accelerated AVIC. That has explicit provisions to enable
SEV operation and would have the neat benefit of eliding certain VMEXITs (e.g.
EOI). It'd also simplify MSI delivery on non-oversubscribed CPUs.

I assume you already looked at it and concluded it was more work than you could
afford, but thought I'd bring it up anyway.

>
> As it can allow to have MMIO-less guest (using PVH), perhaps it can
> be enough for avoiding the problematic cases of virtualized INVLPGB
> (when we have it).
>
> In my testing, Linux, FreeBSD and PV-shim works fine with it; OVMF
> freezes for some reason, NetBSD doesn't support it (no x2apic support
> as Xen guest). HVM BIOS gets stuck at SeaBIOS as it expects booting
> with xAPIC.
>
> On Intel platforms, it would be better to expose the
> IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to
> guest, but it's non-trivial as it needs to be properly exposed
> through IA32_ARCH_CAPABILITIES which is currently passed-through.

ARCH_CAPS is part of the CPU policy. You can have the toolstack set the bit and
have Xen take the hint. Then it'd also be sent in the migration stream.

Granted, that wouldn't help you on AMD hardware, but it'd be perfectly
spec-compliant on Intel. A different take might be to have a Xen-specific bit
in the hypervisor leaves, mirroring the arch_caps bit.
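
To illustrate the policy route, here's a rough sketch of the read side. It is a
toy, not Xen code: `struct toy_policy` and `toy_guest_rdmsr()` are made-up
names, and the MSR index and bit positions are quoted from memory of the SDM,
so please double-check them. The point is only that the guest-visible
IA32_XAPIC_DISABLE_STATUS value falls out of the ARCH_CAPS bit held in the
policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as I recall them from the Intel SDM; verify before relying on them. */
#define MSR_ARCH_CAPABILITIES        0x0000010aU
#define MSR_XAPIC_DISABLE_STATUS     0x000000bdU
#define ARCH_CAPS_XAPIC_DISABLE_STS  (1ULL << 21) /* MSR 0xBD is enumerated */
#define LEGACY_XAPIC_DISABLED        (1ULL << 0)  /* APIC locked to x2APIC */

/* Hypothetical stand-in for the per-domain CPU policy. */
struct toy_policy {
    uint64_t arch_caps; /* guest view of IA32_ARCH_CAPABILITIES */
};

/* Returns true and fills *val on success, false where a #GP would be raised. */
static bool toy_guest_rdmsr(const struct toy_policy *p, uint32_t msr,
                            uint64_t *val)
{
    switch ( msr )
    {
    case MSR_ARCH_CAPABILITIES:
        *val = p->arch_caps;
        return true;

    case MSR_XAPIC_DISABLE_STATUS:
        /* The MSR only exists when ARCH_CAPS enumerates it. */
        if ( !(p->arch_caps & ARCH_CAPS_XAPIC_DISABLE_STS) )
            return false;
        *val = LEGACY_XAPIC_DISABLED;
        return true;

    default:
        return false;
    }
}
```

The toy skips the CPUID enumeration of ARCH_CAPS itself; a real handler would
have to honour that too.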

I think SeaBIOS, OVMF and NetBSD failing to boot gives you a hint that, while
this might be a good idea for some cases, you do need xAPIC for a general
purpose VM. IMO, at least.

>
>  docs/man/xl.cfg.5.pod.in              |  7 +++++++
>  tools/libs/light/libxl_types.idl      |  1 +
>  tools/libs/light/libxl_x86.c          |  4 ++++
>  tools/xl/xl_parse.c                   |  1 +
>  xen/arch/x86/domain.c                 |  2 +-
>  xen/arch/x86/hvm/hvm.c                |  2 ++
>  xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
>  xen/arch/x86/include/asm/domain.h     |  2 ++
>  xen/arch/x86/include/asm/hvm/domain.h |  3 +++
>  xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
>  10 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index ad1553c5e9..01b41d93c0 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -3198,6 +3198,13 @@ option.
>  
>  If using this option is necessary to fix an issue, please report a bug.
>  
> +=item B<force_x2apic=BOOLEAN>

nit: I'd say "x2apic_only" to show not only that it starts in x2apic mode, but
also that it must stay that way. But tomato-tomahto.

> +
> +Force the LAPIC in x2APIC mode and prevent the guest from disabling
> +it or switching to xAPIC mode.

The "or switching to xAPIC mode" part is redundant. The only way to get from
x2APIC to xAPIC mode is to disable the LAPIC first.
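
For reference, the APIC_BASE mode transitions as I understand them from the
SDM, in toy form. `toy_mode()` and `toy_transition_ok()` are names I made up,
and this is not a copy of what vlapic.c does. It shows that x2APIC -> xAPIC is
never a direct transition; the guest has to pass through the disabled state.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_BASE_EXTD   (1ULL << 10) /* x2APIC mode */
#define APIC_BASE_ENABLE (1ULL << 11) /* APIC globally enabled */

enum toy_apic_mode { MODE_DISABLED, MODE_XAPIC, MODE_X2APIC, MODE_INVALID };

/* Decode the mode from the EN/EXTD bits of IA32_APIC_BASE. */
static enum toy_apic_mode toy_mode(uint64_t apic_base)
{
    switch ( apic_base & (APIC_BASE_ENABLE | APIC_BASE_EXTD) )
    {
    case 0:
        return MODE_DISABLED;
    case APIC_BASE_ENABLE:
        return MODE_XAPIC;
    case APIC_BASE_ENABLE | APIC_BASE_EXTD:
        return MODE_X2APIC;
    default:
        return MODE_INVALID; /* EXTD set without ENABLE */
    }
}

/* True if a write moving APIC_BASE from prev to next is a legal transition. */
static bool toy_transition_ok(uint64_t prev, uint64_t next)
{
    enum toy_apic_mode from = toy_mode(prev), to = toy_mode(next);

    if ( to == MODE_INVALID )
        return false;
    if ( from == MODE_X2APIC && to == MODE_XAPIC )
        return false; /* must go through disabled */
    if ( from == MODE_DISABLED && to == MODE_X2APIC )
        return false; /* must go through xAPIC */

    return true;
}
```

So once clearing EXTD is rejected, both "disable" and "switch to xAPIC" are
already covered.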

> +
> +This option is disabled by default.
> +
>  =back
>  
>  =head1 SEE ALSO
> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
> index d64a573ff3..b95278007e 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -738,6 +738,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>                                 ("arm_sci", libxl_arm_sci),
>                                ])),
>      ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
> +                               ("force_x2apic", libxl_defbool)
>                                ])),
>      # Alternate p2m is not bound to any architecture or guest type, as it is
>      # supported by x86 HVM and ARM support is planned.
> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> index 60d4e8661c..2e0205d2a2 100644
> --- a/tools/libs/light/libxl_x86.c
> +++ b/tools/libs/light/libxl_x86.c
> @@ -26,6 +26,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>      if (libxl_defbool_val(d_config->b_info.arch_x86.msr_relaxed))
>          config->arch.misc_flags |= XEN_X86_MSR_RELAXED;
>  
> +    if (libxl_defbool_val(d_config->b_info.arch_x86.force_x2apic))
> +        config->arch.misc_flags |= XEN_X86_FORCE_X2APIC;
> +
>      if (libxl_defbool_val(d_config->b_info.trap_unmapped_accesses)) {
>              LOG(ERROR, "trap_unmapped_accesses is not supported on x86\n");
>              return ERROR_FAIL;
> @@ -818,6 +821,7 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
>  {
>      libxl_defbool_setdefault(&b_info->acpi, true);
>      libxl_defbool_setdefault(&b_info->arch_x86.msr_relaxed, false);
> +    libxl_defbool_setdefault(&b_info->arch_x86.force_x2apic, false);
>      libxl_defbool_setdefault(&b_info->trap_unmapped_accesses, false);
>  
>      if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index af86d3186d..d84ab7c823 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -3041,6 +3041,7 @@ skip_usbdev:
>                      "If it fixes an issue you are having please report to "
>                      "xen-devel@xxxxxxxxxxxxxxxxxxxx.\n");
>  
> +    xlu_cfg_get_defbool(config, "force_x2apic", &b_info->arch_x86.force_x2apic, 0);
>      xlu_cfg_get_defbool(config, "vpmu", &b_info->vpmu, 0);
>  
>      xlu_cfg_destroy(config);
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 19fd86ce88..02f650a614 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -704,7 +704,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>          return -EINVAL;
>      }
>  
> -    if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
> +    if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | XEN_X86_FORCE_X2APIC) )

As I said, I'd reuse the bit in ARCH_CAPS in the CPU policy. That also means it
can be properly migrated and you wouldn't need an extra boolean in the domain.

>      {
>          dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
>                  config->arch.misc_flags);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 0c60faa39d..73cbac0f22 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -616,6 +616,8 @@ int hvm_domain_initialise(struct domain *d,
>      INIT_LIST_HEAD(&d->arch.hvm.mmcfg_regions);
>      INIT_LIST_HEAD(&d->arch.hvm.msix_tables);
>  
> +    d->arch.hvm.force_x2apic = config->arch.misc_flags & XEN_X86_FORCE_X2APIC;
> +
>      rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
>      if ( rc )
>          goto fail;
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 993e972cd7..ae8df70d2e 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>      if ( !has_vlapic(v->domain) )
>          return X86EMUL_EXCEPTION;
>  
> +    if ( has_force_x2apic(v->domain) )
> +    {
> +        /*
> +        * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
> +        * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
> +        * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
> +        */
> +        if ( !(val & APIC_BASE_EXTD) )
> +        {
> +            gprintk(XENLOG_WARNING, "tried to disable x2APIC while forced on\n");

This is intended behaviour, not a warning. I'd remove the printk.

> +            return X86EMUL_EXCEPTION;
> +        }
> +    }
> +
>      /* Attempting to set reserved bits? */
>      if ( val & ~(APIC_BASE_ADDR_MASK | APIC_BASE_ENABLE | APIC_BASE_BSP |
>                   (cp->basic.x2apic ? APIC_BASE_EXTD : 0)) )
> @@ -1474,7 +1488,14 @@ void vlapic_reset(struct vlapic *vlapic)
>      if ( v->vcpu_id == 0 )
>          vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>  
> -    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> +    if ( has_force_x2apic(v->domain) )
> +    {
> +        vlapic->hw.apic_base_msr |= APIC_BASE_EXTD;
> +        set_x2apic_id(vlapic);
> +    }
> +    else
> +        vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> +
>      vlapic_do_init(vlapic);
>  }
>  
> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> index 5df8c78253..771992d156 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -509,6 +509,8 @@ struct arch_domain
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>  #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
>  
> +#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)

This would be a check on the CPU policy instead with my proposed change.
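
Roughly what I have in mind, with hypothetical names throughout (`toy_domain`,
`toy_x2apic_only()`, `toy_wrmsr_apic_base()`); the ARCH_CAPS bit position is
from memory, so verify it against the SDM. The predicate reads the policy
rather than a dedicated boolean in the domain.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_BASE_EXTD              (1ULL << 10)
#define ARCH_CAPS_XAPIC_DISABLE_STS (1ULL << 21) /* bit position from memory */

/* Toy stand-ins for the CPU policy and the domain that owns it. */
struct toy_policy { uint64_t arch_caps; };
struct toy_domain { struct toy_policy policy; };

enum toy_result { TOY_OKAY, TOY_EXCEPTION };

/* Policy-derived replacement for a per-domain force_x2apic boolean. */
static bool toy_x2apic_only(const struct toy_domain *d)
{
    return d->policy.arch_caps & ARCH_CAPS_XAPIC_DISABLE_STS;
}

/* Only the x2APIC-only check; the remaining APIC_BASE validation is omitted. */
static enum toy_result toy_wrmsr_apic_base(const struct toy_domain *d,
                                           uint64_t val)
{
    if ( toy_x2apic_only(d) && !(val & APIC_BASE_EXTD) )
        return TOY_EXCEPTION; /* guest sees #GP */

    return TOY_OKAY;
}
```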

> +
>  #define gdt_ldt_pt_idx(v) \
>        ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
>  #define pv_gdt_ptes(v) \
> diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
> index 333501d5f2..b56fa08b73 100644
> --- a/xen/arch/x86/include/asm/hvm/domain.h
> +++ b/xen/arch/x86/include/asm/hvm/domain.h
> @@ -108,6 +108,9 @@ struct hvm_domain {
>      /* Compatibility setting for a bug in x2APIC LDR */
>      bool bug_x2apic_ldr_vcpu_id;
>  
> +    /* LAPIC is forced in x2APIC mode */
> +    bool force_x2apic;
> +
>      /* hypervisor intercepted msix table */
>      struct list_head       msixtbl_list;
>  
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index b99a691706..75aa31d9ed 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -309,11 +309,21 @@ struct xen_arch_domainconfig {
>   * doesn't allow the guest to read or write to the underlying MSR.
>   */
>  #define XEN_X86_MSR_RELAXED (1u << 0)
> +
> +/*
> + * This option forces the LAPIC to be in X2APIC mode (IA32_APIC_BASE.EXTD = 1)
> + * using the same semantics as IA32_XAPIC_DISABLE_STATUS:LEGACY_XAPIC_DISABLED
> + *
> + * Attempts by the guest to clear IA32_APIC_BASE.EXTD (e.g disable X2APIC) will
> + * inject #GP in the guest.
> + */
> +#define XEN_X86_FORCE_X2APIC (1U << 1)
> +
>      uint32_t misc_flags;
>  };
>  
>  /* Max  XEN_X86_* constant. Used for ABI checking. */
> -#define XEN_X86_MISC_FLAGS_MAX XEN_X86_MSR_RELAXED
> +#define XEN_X86_MISC_FLAGS_MAX XEN_X86_FORCE_X2APIC
>  
>  #endif
>  
