
Re: IOMMU faults after S3


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • Date: Thu, 2 Apr 2026 16:02:06 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Thu, 02 Apr 2026 14:02:21 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Apr 02, 2026 at 12:23:08PM +0200, Jan Beulich wrote:
> On 02.04.2026 11:42, Marek Marczykowski-Górecki wrote:
> > On Thu, Apr 02, 2026 at 10:47:53AM +0200, Jan Beulich wrote:
> >> On 02.04.2026 10:39, Jan Beulich wrote:
> >>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> >>>> The xl dmesg output (from MTL this time):
> >>>>
> >>>>     (XEN) [  123.477511] Entering ACPI S3 state.
> >>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >>>
> >>> XEN_ARAT being off is the one odd aspect here. That'll want tracking down
> >>> separately. As per xen-cpuid output (below) ARAT is available.
> >>
> >> For this you may want to also add logging to intel_init_arat(): since opt_arat
> >> can be false only due to command line option use, it can only be the function
> >> not being called (which looks impossible on plain staging code), or cpu_has_arat
> >> being false despite the xen-cpuid output that you supplied earlier (inexplicable
> >> as well, at least for now).
> > 
> > Hm, I got this:
> > 
> >     (XEN) [   11.403340] intel_init_arat:674: opt_arat: 1, cpu_has_arat: 0
> > 
> > so, cpu_has_arat=0 ...
> > The next lines are these, to hint at when it happened in the boot process:
> > 
> >     (XEN) [   11.409754] mwait-idle: MWAIT substates: 0x11112020
> >     (XEN) [   11.416130] mwait-idle: v0.4.1 model 0xaa
> >     (XEN) [   11.422396] mwait-idle: lapic_timer_reliable_states 0x2
> > 
> > Looks like calculate_host_policy() runs much later...
> 
> Hmm, yes, and that's the problem. The reason I don't see this is that a newer
> version of [1] has this:
>
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -628,6 +628,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
>       }
>  
>       /* Now the feature flags better reflect actual CPU features! */
> +     if (c == &boot_cpu_data)
> +             calculate_host_policy();
>  
>       xstate_init(c);
>  
> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -384,7 +384,7 @@ void calculate_raw_cpu_policy(void)
>      /* Was already added by probe_cpuid_faulting() */
>  }
>  
> -static void __init calculate_host_policy(void)
> +void __init calculate_host_policy(void)
>  {
>      struct cpu_policy *p = &host_cpu_policy;
>  
> @@ -959,6 +959,7 @@ static void __init calculate_hvm_def_pol
>  
>  void __init init_guest_cpu_policies(void)
>  {
> +    /* Do this a 2nd time to account for setup_{clear,force}_cpu_cap() uses. */
>      calculate_host_policy();
>  
>      if ( IS_ENABLED(CONFIG_PV) )
> 
> and of course I'm doing my work (and my analysis) with that in place.
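
A minimal C sketch of the ordering problem, assuming (as the thread indicates) that cpu_has_arat is read from the host CPU policy, which stays zeroed until calculate_host_policy() runs. Every identifier below (`host_policy_model`, `intel_init_arat_model()`, `boot_unpatched()`, `boot_patched()`, `policy_arat_after_late_clear()`) is hypothetical and only models the Xen functions named above. The two boot orders mirror the code before and after the patch, and the last function illustrates why the patch keeps a second calculate_host_policy() call for later setup_{clear,force}_cpu_cap() uses.

```c
#include <assert.h>   /* used by the checks described in the usage note */
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in Xen. */
struct host_policy_model {
    bool arat;                      /* ARAT bit as seen via the policy */
};

static const bool hw_cpuid_arat = true;  /* assumption: CPU reports ARAT */
static bool cleared_arat;                /* models setup_clear_cpu_cap() */
static struct host_policy_model host;    /* all-zero until computed */
static bool xen_arat;                    /* models X86_FEATURE_XEN_ARAT */

static void reset_model(void)
{
    host = (struct host_policy_model){ false };
    cleared_arat = false;
    xen_arat = false;
}

/* Stands in for calculate_host_policy(): snapshot the current caps. */
static void calculate_host_policy_model(void)
{
    host.arat = hw_cpuid_arat && !cleared_arat;
}

/* Stands in for intel_init_arat(); cpu_has_arat reads the policy. */
static void intel_init_arat_model(bool opt_arat)
{
    if (opt_arat && host.arat)
        xen_arat = true;
}

/* Order before the patch: ARAT probed before the policy is computed. */
bool boot_unpatched(void)
{
    reset_model();
    intel_init_arat_model(true);
    calculate_host_policy_model();  /* runs "much later" */
    return xen_arat;
}

/* Order with the patch: identify_cpu() computes it for the boot CPU. */
bool boot_patched(void)
{
    reset_model();
    calculate_host_policy_model();  /* early call from identify_cpu() */
    intel_init_arat_model(true);
    calculate_host_policy_model();  /* 2nd call, init_guest_cpu_policies() */
    return xen_arat;
}

/* Why the 2nd call: caps changed after the first call must be picked up. */
bool policy_arat_after_late_clear(bool recompute)
{
    reset_model();
    calculate_host_policy_model();
    cleared_arat = true;            /* e.g. a later setup_clear_cpu_cap() */
    if (recompute)
        calculate_host_policy_model();
    return host.arat;
}
```

With `hw_cpuid_arat` assumed true, `boot_unpatched()` returns false and `boot_patched()` returns true, matching XEN_ARAT being 0 before the patch and 1 after it. `policy_arat_after_late_clear(false)` returns a stale true, while `policy_arat_after_late_clear(true)` recomputes and returns false.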

FWIW, with this patch applied I get:
(XEN) [18446743899.051851] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
(XEN) [18446743899.051865] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 1

And no IOMMU faults anymore.
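
The two debug prints can be read as a small decision function. The sketch below is hypothetical: the enum labels and `classify_pit_irq_path()` are not Xen identifiers, and it assumes only that `_disable_pit_irq()` tests the logged conditions in the order they were printed. Without ARAT (Always Running APIC Timer) the local APIC timer may stop in deep C-states, so some broadcast wakeup source is needed; with ARAT it is not.

```c
#include <assert.h>   /* used by the checks described in the usage note */
#include <stdbool.h>

/* Hypothetical outcome labels; these are not Xen identifiers. */
enum pit_irq_path {
    KEEP_PIT,              /* PIT in use or no local APIC: give up early */
    NEED_BROADCAST_TIMER,  /* deep C-states without ARAT: LAPIC timer may stop */
    LAPIC_TIMER_OK         /* ARAT present: LAPIC timer keeps running */
};

/*
 * Models the two checks logged at _disable_pit_irq:2649 and :2659.
 * Arguments correspond to using_pit, cpu_has_apic,
 * cpuidle_using_deep_cstate and boot_cpu_has(X86_FEATURE_XEN_ARAT).
 */
enum pit_irq_path classify_pit_irq_path(bool using_pit, bool cpu_has_apic,
                                        bool deep_cstate, bool xen_arat)
{
    if (using_pit || !cpu_has_apic)
        return KEEP_PIT;
    if (deep_cstate && !xen_arat)
        return NEED_BROADCAST_TIMER;
    return LAPIC_TIMER_OK;
}
```

Feeding in the values logged before the patch (0, 1, 1, 0) selects NEED_BROADCAST_TIMER, while the post-patch values (0, 1, 1, 1) select LAPIC_TIMER_OK. The sketch does not show that the broadcast path is what produced the IOMMU faults; the thread only establishes that the faults disappear once XEN_ARAT is set.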

> I may need to break this out and submit it independently, but really the problem
> here is that the containing series has been sitting largely unreviewed (and
> hence not in a position to plausibly re-post) for almost 5 years. Andrew
> (maybe also Roger), I'm open to suggestions on how to proceed. When your xstate
> cleanup patches were helped to go in ahead of mine, you promised to help mine
> go in afterwards. Yet nothing has happened (and I'm tired of re-submitting
> large pieces of work just for the sake of re-submitting, i.e. without having
> had [sufficient] feedback on the earlier version).
> 
> Jan
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2021-04/msg01336.html

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
