Re: IOMMU faults after S3
On Thu, Apr 02, 2026 at 12:23:08PM +0200, Jan Beulich wrote:
> On 02.04.2026 11:42, Marek Marczykowski-Górecki wrote:
> > On Thu, Apr 02, 2026 at 10:47:53AM +0200, Jan Beulich wrote:
> >> On 02.04.2026 10:39, Jan Beulich wrote:
> >>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> >>>> The xl dmesg output (from MTL this time):
> >>>>
> >>>> (XEN) [ 123.477511] Entering ACPI S3 state.
> >>>> (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>>> (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >>>
> >>> XEN_ARAT being off is the one odd aspect here. That'll want tracking down
> >>> separately. As per xen-cpuid output (below) ARAT is available.
> >>
> >> For this you may want to also add logging to intel_init_arat(): Since opt_arat
> >> can be false only due to command line option use, it can only be the function
> >> not being called (which looks impossible on plain staging code), or cpu_has_arat
> >> being false despite the xen-cpuid output that you supplied earlier (inexplicable
> >> as well, at least for now).
> >
> > Hm, I got this:
> >
> > (XEN) [ 11.403340] intel_init_arat:674: opt_arat: 1, cpu_has_arat: 0
> >
> > so, cpu_has_arat=0 ...
> > next lines are those, to hint when it happened in the boot process:
> >
> > (XEN) [ 11.409754] mwait-idle: MWAIT substates: 0x11112020
> > (XEN) [ 11.416130] mwait-idle: v0.4.1 model 0xaa
> > (XEN) [ 11.422396] mwait-idle: lapic_timer_reliable_states 0x2
> >
> > Looks like calculate_host_policy() runs much later...
>
> Hmm, yes, and that's the problem. The reason I don't see this is that a newer
> version of [1] has this
>
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -628,6 +628,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
> }
>
> /* Now the feature flags better reflect actual CPU features! */
> + if (c == &boot_cpu_data)
> + calculate_host_policy();
>
> xstate_init(c);
>
> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -384,7 +384,7 @@ void calculate_raw_cpu_policy(void)
> /* Was already added by probe_cpuid_faulting() */
> }
>
> -static void __init calculate_host_policy(void)
> +void __init calculate_host_policy(void)
> {
> struct cpu_policy *p = &host_cpu_policy;
>
> @@ -959,6 +959,7 @@ static void __init calculate_hvm_def_pol
>
> void __init init_guest_cpu_policies(void)
> {
> + /* Do this a 2nd time to account for setup_{clear,force}_cpu_cap() uses. */
> calculate_host_policy();
>
> if ( IS_ENABLED(CONFIG_PV) )
>
> and of course I'm doing my work (and my analysis) with that in place.
FWIW, with this patch applied I get:
(XEN) [18446743899.051851] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
(XEN) [18446743899.051865] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 1
And no IOMMU faults anymore.
> I may need to break this out and submit independently, but really the problem
> here is that the containing series has been sitting largely unreviewed (and
> hence not in a position to plausibly re-post) for almost 5 years. Andrew,
> (maybe also Roger) - I'm open to suggestions how to proceed. When your xstate
> cleanup patches were helped to go in ahead of mine, you promised to help mine
> going in afterwards. Yet nothing has happened (and I'm tired of re-submitting
> large pieces of work just for the sake of re-submitting, i.e. without having
> had [sufficient] feedback on the earlier version).
>
> Jan
>
> [1] https://lists.xen.org/archives/html/xen-devel/2021-04/msg01336.html
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab