
Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init



On 28.09.22 18:32, Juergen Gross wrote:
On 28.09.22 18:12, Borislav Petkov wrote:
On Wed, Sep 28, 2022 at 03:43:56PM +0200, Juergen Gross wrote:
Would you feel better with adding a new enum member CPUHP_AP_CACHECTRL_ONLINE?

This would avoid a possible source of failure during resume in case no slot
for CPUHP_AP_ONLINE_DYN is found (quite improbable, but in theory possible).

Let's keep that in the bag for the time when we get to cross that bridge.

You wouldn't want to do that there, as there are multiple places where
pm_sleep_enable_secondary_cpus() is being called.

We want all of them, I'd say. They're all some sort of suspend AFAICT.
But yes, if we get to do it, that would need a proper audit.

Additionally not all cases are coming in via
pm_sleep_enable_secondary_cpus(), as there is e.g. a call of
suspend_enable_secondary_cpus() from kernel_kexec(), which wants to
have the same handling.

Which means, more hairy.

arch_thaw_secondary_cpus_begin() and arch_thaw_secondary_cpus_end() are
the functions to mark start and end of the special region where the
delayed MTRR setup should happen.

Yap, it seems like the best solution at the moment. Want me to do a
proper patch and test it on real hw?

I can do that.

Okay, let's define what is meant by "that" just to be on the same page.

The idea of using a hotplug callback seems rather risky IMHO. At least
CPUHP_AP_ONLINE_DYN seems to be way too late, as there are several device
drivers hooking in with the same or lower priority already. And device
drivers might rely on PAT settings in PTEs or on MTRRs being set up correctly.

Another problematic case is CPUHP_AP_MICROCODE_LOADER, which explicitly
does cache writeback and invalidation, which seems risky without a sane
PAT/MTRR state of the processor. It should be noted that the microcode
loader is registered via late_initcall(), so the boot path isn't affected
by the delayed MTRR/PAT init.

So the only safe way to use a hotplug callback would be to have a rather
early preregistered slot in enum cpuhp_state.
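
To illustrate the ordering concern, here is a minimal userspace sketch. It
is a toy model and not the kernel's enum cpuhp_state: the toy_* names are
hypothetical, and the relative order of the three states is an assumption
made for illustration (the early slot is modelled on the
CPUHP_AP_CACHECTRL_ONLINE idea mentioned above). It only shows that a
callback running before the cache setup state sees a not yet initialized
MTRR/PAT state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only: NOT the kernel's enum cpuhp_state. The relative order
 * of the states below is an assumption made for illustration.
 */
enum toy_cpuhp_state {
	TOY_AP_CACHECTRL_ONLINE,	/* hypothetical early, preregistered slot */
	TOY_AP_MICROCODE_LOADER,	/* does cache writeback and invalidation */
	TOY_AP_ONLINE_DYN,		/* dynamic slot, shared with device drivers */
	TOY_NR_STATES
};

/* For each state: did its callback run with MTRR/PAT already set up? */
static bool toy_ran_with_sane_cache[TOY_NR_STATES];

/*
 * Bring a toy AP online. States run in ascending enum order, and the
 * MTRR/PAT setup takes effect when the state 'cache_init_at' is reached.
 */
static void toy_bringup(enum toy_cpuhp_state cache_init_at)
{
	bool cache_state_sane = false;
	int s;

	assert(cache_init_at < TOY_NR_STATES);

	for (s = 0; s < TOY_NR_STATES; s++) {
		if (s == (int)cache_init_at)
			cache_state_sane = true;
		toy_ran_with_sane_cache[s] = cache_state_sane;
	}
}
```

Calling toy_bringup(TOY_AP_ONLINE_DYN) leaves the microcode loader state
marked as having run without a sane cache state, while
toy_bringup(TOY_AP_CACHECTRL_ONLINE) does not.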

Regarding resume and kexec, I'm no longer sure that doing the delayed
MTRR/PAT init is such a great idea. It might save some milliseconds, but
the risks mentioned above, e.g. with microcode loading, would apply there.

So right now I'm inclined to stay on the safe side by not adding any
cpu hotplug hook, and instead to keep using the same "delayed AP init" flag
as today, just renaming it. This would leave the delayed MTRR/PAT init in
place for the resume and kexec cases, but deferring the MTRR/PAT cleanup
because of this potential issue does not seem appropriate, as the cleanup
isn't changing the behavior here.
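
As a rough sketch of what keeping the flag means, here is a self-contained
userspace model. All toy_* names are hypothetical stand-ins and none of
this is the actual kernel code; it only mirrors the idea that the begin
hook sets the flag, the per-AP init is skipped while the flag is set, and
the end hook does the setup for all APs in one go and clears the flag.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_APS 4

/* Stand-in for the renamed "delayed AP init" flag. */
static bool toy_cache_aps_delayed_init;
/* Per-AP marker: has MTRR/PAT been set up on this AP? */
static bool toy_ap_cache_ready[TOY_NR_APS];
/* Number of bulk setup passes done so far. */
static int toy_bulk_inits;

/* Start of the special region: defer the per-AP cache setup. */
static void toy_thaw_secondary_cpus_begin(void)
{
	toy_cache_aps_delayed_init = true;
}

/* Called for each AP while it is being brought up. */
static void toy_cache_ap_init(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_APS);

	/* Inside the special region the setup is left to the end hook. */
	if (toy_cache_aps_delayed_init)
		return;

	toy_ap_cache_ready[cpu] = true;
}

/* End of the special region: one pass over all APs, then clear the flag. */
static void toy_thaw_secondary_cpus_end(void)
{
	int cpu;

	if (!toy_cache_aps_delayed_init)
		return;

	for (cpu = 0; cpu < TOY_NR_APS; cpu++)
		toy_ap_cache_ready[cpu] = true;

	toy_bulk_inits++;
	toy_cache_aps_delayed_init = false;
}
```

In this model, anything running on an AP between the begin and end hooks
sees toy_ap_cache_ready[cpu] == false, which is the window the concerns
above are about.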

We should, however, have a discussion in parallel or later, whether the
whole thaw_secondary_cpus() handling is really okay or whether it should
be changed in some way.


Juergen
