[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v6 11/12] x86/microcode: Synchronize late microcode loading
On 13/03/2019 08:02, Jan Beulich wrote: >>>> On 13.03.19 at 08:54, <JBeulich@xxxxxxxx> wrote: >>>>> On 13.03.19 at 06:02, <chao.gao@xxxxxxxxx> wrote: >>> On Tue, Mar 12, 2019 at 05:07:51PM -0700, Raj, Ashok wrote: >>>> On Mon, Mar 11, 2019 at 03:57:35PM +0800, Chao Gao wrote: >>>>> + if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) ) >>>>> + ret = microcode_update_cpu(); >>>> Does ret have any useful things on if the update failed? Doesn't seem >>>> to be used before you overwrite later in collect_cpu_info()? >>> It has the reason of failure on error. Actally, there are two reasons: >>> one is no patch of newer revision, the other is we tried to update but >>> the microcode revision didn't change. I can check this return value and >>> print more informative message to admin. And furthermore, for the >>> latter, we can remove the ucode patch from caches to address Roger's >>> concern expressed in comments to patch 4 & 5. >> Btw, I'm not sure removing such ucode from the cache is appropriate: >> It may well apply elsewhere, unless there's a clear indication that the >> blob is broken. So perhaps there needs to be special casing of -EIO, >> which gets returned when the ucode rev reported by the CPU after >> the update does not match expectations. > An to go one step further, perhaps we should also store more than > just the newest variant for a given pf. If the newest fails to apply > but there is another one newer than what's on a CPU, updating to > that may work, and once that intermediate update worked, the > update to the newest version may then work too. I don't think this is sensible. Running with mismatched microcode is unsupported (for a suitable definition of mismatched). At boot, there will be 1 individual blob which is applicable to the CPUs, and gets loaded on all APs. (Possibly more than 1 blob, but remember that only multi-socket servers and top end workstations have a chance of having mixed steppings in the first place.) During late load, we will have 1 (or more) blobs provided in the late hypercall, and we will apply in a logically-atomic fashion in a rendezvous'd context. There are a few outcomes from this action. If the ucode application fails internally, the system is already lost and will crash. This is an inherent risk which people doing late loading need to be aware of (and why test workloads exist in production setups). If some cores accept the update but others don't, then we've also got serious problems and probably system instability. This is the kind of problem which needs to be detected during testing, and may require a change in application strategy, or may in practice prevent a particular from being declared safe to late load. The expected case is that all cores accept the blob during the rendezvous. As a result, I don't see any need to store more than a single ucode version (whether this is a single blob or perhaps a set of closely related blobs for a mixed-stepping system) in the stead state (to cope with AP boot, suspend/resume, or CPU hotplug), and a second version which is the proposed-new ucode for late loading. On late load failure, we should dump enough information to work out exactly what went on, to determine how best to proceed, but the server is effectively lost to us. On late load success, the proposed new "version" replaces the current "version". And again - I reiterate the point that I think it is fine to have a simplifying assumption that we don't have mixed stepping systems to start with, presuming this is generally in line with Intel's support statement. If in practice we find mixed stepping systems which are supported by an OEM/Intel, we can see about extending the logic. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |