Xen project Mailing List

[Xen-devel] Ping: Re: [PATCH] x86/AMD: correct certain Fam17 checks

To: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Mon, 27 May 2019 03:32:20 -0600

Cc: Wei Liu <wei.liu2@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Brian Woods <brian.woods@xxxxxxx>, Pu Wen <puwen@xxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>

Delivery-date: Mon, 27 May 2019 09:32:31 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

>>> On 25.03.19 at 09:41, wrote: >>>> On 22.03.19 at 20:56, <andrew.cooper3@xxxxxxxxxx> wrote: > > On 19/03/2019 09:16, Jan Beulich wrote: > >> Commit 3157bb4e13 ("Add MSR support for various feature AMD processor > >> families") converted certain checks for Fam11 to include families all > >> the way up to Fam17. The commit having no description, it is hard to > >> tell whether this was a mechanical dec->hex conversion mistake, or > >> indeed intended. In any event the NB_CFG handling needs to be restricted > >> to Fam16 and below: Fam17 doesn't have such an MSR anymore. > >> > >> A non-MMCFG extended config space access mechanism still appears to > >> exist, but code to deal with it will need to be written down the road, > >> when it can actually be tested. > >> > >> Reported-by: Pu Wen <puwen@xxxxxxxx> > >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > > > > Having looked through various spec documents, this is a chronic mess. > > > > First, to NB_CTL MSR itself. It certainly is documented in Fam15, and > > is absence in the documentation of Fam17. > > > > In Fam15, it is explicitly documented as an alias of > > 00:18.3[NB_CFG_LOW/HIGH] which are registers at offset 0x88 and 0x8c in > > config space. > > > > Fam17 documents that the extended cfc/cf8 mechanism does still exist, > > and the new controls for that found in 00:18.4[CoreMasterAccessCtrl] > > with a different bit layout. > > > > Experimentation on a Rome system indicates that NB_CTL is fully > > read0/write discard, so this patch is probably an improvement. > > Hmm, if it's read-zero / write-discard, then I guess we would better > emulate that behavior. I simply didn't expect it to be that way, as > I would generally assume undocumented _and_ unused MSRs to > raise #GP. In which case the simplest thing to do would be to drop > the one respective hunk changing emul-priv-op.c:write_msr(). > > > Therefore, in principle, Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > Well, even without the point above I wouldn't be sure whether > using an "in principle" ack as justification for committing is > appropriate. So perhaps (see above and below) we'd better > settle on something that you could agree to with less > reservations (by directly adjusting this patch, or by adding a > 2nd one on top). I continue to be hesitant to commit with the above kind of "restricted" ack. Could you clarify if we want to work out a better solution, or whether I indeed should put in what's there right now? Thanks, Jan > > However, the actual code touched is completely insane. > > > > HVM guests have it automatically read0/write discard, even for Intel > > (citing cross vendor migration). > > > > PV guest handling is complicated. For CPUs without the MSR, it is read > > #GP, write discard. For CPUs which do have the MSR, it is read0/write > > discard for domU or nonpinned dom0, which is 100% of usecases. The PV > > vs HVM differences cause an asymetry for the hardware domain. > > So how about we make it uniformly read-zero / write-discard, as long > as the emulated CPUID has vendor AMD and family 0x10 and higher? > Or even independent of vendor and family, to fully match current > HVM behavior? > > > A pinned PV dom0 is permitted to change just the IO_ECS bit, eliciting a > > warning but no #GP for modifying other bits. As NB_CTL is a per-node > > control (not a per-core control), unless dom0 vcpus == host pcpus, this > > creates an asymmetry across the system as to whether IO_ECS is enabled > > or not. > > Correct, but there's no inconsistency from Dom0's POV. Furthermore, > as you explain further down, the PCI config space method of changing > the bit was specifically added to Linux to avoid this inconsistency. > > > The HVM IOREQ path, and PV cfg_ok() path, when seeing an IO_ECS access > > from the guest, reads the MSR every time, which is results in behaviour > > which doesn't match the settings a guest can see, and comes with a > > massive perf hit. > > Indeed it should be the guest view of the MSR which ought to be used > there. > > > It also means the behaviour of the guest IO depends > > on which node it is currently scheduled on. > > If Dom0 enabled things in an asymmetric way. > > > The IO_ECS setting should be chosen once at boot, made consistent across > > the entire system, and never touched at runtime. > > I don't fully agree (but it somewhat depends on what "at boot" is > supposed to mean in your reply): We should leave the choice to Dom0, > but it would probably not hurt to enforce a consistent setting. Doing > this when Dom0 uses the MSR method would be relatively easy, but > then again Dom0 shouldnt use that approach anyway. Doing this > when Dom0 uses PCI config space writes would be less straightforward, > as we'd have to write protect the north bridge devices' config spaces. > Which wouldn't be very useful imo, as Dom0 - if it already uses the > PCI config space approach - would update all north bridges anyway. > > > In all cases for guests, we can offer MMCFG even on a system which > > doesn't have IO_ECS, and they will prefer that. The behaviour of the > > extra 4 bits is reserved, so we could have IO_ECS working in practice > > with no signal. However, we could equally drop IO_ECS entirely. Guests > > can't find its setting to begin with, so can't reliably use it, and also > > wouldn't in the presence of MMCFG anyway. > > All this assuming that platforms correctly surface MMCFG. This not > having been the case in at least the Fam10 days was one of the > reasons why Linux (and the Xen) gained all this code. Furthermore > to date I didn't think we would surface MMCFG to any of our guests. > > Jan > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.