[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/AMD: expose SYSCFG, TOM, and TOM2 to Dom0



On 27.05.2021 10:33, Roger Pau Monné wrote:
> On Wed, May 26, 2021 at 02:59:00PM +0200, Jan Beulich wrote:
>> Sufficiently old Linux (3.12-ish) accesses these MSRs in an unguarded
>> manner. Furthermore these MSRs, at least on Fam11 and older CPUs, are
>> also consulted by modern Linux, and their (bogus) built-in zapping of
>> #GP faults from MSR accesses leads to it effectively reading zero
>> instead of the intended values, which are relevant for PCI BAR placement
>> (which ought to all live in MMIO-type space, not in DRAM-type one).
>>
>> For SYSCFG, only certain bits get exposed. In fact, whether to expose
>> MtrrVarDramEn is debatable: It controls use of not just TOM, but also
>> the IORRs. Introduce (consistently named) constants for the bits we're
>> interested in and use them in pre-existing code as well.
> 
> I think we should also allow access to the IORRs MSRs for coherency
> (c001001{6,9}) for the hardware domain.

Hmm, originally I was under the impression that these could conceivably
be written by OSes, and hence would want dealing with separately. But
upon re-reading I see that they are supposed to be set by the BIOS alone.
So yes, let me add them for read access, taking care of the limitation
that I had to spell out.

This raises the question then though whether to also include SMMAddr
and SMMMask in the set - the former does get accessed by Linux as well,
and was one of the reasons for needing 6eef0a99262c ("x86/PV:
conditionally avoid raising #GP for early guest MSR reads").

Especially for SMMAddr, and maybe also for IORR_BASE, returning zero
for DomU-s might be acceptable. The respective masks, however, can
imo not sensibly be returned as zero. Hence even there I'd leave DomU
side handling (see below) for a later time.

>> As a welcome side effect, verbosity on/of debug builds gets (perhaps
>> significantly) reduced.
>>
>> Note that at least as far as those MSR accesses by Linux are concerned,
>> there's no similar issue for DomU-s, as the accesses sit behind PCI
>> device matching logic. The checked for devices would never be exposed to
>> DomU-s in the first place. Nevertheless I think that at least for HVM we
>> should return sensible values, not 0 (as svm_msr_read_intercept() does
>> right now). The intended values may, however, need to be determined by
>> hvmloader, and then get made known to Xen.
> 
> Could we maybe come up with a fixed memory layout that hvmloader had
> to respect?
> 
> Ie: DRAM from 0 to 3G, MMIO from 3G to 4G, and then the remaining
> DRAM from 4G in a contiguous single block?
> 
> hvmloader would have to place BARs that don't fit in the 3G-4G hole at
> the end of DRAM (ie: after TOM2).

Such a fixed scheme may be too limiting, I'm afraid.

>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -339,6 +339,19 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>>          *val = msrs->tsc_aux;
>>          break;
>>  
>> +    case MSR_K8_SYSCFG:
>> +    case MSR_K8_TOP_MEM1:
>> +    case MSR_K8_TOP_MEM2:
>> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>> +            goto gp_fault;
>> +        if ( !is_hardware_domain(d) )
>> +            return X86EMUL_UNHANDLEABLE;
> 
> It might be clearer to also handle the !is_hardware_domain case here,
> instead of deferring to svm_msr_read_intercept:
> 
> if ( is_hardware_domain(d) )
>     rdmsrl(msr, *val);
> else
>     *val = 0;

As said in the post-commit-message remark, I don't think returning 0
here is appropriate. I'd be willing to move DomU handling here, but
only once it's sane.

>> @@ -279,10 +286,7 @@
>>  #define MSR_K8_TOP_MEM1                     0xc001001a
>>  #define MSR_K7_CLK_CTL                      0xc001001b
>>  #define MSR_K8_TOP_MEM2                     0xc001001d
>> -#define MSR_K8_SYSCFG                       0xc0010010
>>  
>> -#define K8_MTRRFIXRANGE_DRAM_ENABLE 0x00040000 /* MtrrFixDramEn bit    */
>> -#define K8_MTRRFIXRANGE_DRAM_MODIFY 0x00080000 /* MtrrFixDramModEn bit */
>>  #define K8_MTRR_RDMEM_WRMEM_MASK    0x18181818 /* Mask: RdMem|WrMem    */
> 
> That last one seem to be unused, I wonder if you could also drop it as
> part of this cleanup?

It's for an unrelated set of MSRs, so I thought I'd leave it alone
here. Otoh the value is at best bogus anyway, as it assumes both
halves of fixed-size MTRRs get accessed separately. So I guess
since you ask for it, I'll drop it at this occasion.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.