[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 22/22] x86/AMD: add IRPerf support




> On 25 Oct 2023, at 21:59, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx> wrote:
> 
> On 25/10/2023 8:29 pm, Edwin Török wrote:
>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
>> index 483b5e4f70..b3cd851d9d 100644
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t 
>> val)
>>         }
>>         break;
>> 
>> +    case MSR_K8_HWCR:
>> +        if ( !(cp->x86_vendor & X86_VENDOR_AMD) ||
>> +             (val & ~K8_HWCR_IRPERF_EN) ||
>> +             wrmsr_safe(msr, val) != 0 )
>> +            goto gp_fault;
>> +        break;
>> +
>>     case MSR_AMD64_DE_CFG:
>>         /*
>>          * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP:
>> diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
>> b/xen/include/public/arch-x86/cpufeatureset.h
>> index 5faca0bf7a..40f74cd5e8 100644
>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>> @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO,          7*32+10) /*   APERF/MPERF 
>> Read Only interface */
>> 
>> /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
>> XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
>> -XEN_CPUFEATURE(IRPERF,        8*32+ 1) /* Instruction Retired Performance 
>> Counter */
>> +XEN_CPUFEATURE(IRPERF,        8*32+ 1) /*A! Instruction Retired Performance 
>> Counter */
>> XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A  (F)X{SAVE,RSTOR} always 
>> saves/restores FPU Error pointers */
>> XEN_CPUFEATURE(WBNOINVD,      8*32+ 9) /*   WBNOINVD instruction */
>> XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used 
>> by AMD) */
> 
> Last things first.  You can advertise this bit to guests, but I'm
> looking at the AMD manuals and woefully little information on what this
> is an how it works.
> 
> It does have an architectural enumeration, but there isn't even a a
> description of what HWCR.IPERF_EN does.
> 
> The HWCR documentation says "enables Core::X86::Msr::IRPerfCount", but
> the IRPERF MSR says no such think,

> noting only that EFRO_LOCK makes the
> MSR non-writeable (which in turn means we can't always offer it to
> guests anyway.  See below).

> At a guess, the HWCR bit activates the counter, but nothing states
> this.


My version 
(https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf)
 says:
"This is a dedicated counter that is always counting instructions retired. It 
exists at MSR address C000_00E9. It is enabled by setting a 1 to HWCR[30] and 
its support is indicated by CPUID Fn8000_0008_EBX[1]."

Although digging a bit more there are some erratas around deep C states on some 
chips and the HWCR register itself is not global but per CCD (which is neatly 
buried in the _ccd[7:0]_.... smallprint at the top:
https://github.com/cyring/CoreFreq/issues/206#issuecomment-722713147


>   At a guess, it's a free-running, reset-able counter, but again
> nothing states this.  The manual just says "is a dedicated counter" and
> doesn't even state whether it is (or is not) tied into the other core
> fixed counters and whether it is integrated with PMI or not.

There is also a LWP 'Instructions Retired' described in 13.4.2.2, which is 
user-mode only and per-core, but I assume that is completely different from 
this MSR
and part of the features that got dropped in newer cores.

> 
> Which brings us on to the middle hunk.  Putting it there short circuits
> the PV-specific handling, but you can't do that in general without
> arranging for HWCR to be context switched properly for vCPUs, nor in
> this case IPERF_COUNT itself.
> 
> Unless you context switch HWCR and IPERF_COUNT, a guest will see it not
> counting, or the count going backwards, 50% of the time as the vCPU is
> scheduled around.
> 
> So while in principle we could let guests use this facility (it would
> have to be off by default, because there doesn't appear to be any
> filtering capability in it at all, so we can't restrict it to
> instructions retired in guest context), it will need a far larger patch
> than this to do it safely.

Thanks, I'll move this patch into the 'needs more work' category.


Thanks,
--Edwin




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.