[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 02/22] x86/msr: implement MSR_SMI_COUNT for Dom0 on Intel




> On 30 Oct 2023, at 16:20, Jan Beulich <jbeulich@xxxxxxxx> wrote:
> 
> On 25.10.2023 21:29, Edwin Török wrote:
>> Dom0 should always be able to read this MSR: it is useful when
>> investigating performance issues in production.
> 
> While I'm not outright opposed, I'm also not convinced. At the very least
> ...
> 
>> Although the count is Thread scoped, in practice all cores were observed
>> to return the same count (perhaps due to implementation details of SMM),
>> so do not require the cpu to be pinned in order to read it.
> 
> ... this, even if matching your observations, is going to be properly
> misleading in case counts end up diverging.
> 
>> This MSR exists on Intel since Nehalem.
>> 
>> Backport: 4.15+
> 
> If this was a backporting candidate, I think a Fixes: tag would need
> to indicate what's being fixed here.


I used the Backport tag to indicate what is the oldest release that it is 
backportable to.
IIRC the choices were:
* 4.0+ for issues that were present for a long time (didn't look further back 
than that in history), so there isn't any particular commit that introduces the 
problem, it was like that from the very beginning, i.e. not a regression.

* 4.13+ for issues affecting only new CPU support (I think that is the release 
that added Icelake support). I can attempt to find the commit that added 
Icelake support and mark them as Fixes: that commit (although there might be 
several of them)

* 4.15+ for bugs introduced by the default read-write msr changes


> Otherwise this is merely a new
> feature.
> 

Prior to the default rdwrmsr changes it was possible to read any MSR, so I 
consider it a bug that after the default rdwrmsr changes you can no longer do 
that, it takes away a valuable debugging tool.
I can attempt to find a more specific commit to indicate with Fixes:

>> --- a/xen/arch/x86/include/asm/msr-index.h
>> +++ b/xen/arch/x86/include/asm/msr-index.h
>> @@ -641,6 +641,9 @@
>> #define MSR_NHL_LBR_SELECT 0x000001c8
>> #define MSR_NHL_LASTBRANCH_TOS 0x000001c9
>> 
>> +/* Nehalem and newer other MSRs */
>> +#define MSR_SMI_COUNT                       0x00000034
> 
> See my comment on the other patch regarding additions here.
> 
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -139,6 +139,13 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t 
>> *val)
>>         *val = msrs->misc_features_enables.raw;
>>         break;
>> 
>> +    case MSR_SMI_COUNT:
>> +        if ( cp->x86_vendor != X86_VENDOR_INTEL )
>> +            goto gp_fault;
>> +        if ( is_hardware_domain(d) && !rdmsr_safe(msr, *val) )
>> +            break;
>> +        return X86EMUL_UNHANDLEABLE;
> 
> Why #GP for non-Intel but UNHANDLEABLE for DomU?

I wanted to match the behaviour of the 'default:' case statement, although 
looking at the other MSR handling code in this file they just usually gp_fault,
with the exception of the MSR_K8* code that returns UNHANDLEABLE, so if 
condition here could be simplified (e.g. follow how MSR_AMD_PATCHLEVEL does it).

Best regards,
--Edwin 

> 
> Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.