Re: [Xen-devel] [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels
On Tue, May 22, 2012 at 11:02:01PM +0200, Andre Przywara wrote:
> On 05/22/2012 07:18 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, May 22, 2012 at 06:07:11PM +0200, Andre Przywara wrote:
> >>Hi,
> >>
> >>while testing some APERF/MPERF semantics I discovered that this
> >>feature is enabled in Xen Dom0, but is not reliable.
> >>The Linux kernel's scheduler uses this feature if it sees the CPUID
> >>bit, leading to costly RDMSR traps (a few 100,000s during a kernel
> >>compile) and bogus values due to VCPU migration during the
> >
> >Can you point me to the Linux scheduler code that does this? Thanks.
>
> arch/x86/kernel/cpu/sched.c contains code to read out and compute
> APERF/MPERF registers. I added a Xen debug-key to dump a usage
> counter added in traps.c and thus could prove that it is actually
> the kernel that accesses these registers.
> As far as I understood this the idea is to learn about boosting and
> down-clocking (P-states) to get a fairer view on the actual
> computing time a process consumed.

Looks like it's looking for this: X86_FEATURE_APERFMPERF

Perhaps masking that should do it? Something along the lines of this in
enlighten.c:

        cpuid_leaf1_edx_mask =
                ~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
                  (1 << X86_FEATURE_MCA)  |  /* disable MCA */
                  (1 << X86_FEATURE_MTRR) |  /* disable MTRR */
                  (1 << X86_FEATURE_ACC));   /* thermal monitoring */

would be more appropriate? Or is that attribute on a different leaf?

>
> >>measurement.
> >>The attached patch explicitly disables this CPU capability inside
> >>the Linux kernel, I couldn't measure any APERF/MPERF reads anymore
> >>with the patch applied.
> >>I am not sure if the PVOPS code is the right place to fix this, we
> >>could as well do it in the HV's xen/arch/x86/traps.c:pv_cpuid().
> >>Also when the Dom0 VCPUs are pinned, we could allow this, but I am
> >>not sure if it's worth to do so.
> >>
> >>Awaiting your comments.
> >>
> >>Regards,
> >>Andre.
> >>
> >>P.S. Of course this doesn't fix pure userland software like
> >>cpupower, but I would consider this in the user's responsibility to
> >
> >Which would not work anymore as the cpufreq support is disabled
> >when it boots under Xen.
>
> Do you mean with "anymore" in a future kernel? I tested this on
> 3.4.0 and cpupower monitor worked fine. Right, cpufreq is not
> enabled, but cpupower uses the /dev/cpu/<n>/msr device file to
> directly read the MSRs. So I get this output if run on an idle Dom0:

Ahh. Neat. Will have to play with that.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel