[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] patch for restricted vPMU modes





On Mon, Nov 23, 2015 at 6:35 AM, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> wrote:
(+ Dietmar)

On 11/20/2015 07:21 PM, Brendan Gregg wrote:
G'Day,

The vpmu feature of Xen is incredibly useful for performance analysis, however, it's currently all counters or nothing. In secure environments, there can be hesitation to enable access to all PMCs (there are hundreds of them). I've included a prototype patch that introduces two new restricted vpmu modes:

vpmu=ipc: As the most restricted minimum set. This enables cycles, reference cycles, and instructions only. This is enough to calculate instructions per cycle (IPC).

vpm=arch: This enables the 7 pre-defined architectural events as listed in cpuid, and in Table 18-1 of the Intel software developer's manual, vol 3B.

There can be a third mode added later on, with a larger set (including micro-ops PMCs).

These new features would need a corresponding change in Linux for PV guests (or for dom0 to change feature set globally). But before that do_xenpmu_op()'s XENPMU_feature_set clause will have to be updated to deal with new modes.

Ok, thanks. For now this is HVM, and I can EINVAL in do_xenpmu_op() for the new modes, like with BTS. (Do you know if the later Linux changes would be more than a feature version of pmu_mode_store/pmu_mode_show?)




I've included the short patch below for Xen 4.6.0, which provides these modes (it also fixes a minor copy-and-paste error with core2_get_fixed_pmc_count(), which I believe was accessing the wrong register). I am not a veteran Xen programmer, so please feel free to edit or rewrite this patch. In case this email messes it up, it's also on: https://github.com/brendangregg/Misc/blob/master/xen/xen-4.6.0-vpmu-filter.diff

I've shown testing of four modes (off, on, ipc, arch) here: https://gist.github.com/brendangregg/b7318c0f49bf906dc8df

For example, here is Linux perf running in a PVHVM guest with the new vpmu=ipc mode:

root@vm0hvm:~# perf stat -d ./noploop

ÂPerformance counter stats for './noploop':

   Â1511.326375 task-clock (msec)    Â#  0.999 CPUs utilized
        24 context-switches     #  0.016 K/sec
        Â0 cpu-migrations      #  0.000 K/sec
       Â113 page-faults       Â#  0.075 K/sec
  Â5,028,638,883 cycles          #  3.327 GHz
        Â0 stalled-cycles-frontend Â#  0.00% frontend cycles idle
        Â0 stalled-cycles-backend  #  0.00% backend cycles idle
  20,043,427,933 instructions       #  3.99 insns per cycle
        Â0 branches         #  0.000 K/sec
        Â0 branch-misses      Â#  0.00% of all branches
        Â0 L1-dcache-loads     Â#  0.000 K/sec
        Â0 L1-dcache-load-misses  Â#  0.00% of all L1-dcache hits
        Â0 LLC-loads        Â#  0.000 K/sec
 Â<not supported> LLC-load-misses:HG

Note that IPC is shown ("insns per cycle"), but other counters are not.


---patch---
diff -ur xen-4.6.0-clean/docs/misc/xen-command-line.markdown xen-4.6.0-brendan/docs/misc/xen-command-line.markdown
--- xen-4.6.0-clean/docs/misc/xen-command-line.markdown2015-10-05 07:33:39.000000000 -0700
+++ xen-4.6.0-brendan/docs/misc/xen-command-line.markdown2015-11-20 15:29:05.663781176 -0800
@@ -1444,7 +1444,7 @@
Âflushes on VM entry and exit, increasing performance.
Â### vpmu
-> `= ( bts )`
+> `= ( <boolean> | bts | ipc | arch )`
Â> Default: `off`
@@ -1460,6 +1460,15 @@
ÂIf 'vpmu=bts' is specified the virtualisation of the Branch Trace Store (BTS)
Âfeature is switched on on Intel processors supporting this feature.
+vpmu=ipc enables performance monitoring, but restricts the counters to the
+most minimum set possible: instructions, cycles, and reference cycles. These
+can be used to calculate instructions per cycle (IPC).
+
+vpmu=arch enables performance monitoring, but restricts the counters to the
+pre-defined architectural events only. These are exposed by cpuid, and listed
+in Table 18-1 from the Intel 64 and IA-32 Architectures Software Developer's
+Manual, Volume 3B, System Programming Guide, Part 2.
+
ÂNote that if **watchdog** option is also specified vpmu will be turned off.
Â*Warning:*
diff -ur xen-4.6.0-clean/xen/arch/x86/cpu/vpmu.c xen-4.6.0-brendan/xen/arch/x86/cpu/vpmu.c
--- xen-4.6.0-clean/xen/arch/x86/cpu/vpmu.c2015-10-05 07:33:39.000000000 -0700
+++ xen-4.6.0-brendan/xen/arch/x86/cpu/vpmu.c2015-11-20 15:29:50.847781176 -0800

@@ -43,9 +43,11 @@
ÂCHECK_pmu_params;
Â/*
- * "vpmu" :Â Â Âvpmu generally enabled
- * "vpmu=off" : vpmu generally disabled
- * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
+ * "vpmu" :Â Â Âvpmu generally enabled (all counters)
+ * "vpmu=off"Â : vpmu generally disabled
+ * "vpmu=bts"Â : vpmu enabled and Intel BTS feature switched on.
+ * "vpmu=ipc"Â : vpmu enabled for IPC counters only (most restrictive)
+ * "vpmu=arch" : vpmu enabled for predef arch counters only (restrictive)
 */
Âstatic unsigned int __read_mostly opt_vpmu_enabled;
Âunsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
@@ -67,6 +69,10 @@
  Âdefault:
    Âif ( !strcmp(s, "bts") )
      Âvpmu_features |= XENPMU_FEATURE_INTEL_BTS;
+Â Â Â Â else if ( !strcmp(s, "ipc") )
+Â Â Â Â Â Â vpmu_features |= XENPMU_FEATURE_IPC_ONLY;
+Â Â Â Â else if ( !strcmp(s, "arch") )
+Â Â Â Â Â Â vpmu_features |= XENPMU_FEATURE_ARCH_ONLY;
    Âelse if ( *s )
    Â{
      Âprintk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
diff -ur xen-4.6.0-clean/xen/arch/x86/cpu/vpmu_intel.c xen-4.6.0-brendan/xen/arch/x86/cpu/vpmu_intel.c
--- xen-4.6.0-clean/xen/arch/x86/cpu/vpmu_intel.c2015-10-05 07:33:39.000000000 -0700
+++ xen-4.6.0-brendan/xen/arch/x86/cpu/vpmu_intel.c2015-11-20 15:29:42.571781176 -0800
@@ -166,10 +166,10 @@
 */
Âstatic int core2_get_fixed_pmc_count(void)
Â{
-Â Â u32 eax;
+Â Â u32 edx;
-Â Â eax = cpuid_eax(0xa);
-Â Â return MASK_EXTR(eax, PMU_FIXED_NR_MASK);
+Â Â edx = cpuid_edx(0xa);
+Â Â return MASK_EXTR(edx, PMU_FIXED_NR_MASK);
Â}

This would need to be made into a separate patch since it fixes a bug.

Ok, thanks.




Â/* edx bits 5-12: Bit width of fixed-function performance counters */
@@ -652,12 +652,52 @@
    Âtmp = msr - MSR_P6_EVNTSEL(0);
    Âif ( tmp >= 0 && tmp < arch_pmc_cnt )
    Â{
+Â Â Â Â Â Â int umaskevent, blocked = 0;

Should be uint64_t and bool_t.

Ok, thanks.
Â


      Âstruct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
        Âvpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
      Âif ( msr_content & ARCH_CTRL_MASK )
        Âreturn -EINVAL;
+Â Â Â Â Â Â /* PMC filters */
+Â Â Â Â Â Â umaskevent = msr_content & MSR_IA32_CMT_EVTSEL_UE_MASK;


I don't see this mask defined anywhere. (I assume it's 0xffffffff).

Ah, sorry, will be there in v2 (I'm switching to git send-email, thanks Jan). It'sÂ0x0000ffff.
Â

Also, if either of those two flags is set we probably want to block MSR_IA32_DS_AREA and MSR_IA32_PEBS_ENABLE accesses as well.

Ok, yes, good idea.
Â


+Â Â Â Â Â Â if ( vpmu_features & XENPMU_FEATURE_IPC_ONLY ||
+Â Â Â Â Â Â Â Â Âvpmu_features & XENPMU_FEATURE_ARCH_ONLY )
+Â Â Â Â Â Â {
+Â Â Â Â Â Â Â Â blocked = 1;
+Â Â Â Â Â Â Â Â switch ( umaskevent )
+Â Â Â Â Â Â Â Â {
+Â Â Â Â Â Â Â Â /*
+Â Â Â Â Â Â Â Â Â* See Table 18-1 from the Intel 64 and IA-32 Architectures Software
+Â Â Â Â Â Â Â Â Â* Developer's Manual, Volume 3B, System Programming Guide, Part 2.
+Â Â Â Â Â Â Â Â Â*/
+Â Â Â Â Â Â Â Â case 0x003c:/* unhalted core cycles */
+Â Â Â Â Â Â Â Â case 0x013c:/* unhalted ref cycles */
+Â Â Â Â Â Â Â Â case 0x00c0:/* instruction retired */
+Â Â Â Â Â Â Â Â Â Â blocked = 0;
+Â Â Â Â Â Â Â Â default:
+Â Â Â Â Â Â Â Â Â Â break;
+Â Â Â Â Â Â Â Â }
+Â Â Â Â Â Â }
+
+Â Â Â Â Â Â if ( vpmu_features & XENPMU_FEATURE_ARCH_ONLY )
+Â Â Â Â Â Â {
+Â Â Â Â Â Â Â Â /* additional counters beyond IPC only; blocked already set */
+Â Â Â Â Â Â Â Â switch ( umaskevent )
+Â Â Â Â Â Â Â Â {
+Â Â Â Â Â Â Â Â case 0x4f2e:/* LLC reference */
+Â Â Â Â Â Â Â Â case 0x412e:/* LLC misses */
+Â Â Â Â Â Â Â Â case 0x00c4:/* branch instruction retired */
+Â Â Â Â Â Â Â Â case 0x00c5:/* branch */
+Â Â Â Â Â Â Â Â Â Â blocked = 0;
+Â Â Â Â Â Â Â Â default:
+Â Â Â Â Â Â Â Â Â Â break;
+Â Â Â Â Â Â Â Â}
+Â Â Â Â Â Â }
+
+Â Â Â Â Â Â if ( blocked )
+Â Â Â Â Â Â Â Â return -EINVAL;
+
      Âif ( has_hvm_container_vcpu(v) )
Âvmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
&core2_vpmu_cxt->global_ctrl);
diff -ur xen-4.6.0-clean/xen/include/public/pmu.h xen-4.6.0-brendan/xen/include/public/pmu.h
--- xen-4.6.0-clean/xen/include/public/pmu.h2015-10-05 07:33:39.000000000 -0700
+++ xen-4.6.0-brendan/xen/include/public/pmu.h2015-11-20 15:30:08.887781176 -0800
@@ -84,9 +84,17 @@
Â/*
 * PMU features:
- * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
+ * - XENPMU_FEATURE_INTEL_BTS:Â Intel BTS support (ignored on AMD)
+ * - XENPMU_FEATURE_IPC_ONLY:Â ÂRestrict PMC to the most minimum set possible.
+ *Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Instructions, cycles, and ref cycles. Can be
+ *Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â used to calculate instructions-per-cycle (IPC).
+ * - XENPMU_FEATURE_ARCH_ONLY:Â Restrict PMCs to the Intel pre-defined
+ *Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â architecteral events exposed by cpuid and
+ *Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â listed in Table 18-1 of the developer's manual.

Needs "(ignored on AMD)"

Ok, thanks.

Brendan
Â


-boris



 */
-#define XENPMU_FEATURE_INTEL_BTSÂ 1
+#define XENPMU_FEATURE_INTEL_BTSÂ (1<<0)
+#define XENPMU_FEATURE_IPC_ONLYÂ Â(1<<1)
+#define XENPMU_FEATURE_ARCH_ONLYÂ (1<<2)
Â/*
 * Shared PMU data between hypervisor and PV(H) domains.
---patch---


Brendan

--
Brendan Gregg, Senior Performance Architect, Netflix




--
Brendan Gregg, Senior Performance Architect, Netflix
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.