[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support



On Thu, Sep 25, 2014 at 03:28:36PM -0400, Boris Ostrovsky wrote:
> Here is the twelfth version of PV(H) PMU patches.

Daniel,

Would you be OK taking a look at patch #1, #11, and #16 (#15
has the new hypercall, but the patch for XSM made it in there)
to see if the XSM changes are correct?

Dietmar,

I took a look at the patches with the SDM and with
https://software.intel.com/sites/default/files/76/87/30320
in hand, and it all looked correct. But having an expert Review it
would be fantastic!

Jan,
I believe the patch #9 has the changes to the structure that
would be satisfactory to you?


Boris,
I only had one change (the one related to 'pahole') and in the grand
scheme of things it does not matter.

All,

Looking at the patches from an perspective of making an exception
for Xen 4.5 I would like to explain my thinking a bit.

The PMU functionality is only active if the system admin boots with 'pmu=1'
or (with this new patchset) through a hypercall to toggle it on/off different 
modes.

That means from a default use case this functionality is off.
It should have no regressions whatsoever. Digging at the code does demonstrate
that so I am OK with that.

With the use-case of somebody using 'pmu=1' and expecting it to
continue working - that is smaller subset of users, and important nonethless.
I plan on testing it next week just that test-case in mind and making sure
it does not break.


From an 'awesome release' perspective I believe this patchset is important.
It will allow us to further enhance Xen and figure out where it has problems
and work on fixing them. That either being in the field or on developers
machines.

Barring any major changes above, and if the maintainers are OK with the
patchset (and could provide the Reviewed-by or Acked-by tags), I believe this
patchset warrants an exception.

> 
> Changes in v12:
> 
> * Added XSM support
> * Made a valifity check before writing MSR_CORE_PERF_GLOBAL_OVF_CTRL
> * Updated documentation for 'vpmu=nmi' option
> * Added more text to a bunch of commit messages (per Konrad's request)
> 
> Changes in v11:
> 
> * Replaced cpu_user_regs with new xen_pmu_regs (IP, SP, CS) in xen_pmu_arch.
>   - as part of this re-work noticed that CS registers were set in later patch 
> then
>     needed. Moved those changes to appropriate place
> * Added new VPMU mode (XENPMU_MODE_HV). Now XENPMU_MODE_SELF will only 
> provide dom0
>   with its own samples only (i.e. no hypervisor data) and XENPMU_MODE_HV will 
> be what
>   XENPMU_MODE_SELF used to be.
> * Kept  vmx_add_guest_msr()/vmx_add_host_load_msr() as wrappers around 
> vmx_add_msr()
> * Cleaned up VPMU context switch macros (moved  'if(prev!=next)' back to 
> context_switch())
> * Dropped hypercall continuation from vpmu_force_context_switch() and 
> replaced it with
>   -EAGAIN error if hypercall_preempt_check() is true after 2ms.
> * Kept vpmu_do_rdmsr()/vpmu_do_wrmsr as wrapperd for vpmu_do_msr()
> * Move context switching patch (#13) earlier in the series (for proper 
> bisection support)
> * Various comment updates and cleanups
> * Dropped a bunch of Reviewed-by and all Tested-by tags
> 
> Changes in v10:
> 
> * Swapped address and name fields of xenpf_symdata (to make it smaller on 
> 32-bit)
> * Dropped vmx_rm_guest_msr() as it requires refcountig which makes code more 
> complicated.
> * Cleaned up vlapic_reg_write()
> * Call vpmu_destroy() for both HVM and PVH VCPUs
> * Verify that (xen_pmu_data+PMU register bank) fit into a page
> * Return error codes from arch-specific VPMU init code
> * Moved VPMU-related context switch logic into inlines
> * vpmu_force_context_switch() changes:
>   o Avoid greater than page-sized allocations
>   o Prevent another VCPU from starting VPMU sync while the first sync is in 
> progress
> * Avoid stack leak in do_xenpmu_op()
> * Checked validity of Intel VPMU MSR values before they are committed
> * Fixed MSR handling in traps.c (avoid potential accesses to Intel MSRs on 
> AMD)
> * Fixed VCPU selection in interrupt handler for 32-bit dom0 (sampled => 
> sampling)
> * Clarified commit messages (patches 2, 13, 18) 
> * Various cleanups
> 
> Changes in v9:
> 
> * Restore VPMU context after context_saved() is called in
>   context_switch(). This is needed because vpmu_load() may end up
>   calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running
>   to be correctly set/cleared. (patch 18, dropped review acks)
> * Added patch 2 to properly manage VPMU_CONTEXT_LOADED
> * Addressed most of Jan's comments.
>   o Keep track of time in vpmu_force_context_switch() to properly break
>     out of a loop when using hypercall continuations
>   o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op()
>   o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully)
>     make it more clear which vcpu we are using
>   o Cleaned up vpmu_do_wrmsr()
>   o Did *not* replace sizeof(uint64_t) with sizeof(variable) in
>     amd_vpmu_initialise(): throughout the code registers are declared as
>     uint64_t and if we are to add a new type (e.g. reg_t) this should be
>     done in a separate patch, unrelated to this series.
>   o Various more minor cleanups and code style fixes
>   
> Changes in v8:
> 
> * Cleaned up a bit definitions of struct xenpf_symdata and xen_pmu_params
> * Added compat checks for vpmu structures
> * Converted vpmu flag manipulation macros to inline routines
> * Reimplemented vpmu_unload_all() to avoid long loops
> * Reworked PMU fault generation and handling (new patch #12)
> * Added checks for domain->vcpu[] non-NULLness
> * Added more comments, renamed some routines and macros, code style cleanup
> 
> 
> Changes in v7:
> 
> * When reading hypervisor symbols make the caller pass buffer length
>   (as opposed to having this length be part of the API). Make the
>   hypervisor buffer static, make xensyms_read() return zero-length
>   string on end-of-symbols. Make 'type' field of xenpf_symdata a char,
>   drop compat_pf_symdata definition.
> * Spread PVH support across patches as opposed to lumping it into a
>   separate patch
> * Rename vpmu_is_set_all() to vpmu_are_all_set()
> * Split VPMU cleanup patch in two
> * Use memmove when copying VMX guest and host MSRs
> * Make padding of xen_arch_pmu's context union a constand that does not
>   depend on arch context size.
> * Set interface version to 0.1
> * Check pointer validity in pvpmu_init/destroy()
> * Fixed crash in core2_vpmu_dump()
> * Fixed crash in vmx_add_msr()
> * Break handling of Intel and AMD MSRs in traps.c into separate cases
> * Pass full CS selector to guests
> * Add lock in pvpmu init code to prevent potential race
> 
> 
> Changes in v6:
> 
> * Two new patches:
>   o Merge VMX MSR add/remove routines in vmcs.c (patch 5)
>   o Merge VPMU read/write MSR routines in vpmu.c (patch 14)
> * Check for pending NMI softirq after saving VPMU context to prevent a 
> newly-scheduled
>   guest from overwriting sampled_vcpu written by de-scheduled VPCU.
> * Keep track of enabled counters on Intel. This was removed in earlier 
> patches and
>   was a mistake. As result of this change struct vpmu will have a pointer to 
> private
>   context data (i.e. data that is not exposed to a PV(H) guest). Use this 
> private pointer
>   on SVM as well for storing MSR bitmap status (it was unnecessarily exposed 
> to PV guests
>   earlier).
>   Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be 
> reviewed
>   agan (core2_vpmu_do_wrmsr() routine, mostly)
> * Replaced references to dom0 with hardware_domain (and is_control_domain with
>   is_hardware_domain for consistency)
> * Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE
> * Reverted unnecessary changes in vpmu_initialise()'s switch statement
> * Fixed comment in vpmu_do_interrupt
> 
> 
> Changes in v5:
> 
> * Dropped patch number 2 ("Stop AMD counters when called from 
> vpmu_save_force()")
>   as no longer needed
> * Added patch number 2 that marks context as loaded before PMU registers are
>   loaded. This prevents situation where a PMU interrupt may occur while 
> context
>   is still viewed as not loaded. (This is really a bug fix for exsiting VPMU
>   code)
> * Renamed xenpmu.h files to pmu.h
> * More careful use of is_pv_domain(), is_hvm_domain(, is_pvh_domain and
>   has_hvm_container_domain(). Also explicitly disabled support for PVH until
>   patch 16 to make distinction between usage of the above macros more clear.
> * Added support for disabling VPMU support during runtime.
> * Disable VPMUs for non-privileged domains when switching to privileged
>   profiling mode
> * Added ARM stub for xen_arch_pmu_t
> * Separated vpmu_mode from vpmu_features
> * Moved CS register query to make sure we use appropriate query mechanism for
>   various guest types.
> * LVTPC is now set from value in shared area, not copied from dom0
> * Various code and comments cleanup as suggested by Jan.
> 
> Changes in v4:
> 
> * Added support for PVH guests:
>   o changes in pvpmu_init() to accommodate both PV and PVH guests, still in 
> patch 10
>   o more careful use of is_hvm_domain
>   o Additional patch (16)
> * Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe 
> handling
> * Fixed dom0's VCPU selection in privileged mode
> * Added a cast in register copy for 32-bit PV guests cpu_user_regs_t in 
> vpmu_do_interrupt.
>   (don't want to expose compat_cpu_user_regs in a public header)
> * Renamed public structures by prefixing them with "xen_"
> * Added an entry for xenpf_symdata in xlat.lst
> * Fixed pv_cpuid check for vpmu-specific cpuid adjustments
> * Varios code style fixes
> * Eliminated anonymous unions
> * Added more verbiage to NMI patch description
> 
> 
> Changes in v3:
> 
> * Moved PMU MSR banks out from architectural context data structures to allow
> for future expansion without protocol changes
> * PMU interrupts can be either NMIs or regular vector interrupts (the latter
> is the default)
> * Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
> race conditions with the guest
> * Fixed races with PV guest in MSR access handlers
> * More Intel VPMU cleanup
> * Moved NMI-unsafe code from NMI handler
> * Dropped changes to vcpu->is_running
> * Added LVTPC apic handling (cached for PV guests)
> * Separated privileged profiling mode into a standalone patch
> * Separated NMI handling into a standalone patch
> 
> 
> Changes in v2:
> 
> * Xen symbols are exported as data structure (as opoosed to a set of formatted
> strings in v1). Even though one symbol per hypercall is returned performance
> appears to be acceptable: reading whole file from dom0 userland takes on 
> average
> about twice as long as reading /proc/kallsyms
> * More cleanup of Intel VPMU code to simplify publicly exported structures
> * There is an architecture-independent and x86-specific public include files 
> (ARM
> has a stub)
> * General cleanup of public include files to make them more presentable (and
> to make auto doc generation better)
> * Setting of vcpu->is_running is now done on ARM in schedule_tail as well 
> (making
> changes to common/schedule.c architecture-independent). Note that this is not
> tested since I don't have access to ARM hardware.
> * PCPU ID of interrupted processor is now passed to PV guest
> 
> 
> The following patch series adds PMU support in Xen for PV(H)
> guests. There is a companion patchset for Linux kernel. In addition,
> another set of changes will be provided (later) for userland perf
> code.
> 
> This version has following limitations:
> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
> * Hypervisor code is only profiled on processors that have running dom0 VCPUs
> on them.
> * No backtrace support.
> 
> A few notes that may help reviewing:
> 
> * A shared data structure (xenpmu_data_t) between each PV VPCU and hypervisor
> CPU is used for passing registers' values as well as PMU state at the time of
> PMU interrupt.
> * PMU interrupts are taken by hypervisor either as NMIs or regular vector
> interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM 
> guests
> and as virtual interrupts to PV(H) guests
> * PV guest's interrupt handler does not read/write PMU MSRs directly. 
> Instead, it
> accesses xenpmu_data_t and flushes it to HW it before returning.
> * PMU mode is controlled at runtime via 
> /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
> in addition to 'vpmu' boot option (which is preserved for back compatibility).
> The following modes are provided:
>   * disable: VPMU is off
>   * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself 
> and Xen
>   * priv_enable: dom0 only profiling. dom0 collects samples for everyone. 
> Sampling
>     in guests is suspended.
> * /proc/xen/xensyms file exports hypervisor's symbols to dom0 (similar to
> /proc/kallsyms)
> * VPMU infrastructure is now used for HVM, PV and PVH and therefore has been 
> moved
> up from hvm subtree
> 
> 
> 
> Boris Ostrovsky (20):
>   common/symbols: Export hypervisor symbols to privileged guest
>   x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
>   x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
>   x86/VPMU: Make vpmu macros a bit more efficient
>   intel/VPMU: Clean up Intel VPMU code
>   vmx: Merge MSR management routines
>   x86/VPMU: Handle APIC_LVTPC accesses
>   intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
>   x86/VPMU: Add public xenpmu.h
>   x86/VPMU: Make vpmu not HVM-specific
>   x86/VPMU: Interface for setting PMU mode and flags
>   x86/VPMU: Initialize PMU for PV(H) guests
>   x86/VPMU: Save VPMU state for PV guests during context switch
>   x86/VPMU: When handling MSR accesses, leave fault injection to callers
>   x86/VPMU: Add support for PMU register handling on PV guests
>   x86/VPMU: Handle PMU interrupts for PV guests
>   x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
>   x86/VPMU: Add privileged PMU mode
>   x86/VPMU: NMI-based VPMU support
>   x86/VPMU: Move VPMU files up from hvm/ directory
> 
>  docs/misc/xen-command-line.markdown                |   8 +-
>  tools/flask/policy/policy/modules/xen/xen.te       |   7 +
>  xen/arch/x86/Makefile                              |   1 +
>  xen/arch/x86/domain.c                              |  23 +-
>  xen/arch/x86/hvm/Makefile                          |   1 -
>  xen/arch/x86/hvm/hvm.c                             |   3 +-
>  xen/arch/x86/hvm/svm/Makefile                      |   1 -
>  xen/arch/x86/hvm/svm/svm.c                         |  10 +-
>  xen/arch/x86/hvm/vlapic.c                          |   3 +
>  xen/arch/x86/hvm/vmx/Makefile                      |   1 -
>  xen/arch/x86/hvm/vmx/vmcs.c                        |  84 +--
>  xen/arch/x86/hvm/vmx/vmx.c                         |  28 +-
>  xen/arch/x86/hvm/vpmu.c                            | 265 -------
>  xen/arch/x86/oprofile/op_model_ppro.c              |   8 +-
>  xen/arch/x86/platform_hypercall.c                  |  33 +
>  xen/arch/x86/traps.c                               |  60 +-
>  xen/arch/x86/vpmu.c                                | 826 
> +++++++++++++++++++++
>  xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c}        | 158 ++--
>  .../x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c}     | 639 ++++++++--------
>  xen/arch/x86/x86_64/compat/entry.S                 |   4 +
>  xen/arch/x86/x86_64/entry.S                        |   4 +
>  xen/common/event_channel.c                         |   1 +
>  xen/common/symbols.c                               |  54 ++
>  xen/include/Makefile                               |   2 +
>  xen/include/asm-x86/domain.h                       |   2 +
>  xen/include/asm-x86/hvm/vcpu.h                     |   3 -
>  xen/include/asm-x86/hvm/vmx/vmcs.h                 |  18 +-
>  xen/include/asm-x86/hvm/vmx/vpmu_core2.h           |  51 --
>  xen/include/asm-x86/{hvm => }/vpmu.h               |  94 ++-
>  xen/include/public/arch-arm.h                      |   3 +
>  xen/include/public/arch-x86/pmu.h                  |  77 ++
>  xen/include/public/arch-x86/xen-x86_32.h           |   8 +
>  xen/include/public/arch-x86/xen-x86_64.h           |   8 +
>  xen/include/public/platform.h                      |  19 +
>  xen/include/public/pmu.h                           |  95 +++
>  xen/include/public/xen.h                           |   2 +
>  xen/include/xen/hypercall.h                        |   4 +
>  xen/include/xen/softirq.h                          |   1 +
>  xen/include/xen/symbols.h                          |   3 +
>  xen/include/xlat.lst                               |   5 +
>  xen/include/xsm/dummy.h                            |  20 +
>  xen/include/xsm/xsm.h                              |   6 +
>  xen/xsm/dummy.c                                    |   1 +
>  xen/xsm/flask/hooks.c                              |  28 +
>  xen/xsm/flask/policy/access_vectors                |  18 +-
>  xen/xsm/flask/policy/security_classes              |   1 +
>  46 files changed, 1872 insertions(+), 819 deletions(-)
>  delete mode 100644 xen/arch/x86/hvm/vpmu.c
>  create mode 100644 xen/arch/x86/vpmu.c
>  rename xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c} (74%)
>  rename xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} (60%)
>  delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
>  rename xen/include/asm-x86/{hvm => }/vpmu.h (55%)
>  create mode 100644 xen/include/public/arch-x86/pmu.h
>  create mode 100644 xen/include/public/pmu.h
> 
> -- 
> 1.8.1.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.