[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v16 00/23] x86/PMU: Xen PMU PV(H) support



Am Mittwoch 17 Dezember 2014, 10:38:31 schrieb Boris Ostrovsky:
> Version 16 of PV(H) PMU patches.

Hi Boris,

I did a fast and simple test on a small Intel machine and all went fine. I'll
do some more after my vacation in 2015.
After adapting your sent linux kernel patches I'am interestet in tracing our
HVM guest from outside with perf. Are you already having some patches and would
you share these?
Thank you for your work!

Dietmar.

> 
> * Many changes in VPMU mode patch (#13):
>   * Replaced arguments to some vpmu routines (vcpu -> vpmu). New patch (#12)
>   * Added vpmu_unload vpmu op to completely unload vpmu data (e.g clear
>     MSR bitmaps). This routine may be called in context switch 
> (vpmu_switch_to()).
>   * Added vmx_write_guest_msr_vcpu() interface to write MSRs of non-current 
> VCPU
>   * Use cpumask_cycle instead of cpumask_next
>   * Dropped timeout error
>   * Adjusted types of mode variables
>   * Don't allow oprofile to allocate its context on MSR access if VPMU context
>     has already been allocated (which may happen when VMPU mode was set to off
>     while the guest was running)
> * vpmu_initialise() no longer turns off VPMU globally on failure. New patch 
> (#2)
> * vpmu_do_msr() will return 1 (failure) if vpmu_ops are not set. This is done 
> to
>   prevent PV guests that are not VPMU-enabled from wrongly assuming that they 
> have
>   access to counters (Linux check_hw_exists() will make this assumption) 
> (patch #18)
> * Add cpl field to shared structure that will be passed for HVM guests' 
> samples
>   (instead of PMU_SAMPLE_USER flag). Add PMU_SAMPLE_PV flag to mark whose 
> sample
>   is passed up. (Patches ## 10, 19, 22)
> 
> Changes in v15:
> 
> * Rewrote vpmu_force_context_switch() to use continue_hypercall_on_cpu()
> * Added vpmu_init initcall that will call vendor-specific init routines
> * Added a lock to vpmu_struct to serialize pmu_init()/pmu_finish()
> * Use SS instead of CS for DPL (instead of RPL)
> * Don't take lock for XENPMU_mode_get
> * Make vmpu_mode/features an unsigned int (from uint64_t)
> * Adjusted pvh_hypercall64_table[] order
> * Replaced address range check [XEN_VIRT_START..XEN_VIRT_END] with 
> guest_mode()
> * A few style cleanups
> 
> Changes in v14:
> 
> * Moved struct xen_pmu_regs to pmu.h
> * Moved CHECK_pmu_* to an earlier patch (when structures are first introduced)
> * Added PMU_SAMPLE_REAL flag to indicate whether the sample was taken in real 
> mode
> * Simplified slightly setting rules for xenpmu_data flags
> * Rewrote vpmu_force_context_switch() to again use continuations. (Returning 
> EAGAIN
>   to user would mean that VPMU mode may get into inconsistent state (across 
> processors)
>   and dealing with that is more compicated than I'd like).
> * Fixed msraddr_to_bitpos() and converted it into an inline
> * Replaced address range check in vmpu_do_interrupt() with guest_mode()
> * No error returns from __initcall
> * Rebased on top of recent VPMU changes
> * Various cleanups
> 
> Changes in v13:
> 
> * Rearranged data in xenpf_symdata to eliminate a hole (no change in
>   structure size)
> * Removed unnecessary zeroing of last character in name string during
>   symbol read hypercall
> * Updated comment in access_vectors for pmu_use operation
> * Compute AMD MSR bank size at runtime
> * Moved a couple of BUILD_BUG_ON later, to when the structures they are
>   checking are first used
> * Added ss and eflags to xen_pmu_registers structure
> * vpmu_force_context_switch() uses per-cpu tasklet pointers.
> * Moved most of arch-specific VPMU initialization code into an __initcall()
>   to avoid re-calculating/re-checking things more than once (new patch, #12)
> * Replaced is_*_domain() with is_*_vcpu()
> * choose_hwdom_vcpu() will now assume that hardware_domain->vcpu[idx]
>   is always there (callers will still verify whether or not that's true)
> * Fixed a couple of sampled vs. sampling tests in vpmu_do_interrupt()
> * Pass CS to the guest unchanged, add pmu_flags's flag to indicate whether the
>   sample was for a user or kernel space. Move pmu_flags from xen_pmu_data into
>   xen_pmu_arch
> * Removed local msr_content variable from vpmu_do_wrmsr()
> * Re-arranged code in parse_vpmu_param()
> * Made vpmu_interrupt_type checks test for value, not individual bits
> * Moved PMU_SOFTIRQ definition into arch-specific header
> * Moved vpmu*.c files into xen/arch/x86/cpu/ instead of xen/arch/x86/
> * For hypervisor samples, report DOMID_XEN to the guest
> * Changed logic to select which registers to report to the guest (include RIP
>   check to see whether we are in the hypervisor)
> 
> Changes in v12:
> 
> * Added XSM support
> * Made a valifity check before writing MSR_CORE_PERF_GLOBAL_OVF_CTRL
> * Updated documentation for 'vpmu=nmi' option
> * Added more text to a bunch of commit messages (per Konrad's request)
> 
> Changes in v11:
> 
> * Replaced cpu_user_regs with new xen_pmu_regs (IP, SP, CS) in xen_pmu_arch.
>   - as part of this re-work noticed that CS registers were set in later patch 
> then
>     needed. Moved those changes to appropriate place
> * Added new VPMU mode (XENPMU_MODE_HV). Now XENPMU_MODE_SELF will only 
> provide dom0
>   with its own samples only (i.e. no hypervisor data) and XENPMU_MODE_HV will 
> be what
>   XENPMU_MODE_SELF used to be.
> * Kept  vmx_add_guest_msr()/vmx_add_host_load_msr() as wrappers around 
> vmx_add_msr()
> * Cleaned up VPMU context switch macros (moved  'if(prev!=next)' back to 
> context_switch())
> * Dropped hypercall continuation from vpmu_force_context_switch() and 
> replaced it with
>   -EAGAIN error if hypercall_preempt_check() is true after 2ms.
> * Kept vpmu_do_rdmsr()/vpmu_do_wrmsr as wrapperd for vpmu_do_msr()
> * Move context switching patch (#13) earlier in the series (for proper 
> bisection support)
> * Various comment updates and cleanups
> * Dropped a bunch of Reviewed-by and all Tested-by tags
> 
> Changes in v10:
> 
> * Swapped address and name fields of xenpf_symdata (to make it smaller on 
> 32-bit)
> * Dropped vmx_rm_guest_msr() as it requires refcountig which makes code more 
> complicated.
> * Cleaned up vlapic_reg_write()
> * Call vpmu_destroy() for both HVM and PVH VCPUs
> * Verify that (xen_pmu_data+PMU register bank) fit into a page
> * Return error codes from arch-specific VPMU init code
> * Moved VPMU-related context switch logic into inlines
> * vpmu_force_context_switch() changes:
>   o Avoid greater than page-sized allocations
>   o Prevent another VCPU from starting VPMU sync while the first sync is in 
> progress
> * Avoid stack leak in do_xenpmu_op()
> * Checked validity of Intel VPMU MSR values before they are committed
> * Fixed MSR handling in traps.c (avoid potential accesses to Intel MSRs on 
> AMD)
> * Fixed VCPU selection in interrupt handler for 32-bit dom0 (sampled => 
> sampling)
> * Clarified commit messages (patches 2, 13, 18) 
> * Various cleanups
> 
> Changes in v9:
> 
> * Restore VPMU context after context_saved() is called in
>   context_switch(). This is needed because vpmu_load() may end up
>   calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running
>   to be correctly set/cleared. (patch 18, dropped review acks)
> * Added patch 2 to properly manage VPMU_CONTEXT_LOADED
> * Addressed most of Jan's comments.
>   o Keep track of time in vpmu_force_context_switch() to properly break
>     out of a loop when using hypercall continuations
>   o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op()
>   o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully)
>     make it more clear which vcpu we are using
>   o Cleaned up vpmu_do_wrmsr()
>   o Did *not* replace sizeof(uint64_t) with sizeof(variable) in
>     amd_vpmu_initialise(): throughout the code registers are declared as
>     uint64_t and if we are to add a new type (e.g. reg_t) this should be
>     done in a separate patch, unrelated to this series.
>   o Various more minor cleanups and code style fixes
>   
> Changes in v8:
> 
> * Cleaned up a bit definitions of struct xenpf_symdata and xen_pmu_params
> * Added compat checks for vpmu structures
> * Converted vpmu flag manipulation macros to inline routines
> * Reimplemented vpmu_unload_all() to avoid long loops
> * Reworked PMU fault generation and handling (new patch #12)
> * Added checks for domain->vcpu[] non-NULLness
> * Added more comments, renamed some routines and macros, code style cleanup
> 
> 
> Changes in v7:
> 
> * When reading hypervisor symbols make the caller pass buffer length
>   (as opposed to having this length be part of the API). Make the
>   hypervisor buffer static, make xensyms_read() return zero-length
>   string on end-of-symbols. Make 'type' field of xenpf_symdata a char,
>   drop compat_pf_symdata definition.
> * Spread PVH support across patches as opposed to lumping it into a
>   separate patch
> * Rename vpmu_is_set_all() to vpmu_are_all_set()
> * Split VPMU cleanup patch in two
> * Use memmove when copying VMX guest and host MSRs
> * Make padding of xen_arch_pmu's context union a constand that does not
>   depend on arch context size.
> * Set interface version to 0.1
> * Check pointer validity in pvpmu_init/destroy()
> * Fixed crash in core2_vpmu_dump()
> * Fixed crash in vmx_add_msr()
> * Break handling of Intel and AMD MSRs in traps.c into separate cases
> * Pass full CS selector to guests
> * Add lock in pvpmu init code to prevent potential race
> 
> 
> Changes in v6:
> 
> * Two new patches:
>   o Merge VMX MSR add/remove routines in vmcs.c (patch 5)
>   o Merge VPMU read/write MSR routines in vpmu.c (patch 14)
> * Check for pending NMI softirq after saving VPMU context to prevent a 
> newly-scheduled
>   guest from overwriting sampled_vcpu written by de-scheduled VPCU.
> * Keep track of enabled counters on Intel. This was removed in earlier 
> patches and
>   was a mistake. As result of this change struct vpmu will have a pointer to 
> private
>   context data (i.e. data that is not exposed to a PV(H) guest). Use this 
> private pointer
>   on SVM as well for storing MSR bitmap status (it was unnecessarily exposed 
> to PV guests
>   earlier).
>   Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be 
> reviewed
>   agan (core2_vpmu_do_wrmsr() routine, mostly)
> * Replaced references to dom0 with hardware_domain (and is_control_domain with
>   is_hardware_domain for consistency)
> * Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE
> * Reverted unnecessary changes in vpmu_initialise()'s switch statement
> * Fixed comment in vpmu_do_interrupt
> 
> 
> Changes in v5:
> 
> * Dropped patch number 2 ("Stop AMD counters when called from 
> vpmu_save_force()")
>   as no longer needed
> * Added patch number 2 that marks context as loaded before PMU registers are
>   loaded. This prevents situation where a PMU interrupt may occur while 
> context
>   is still viewed as not loaded. (This is really a bug fix for exsiting VPMU
>   code)
> * Renamed xenpmu.h files to pmu.h
> * More careful use of is_pv_domain(), is_hvm_domain(, is_pvh_domain and
>   has_hvm_container_domain(). Also explicitly disabled support for PVH until
>   patch 16 to make distinction between usage of the above macros more clear.
> * Added support for disabling VPMU support during runtime.
> * Disable VPMUs for non-privileged domains when switching to privileged
>   profiling mode
> * Added ARM stub for xen_arch_pmu_t
> * Separated vpmu_mode from vpmu_features
> * Moved CS register query to make sure we use appropriate query mechanism for
>   various guest types.
> * LVTPC is now set from value in shared area, not copied from dom0
> * Various code and comments cleanup as suggested by Jan.
> 
> Changes in v4:
> 
> * Added support for PVH guests:
>   o changes in pvpmu_init() to accommodate both PV and PVH guests, still in 
> patch 10
>   o more careful use of is_hvm_domain
>   o Additional patch (16)
> * Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe 
> handling
> * Fixed dom0's VCPU selection in privileged mode
> * Added a cast in register copy for 32-bit PV guests cpu_user_regs_t in 
> vpmu_do_interrupt.
>   (don't want to expose compat_cpu_user_regs in a public header)
> * Renamed public structures by prefixing them with "xen_"
> * Added an entry for xenpf_symdata in xlat.lst
> * Fixed pv_cpuid check for vpmu-specific cpuid adjustments
> * Varios code style fixes
> * Eliminated anonymous unions
> * Added more verbiage to NMI patch description
> 
> 
> Changes in v3:
> 
> * Moved PMU MSR banks out from architectural context data structures to allow
> for future expansion without protocol changes
> * PMU interrupts can be either NMIs or regular vector interrupts (the latter
> is the default)
> * Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
> race conditions with the guest
> * Fixed races with PV guest in MSR access handlers
> * More Intel VPMU cleanup
> * Moved NMI-unsafe code from NMI handler
> * Dropped changes to vcpu->is_running
> * Added LVTPC apic handling (cached for PV guests)
> * Separated privileged profiling mode into a standalone patch
> * Separated NMI handling into a standalone patch
> 
> 
> Changes in v2:
> 
> * Xen symbols are exported as data structure (as opoosed to a set of formatted
> strings in v1). Even though one symbol per hypercall is returned performance
> appears to be acceptable: reading whole file from dom0 userland takes on 
> average
> about twice as long as reading /proc/kallsyms
> * More cleanup of Intel VPMU code to simplify publicly exported structures
> * There is an architecture-independent and x86-specific public include files 
> (ARM
> has a stub)
> * General cleanup of public include files to make them more presentable (and
> to make auto doc generation better)
> * Setting of vcpu->is_running is now done on ARM in schedule_tail as well 
> (making
> changes to common/schedule.c architecture-independent). Note that this is not
> tested since I don't have access to ARM hardware.
> * PCPU ID of interrupted processor is now passed to PV guest
> 
> 
> The following patch series adds PMU support in Xen for PV(H)
> guests. There is a companion patchset for Linux kernel. In addition,
> another set of changes will be provided (later) for userland perf
> code.
> 
> This version has following limitations:
> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
> * Hypervisor code is only profiled on processors that have running dom0 VCPUs
> on them.
> * No backtrace support.
> 
> A few notes that may help reviewing:
> 
> * A shared data structure (xenpmu_data_t) between each PV VPCU and hypervisor
> CPU is used for passing registers' values as well as PMU state at the time of
> PMU interrupt.
> * PMU interrupts are taken by hypervisor either as NMIs or regular vector
> interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM 
> guests
> and as virtual interrupts to PV(H) guests
> * PV guest's interrupt handler does not read/write PMU MSRs directly. 
> Instead, it
> accesses xenpmu_data_t and flushes it to HW it before returning.
> * PMU mode is controlled at runtime via 
> /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
> in addition to 'vpmu' boot option (which is preserved for back compatibility).
> The following modes are provided:
>   * disable: VPMU is off
>   * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself 
> and Xen
>   * priv_enable: dom0 only profiling. dom0 collects samples for everyone. 
> Sampling
>     in guests is suspended.
> * /proc/xen/xensyms file exports hypervisor's symbols to dom0 (similar to
> /proc/kallsyms)
> * VPMU infrastructure is now used for HVM, PV and PVH and therefore has been 
> moved
> up from hvm subtree
> 
> 
> Boris Ostrovsky (23):
>   common/symbols: Export hypervisor symbols to privileged guest
>   x86/VPMU: Don't globally disable VPMU if initialization fails.
>   x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
>   x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
>   x86/VPMU: Make vpmu macros a bit more efficient
>   intel/VPMU: Clean up Intel VPMU code
>   vmx: Merge MSR management routines
>   x86/VPMU: Handle APIC_LVTPC accesses
>   intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
>   x86/VPMU: Add public xenpmu.h
>   x86/VPMU: Make vpmu not HVM-specific
>   x86/VPMU: Replace vcpu with vpmu as argument to some routines
>   x86/VPMU: Interface for setting PMU mode and flags
>   x86/VPMU: Initialize VPMUs with __initcall
>   x86/VPMU: Initialize PMU for PV(H) guests
>   x86/VPMU: Save VPMU state for PV guests during context switch
>   x86/VPMU: When handling MSR accesses, leave fault injection to callers
>   x86/VPMU: Add support for PMU register handling on PV guests
>   x86/VPMU: Handle PMU interrupts for PV guests
>   x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
>   x86/VPMU: Add privileged PMU mode
>   x86/VPMU: NMI-based VPMU support
>   x86/VPMU: Move VPMU files up from hvm/ directory
> 
>  docs/misc/xen-command-line.markdown                |   8 +-
>  tools/flask/policy/policy/modules/xen/xen.te       |   7 +
>  xen/arch/x86/cpu/Makefile                          |   1 +
>  xen/arch/x86/cpu/vpmu.c                            | 896 
> +++++++++++++++++++++
>  xen/arch/x86/{hvm/svm/vpmu.c => cpu/vpmu_amd.c}    | 289 ++++---
>  .../x86/{hvm/vmx/vpmu_core2.c => cpu/vpmu_intel.c} | 815 ++++++++++---------
>  xen/arch/x86/domain.c                              |  27 +-
>  xen/arch/x86/hvm/Makefile                          |   1 -
>  xen/arch/x86/hvm/hvm.c                             |   1 +
>  xen/arch/x86/hvm/svm/Makefile                      |   1 -
>  xen/arch/x86/hvm/svm/svm.c                         |  10 +-
>  xen/arch/x86/hvm/vlapic.c                          |   3 +
>  xen/arch/x86/hvm/vmx/Makefile                      |   1 -
>  xen/arch/x86/hvm/vmx/vmcs.c                        |  91 +--
>  xen/arch/x86/hvm/vmx/vmx.c                         |  28 +-
>  xen/arch/x86/hvm/vpmu.c                            | 266 ------
>  xen/arch/x86/oprofile/nmi_int.c                    |   3 +-
>  xen/arch/x86/oprofile/op_model_ppro.c              |   8 +-
>  xen/arch/x86/platform_hypercall.c                  |  28 +
>  xen/arch/x86/traps.c                               |  62 +-
>  xen/arch/x86/x86_64/compat/entry.S                 |   4 +
>  xen/arch/x86/x86_64/entry.S                        |   4 +
>  xen/common/event_channel.c                         |   1 +
>  xen/common/symbols.c                               |  54 ++
>  xen/include/Makefile                               |   2 +
>  xen/include/asm-x86/domain.h                       |   2 +
>  xen/include/asm-x86/hvm/vcpu.h                     |   3 -
>  xen/include/asm-x86/hvm/vmx/vmcs.h                 |  25 +-
>  xen/include/asm-x86/hvm/vmx/vpmu_core2.h           |  51 --
>  xen/include/asm-x86/hvm/vpmu.h                     | 105 ---
>  xen/include/asm-x86/softirq.h                      |   3 +-
>  xen/include/asm-x86/vpmu.h                         | 149 ++++
>  xen/include/public/arch-arm.h                      |   3 +
>  xen/include/public/arch-x86/pmu.h                  |  97 +++
>  xen/include/public/platform.h                      |  19 +
>  xen/include/public/pmu.h                           |  90 +++
>  xen/include/public/xen.h                           |   2 +
>  xen/include/xen/hypercall.h                        |   4 +
>  xen/include/xen/symbols.h                          |   3 +
>  xen/include/xlat.lst                               |   6 +
>  xen/include/xsm/dummy.h                            |  20 +
>  xen/include/xsm/xsm.h                              |   6 +
>  xen/xsm/dummy.c                                    |   1 +
>  xen/xsm/flask/hooks.c                              |  28 +
>  xen/xsm/flask/policy/access_vectors                |   6 +
>  45 files changed, 2194 insertions(+), 1040 deletions(-)
>  create mode 100644 xen/arch/x86/cpu/vpmu.c
>  rename xen/arch/x86/{hvm/svm/vpmu.c => cpu/vpmu_amd.c} (64%)
>  rename xen/arch/x86/{hvm/vmx/vpmu_core2.c => cpu/vpmu_intel.c} (56%)
>  delete mode 100644 xen/arch/x86/hvm/vpmu.c
>  delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
>  delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
>  create mode 100644 xen/include/asm-x86/vpmu.h
>  create mode 100644 xen/include/public/arch-x86/pmu.h
>  create mode 100644 xen/include/public/pmu.h
> 
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.