
[Xen-devel] [PATCH 00/10] PML (Page Modification Logging) support



Hi all,

This patch series adds PML support to Xen. Please kindly help to review it.

The patches are organized as follows:

patch 1:
    enables EPT A/D bit support, which PML depends on
patches 2 to 8:
    changes in VMX to enable PML
patch 9:
    changes to the log-dirty common code to support PML
patch 10:
    changes to p2m-ept.c to support PML

The design was previously discussed in the thread below:

http://lists.xen.org/archives/html/xen-devel/2015-02/msg01305.html

For quick reference, I have listed the main changes to the previous design made
in response to your comments; the complete updated design is also included below.

- Instead of flushing the PML buffer at the beginning of every VMEXIT, we now
explicitly flush all vcpus' PML buffers in the peek/clear log-dirty ops, before
reporting dirty pages to userspace.

- Handle vcpu creation when the domain is already in log-dirty mode, as
suggested by Tim.

- When flushing the PML buffer, instead of calling paging_mark_dirty
unconditionally for all logged dirty pages, we call p2m_change_type_one to
change the guest page from log-dirty mode back to normal mode, and mark the
page dirty only when that type change succeeds. This is required by video RAM
tracking, and it is also the reasonable thing to do.

I also did some performance measurements, which are included after the design
for your reference.

====================== The Design ======================================

PML (Page Modification Logging) is a new feature on Intel's Broadwell server
platform, targeted at reducing the overhead of the dirty logging mechanism.
This patch series adds PML support to Xen.

Background
==========

Currently, dirty logging is done via write protection: guest memory that we
want to log is set to read-only, so the guest's first write to that memory
triggers a write fault (an EPT violation when EPT is used), at which point we
log the dirty GFN and set the W permission bit again. This mechanism works,
but at the cost of one write fault for each first write from the guest.
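
As a rough illustration of that path (self-contained sketch; the helper names
and the bitmap are placeholders, not the real Xen call chain, which goes
through the EPT violation handler and the log-dirty radix tree):

#include <stdint.h>

#define EPT_W (1ULL << 1)               /* EPT write-permission bit */

/* Hypothetical stand-in for the per-domain log-dirty radix tree. */
static void mark_dirty(uint8_t *dirty_bitmap, uint64_t gfn)
{
    dirty_bitmap[gfn / 8] |= 1U << (gfn % 8);
}

/* Sketch: handle the write fault (EPT violation) taken on the first write
 * to a write-protected guest page in log-dirty mode. */
static void handle_logdirty_write_fault(uint8_t *dirty_bitmap, uint64_t gfn,
                                        uint64_t *ept_entry)
{
    mark_dirty(dirty_bitmap, gfn);      /* log the dirty GFN                */
    *ept_entry |= EPT_W;                /* set W again so later writes to
                                           this page don't fault             */
}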

PML Introduction
================

PML is a hardware-assisted, efficient way to do dirty logging, based on the EPT
mechanism. Briefly, PML automatically logs a dirty GPA to a 4K PML buffer when
the CPU changes an EPT entry's D-bit from 0 to 1. To accomplish this, a new PML
buffer base address (machine address), a PML index, and a new PML buffer full
VMEXIT were added to the VMCS. Initially the PML index is set to 511 (8 bytes
per GPA) to indicate that the buffer is empty, and the CPU decrements the PML
index by 1 after logging a GPA. Before logging a GPA, the CPU checks the PML
index to see whether the buffer is already full, in which case a PML buffer
full VMEXIT happens, and the VMM should flush the logged GPAs (into the data
structure that keeps dirty pages) and reset the PML index so that further GPAs
can be logged again.
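
In code terms, the buffer geometry works out as in this small sketch (the
constant and function names are mine, not from the patches):

#include <stdint.h>

#define PML_BUFFER_BYTES   4096                                  /* one 4K page  */
#define PML_ENTRY_BYTES    8                                     /* one GPA each */
#define PML_ENTRIES        (PML_BUFFER_BYTES / PML_ENTRY_BYTES)  /* 512          */
#define PML_INDEX_EMPTY    (PML_ENTRIES - 1)                     /* 511 == empty */

/* The CPU writes the dirty GPA at pml_buffer[pml_index] and then decrements
 * pml_index.  Once the index falls out of the 0..511 range, the next logging
 * attempt raises the PML buffer full VMEXIT, and the VMM must drain the
 * buffer and reset the index to PML_INDEX_EMPTY. */
static inline unsigned int pml_logged_entries(uint16_t pml_index)
{
    if (pml_index >= PML_ENTRIES)       /* index wrapped: buffer is full    */
        return PML_ENTRIES;
    return PML_INDEX_EMPTY - pml_index; /* entries pml_index+1 .. 511 valid */
}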

The specification of PML can be found at:
http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html

With PML, we don't have to use write protection; we just clear the D-bit of the
EPT entries of guest memory to do dirty logging, with one additional PML buffer
full VMEXIT per 512 dirty GPAs. In theory, this reduces hypervisor overhead
while the guest is in dirty logging mode, and more CPU cycles can go to the
guest, so benchmarks in the guest are expected to perform better than without
PML.

Design
======

- PML feature is used globally

A new Xen boot parameter is added to control PML feature detection. Once the
PML feature is detected, it is used for dirty logging for all domains globally.
We currently don't support using PML on a per-domain basis, as that would
require additional control from the xl tool.

The new parameter is a sub-boolean 'pml_enable' under the top level parameter
'ept=<options>'. PML is disabled by default, and 'ept=pml' enables it.
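
For example, assuming a GRUB2-style boot entry (paths and other options are
illustrative), PML detection would be enabled from the Xen command line like
this:

    multiboot2 /boot/xen.gz ept=pml
    module2    /boot/vmlinuz ...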

- PML enable/disable for particular Domain

Enabling PML means:
    - allocate PML buffer, set PML base address
    - initialize PML index to 511
    - turn on PML in VMCS
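
A minimal sketch of that per-vcpu enable path, under the assumption that the
VMCS gains a PML base address field, a PML index field, and a PML enable bit in
the secondary execution controls (all names below are illustrative, not the
actual patch code):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PML_ENTRIES      512
#define PML_INDEX_EMPTY  (PML_ENTRIES - 1)

/* Hypothetical per-vcpu PML state; in the patches this would hang off the
 * VMX-specific part of struct vcpu. */
struct pml_state {
    uint64_t *pml_buffer;    /* 4K buffer; its machine address goes in VMCS */
    uint16_t  pml_index;     /* mirrored into the VMCS PML index field      */
    bool      pml_enabled;
};

/* Enable PML for one vcpu: allocate the buffer, start with an empty index,
 * and (in the hypervisor) program the VMCS and set the PML enable bit in the
 * secondary execution controls. */
static int pml_enable_vcpu(struct pml_state *s)
{
    if ((s->pml_buffer = aligned_alloc(4096, 4096)) == NULL)
        return -1;                        /* allocation failure */
    s->pml_index = PML_INDEX_EMPTY;
    s->pml_enabled = true;
    /* In Xen this would be VMCS writes along the lines of:
     *   __vmwrite(<PML address field>, virt_to_maddr(s->pml_buffer));
     *   __vmwrite(<PML index field>, s->pml_index);
     * plus setting the PML bit in the secondary execution controls. */
    return 0;
}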

PML needs to be enabled for all vcpus of a domain: while the PML buffer and PML
index are per-vcpu, the EPT table is shared by all vcpus, so enabling PML on
only some vcpus of a domain won't work.

PML will be enabled for the domain when it is switched to log-dirty mode, and
conversely disabled when the domain is switched back to normal mode.

Also, as Tim Deegan commented, there have been cases where VMs are put into
log-dirty mode before their vcpus are created, so we also need to handle
enabling PML for new vcpus. In this case, when creating a vcpu, we check
whether the domain is already in log-dirty mode; if it is, we enable PML for
the new vcpu, and a failure to enable PML results in failure of vcpu creation.
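
Continuing the sketch above, the vcpu-creation case then reduces to something
like this (again hypothetical names; in Xen the "already in log-dirty" test
would come from the paging layer):

/* Sketch: a vcpu created while the domain is already in log-dirty mode must
 * get PML enabled immediately; a failure here fails vcpu creation. */
static int vcpu_init_pml(struct pml_state *s, bool domain_in_log_dirty)
{
    if (!domain_in_log_dirty)
        return 0;                 /* nothing to do until log-dirty starts  */
    return pml_enable_vcpu(s);    /* non-zero return aborts vcpu creation  */
}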

After PML is enabled for the domain, we only need to clear the D-bit of the EPT
entries of guest memory in order to log dirty pages, instead of making the EPT
entries read-only. For partial log-dirty, if PML is used, we also manually set
the D-bit for guest memory that is in normal mode, to avoid unnecessary GPA
logging, since PML operates on the entire EPT table.

However, super pages are still write-protected even with PML, because we still
need to split a super page into 4K pages in log-dirty mode.
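
In EPT-entry terms, the difference from the write-protection scheme is roughly
the following (bit positions per the SDM: bit 1 is the write permission, bit 9
the dirty flag; the helper itself is illustrative):

#include <stdbool.h>
#include <stdint.h>

#define EPT_W   (1ULL << 1)      /* write permission          */
#define EPT_D   (1ULL << 9)      /* dirty flag (EPT A/D bits) */

/* Sketch: adjust a 4K leaf EPT entry when the page moves between normal (rw)
 * mode and log-dirty mode, with and without PML. */
static void ept_adjust_for_logdirty(uint64_t *entry, bool log_dirty,
                                    bool pml_in_use)
{
    if (log_dirty) {
        if (pml_in_use)
            *entry = (*entry | EPT_W) & ~EPT_D; /* stay writable; clear D so
                                                   the first write is logged  */
        else
            *entry &= ~EPT_W;                   /* classic write protection   */
    } else if (pml_in_use) {
        *entry |= EPT_D;                        /* pre-set D so writes to
                                                   normal pages aren't logged */
    }
}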

- PML buffer flush

PML buffer flush means:
    - read out PML index
    - go through all logged dirty GPAs, and record them in the log-dirty radix tree
    - reset PML index to 511

There are two places where the PML buffer needs to be flushed. The first is
(obviously) the PML buffer full VMEXIT handler. The second is
paging_log_dirty_op (either peek or clean): vcpus keep running asynchronously
while paging_log_dirty_op is called from userspace via hypercall, and their PML
buffers may contain dirty GPAs without being full, so we should flush all
vcpus' PML buffers before reporting dirty GPAs to userspace.

As suggested by Jan and Tim, we flush a vcpu's PML buffer on the PML buffer
full VMEXIT, and in the peek/clean path we explicitly flush the PML buffers of
all vcpus of the domain.
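
Putting the pieces together, a flush could look roughly like the sketch below
(continuing the hypothetical pml_state from above; the two callbacks stand in
for p2m_change_type_one and the log-dirty radix tree update, and the per-page
type change is explained in the Video RAM tracking section that follows):

/* Sketch: drain one vcpu's PML buffer.  change_to_normal() stands in for
 * switching the page's p2m type from log-dirty back to rw, and mark_dirty()
 * for recording the GFN in the log-dirty radix tree. */
static void pml_flush_buffer(struct pml_state *s,
                             bool (*change_to_normal)(uint64_t gfn),
                             void (*mark_dirty)(uint64_t gfn))
{
    /* Entries pml_index+1 .. 511 are valid; an out-of-range index means the
     * whole buffer is full. */
    unsigned int first = (s->pml_index >= PML_ENTRIES) ? 0 : s->pml_index + 1;
    unsigned int i;

    for (i = first; i < PML_ENTRIES; i++) {
        uint64_t gfn = s->pml_buffer[i] >> 12;   /* logged value is a GPA */

        /* Mark the page dirty only if it was successfully switched out of
         * log-dirty mode (see the Video RAM tracking section below). */
        if (change_to_normal(gfn))
            mark_dirty(gfn);
    }

    s->pml_index = PML_INDEX_EMPTY;  /* reset; real code would also update
                                        the VMCS PML index field */
}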

- Video RAM tracking (and partial dirty logging for guest memory range)

Video RAM is unconditionally in dirty logging mode during the guest's run-time,
and it covers only part of the guest's memory. PML, however, operates on the
whole of guest memory (the whole valid EPT table, more precisely), so we need
to decide whether to use PML when only part of guest memory is in dirty
logging mode.

Currently, PML is used as long as any guest memory is in dirty logging mode,
whether globally or partially.

When flushing the PML buffer, we call p2m_change_type_one to change the logged
guest memory from log-dirty mode back to normal mode, and only guest memory
that has been successfully changed from log-dirty to normal mode is recorded in
the log-dirty radix tree. This is required by the current video RAM tracking
implementation.


======================== specjbb performance ===========================

I measured specjbb performance in the guest when the guest is in video RAM
tracking mode (the most common case, I think), and when the guest is in global
log-dirty mode (I made a change to the xl tool to keep the guest in global
log-dirty mode indefinitely; see below). The results show that PML does improve
specjbb performance in the guest while the guest is in log-dirty mode, and the
more frequently dirty pages are queried, the larger the performance gain. So
while PML probably can't speed up the live migration process directly, it will
be beneficial for use cases such as monitoring how quickly a guest dirties its
memory.

- video ram tracking:

    WP              PML         
    122805          123887
    120792          123249
    118577          123348
    121856          125195
    121286          122056
    120139          123037

avg 120909          123462
    100%            102.11%

performance gain: 2.11%

- global log-dirty:

    WP              PML
    72862           79511
    73466           81173
    72989           81177
    73138           81777
    72811           80257
    72486           80413

avg 72959           80718
    100%            110.63%

performance gain: 10.63%

Test machine: Broadwell server with 16 CPUs (1.6 GHz) + 4 GB memory.
Xen hypervisor: latest upstream Xen
dom0 kernel: 3.16.0
guest: 4 vcpus + 1 GB memory.
guest os: ubuntu 14.04 with 3.13.0-24-generic kernel.

Note: the global log-dirty data was measured with the change below, while
running 'xl migrate <dom> localhost'.

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..88a10f1 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -335,7 +335,12 @@ static int analysis_phase(xc_interface *xch, uint32_t domid, struct save_ctx *ct
         
    start = llgettimeofday();
              
+#define PML_TEST
+#ifdef  PML_TEST
+    for ( j = 0; true; j++ )
+#else
    for ( j = 0; j < runs; j++ )
+#endif
    {
        int i;




Kai Huang (10):
  VMX: Enable EPT A/D bit support
  VMX: New parameter to control PML enabling
  VMX: Add PML definition and feature detection.
  VMX: New data structure member to support PML
  VMX: add help functions to support PML
  VMX: handle PML buffer full VMEXIT
  VMX: handle PML enabling in vmx_vcpu_initialise
  VMX: disable PML in vmx_vcpu_destroy
  log-dirty: Refine common code to support PML
  p2m/ept: Enable PML in p2m-ept for log-dirty.

 xen/arch/x86/hvm/vmx/vmcs.c        | 241 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |  37 ++++++
 xen/arch/x86/mm/hap/hap.c          |  31 ++++-
 xen/arch/x86/mm/p2m-ept.c          |  81 ++++++++++++-
 xen/arch/x86/mm/p2m.c              |  36 ++++++
 xen/arch/x86/mm/paging.c           |  15 ++-
 xen/arch/x86/mm/shadow/common.c    |   2 +-
 xen/include/asm-x86/domain.h       |   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  25 +++-
 xen/include/asm-x86/hvm/vmx/vmx.h  |   6 +-
 xen/include/asm-x86/p2m.h          |  11 ++
 xen/include/asm-x86/paging.h       |   3 +-
 12 files changed, 474 insertions(+), 15 deletions(-)

-- 
2.1.0


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

