[Xen-devel] [PATCH 00/10] PML (Page Modification Logging) support
Hi all,

This patch series adds PML support to Xen. Please kindly help to review
it.

The patches are organized as follows:

  patch 1:       enables EPT A/D bit support, which is a dependency of PML
  patches 2 - 8: changes in VMX to enable PML
  patch 9:       changes log-dirty common code to support PML
  patch 10:      changes p2m-ept.c to support PML

The design has already been discussed in the thread below:

  http://lists.xen.org/archives/html/xen-devel/2015-02/msg01305.html

For quick reference, here is a brief list of the changes to the previous
design, made according to your comments; the complete updated design is
also pasted below.

  - Instead of flushing the PML buffer at the beginning of every VMEXIT,
    we explicitly flush all vcpus' PML buffers in the peek/clear
    log-dirty ops, before reporting dirty pages to userspace.

  - Handle creating a vcpu while the domain is already in log-dirty
    mode, as suggested by Tim.

  - When flushing the PML buffer, instead of calling paging_mark_dirty
    unconditionally for all logged dirty pages, we call
    p2m_type_change_one to change the guest page from log-dirty mode to
    normal mode, and only when p2m_type_change_one has succeeded is the
    guest page marked dirty. This is required by video RAM tracking, and
    it is also the reasonable thing to do.

I also did some performance measurements, pasted after the design, for
your reference.

====================== The Design ======================================

PML (Page Modification Logging) is a new feature of Intel's Broadwell
server platform, targeted at reducing the overhead of the dirty logging
mechanism. This patch series adds PML support to Xen.

Background
==========

Currently, dirty logging is done via write protection: guest memory we
want to log is set read-only, so the guest's first write to that memory
raises a write fault (an EPT violation when EPT is used), in whose
handler we log the dirty GFN and set the W permission bit.
This mechanism works, but at the cost of one write fault for each first
write from the guest.

PML Introduction
================

PML is a hardware-assisted, efficient way of doing dirty logging, based
on the EPT mechanism. Briefly, PML logs dirty GPAs automatically to a 4K
PML buffer when the CPU changes an EPT entry's D-bit from 0 to 1. To
accomplish this, a PML buffer base address (a machine address), a PML
index, and a new "PML buffer full" VMEXIT were added to the VMCS.
Initially the PML index is set to 511 (8 bytes for each GPA) to indicate
an empty buffer, and the CPU decreases the PML index by 1 after logging
a GPA. Before logging a GPA, the CPU checks the PML index to see whether
the buffer is already full, in which case a PML buffer full VMEXIT
happens and the VMM should flush the logged GPAs (to the data structure
that keeps dirty pages) and reset the PML index so that further GPAs can
be logged.

The specification of PML can be found at:

  http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html

With PML, we don't have to use write protection; we just clear the D-bit
of the EPT entries of guest memory to do dirty logging, at the cost of
one additional PML buffer full VMEXIT per 512 dirty GPAs. Theoretically
this reduces hypervisor overhead while the guest is in dirty logging
mode, leaving more CPU cycles for the guest, so benchmarks in the guest
are expected to perform better than with write protection.

Design
======

- PML feature is used globally

  A new Xen boot parameter is added to control PML feature detection.
  Once the PML feature is detected, it is used for dirty logging for all
  domains globally. Currently we don't support using PML on a per-domain
  basis, as that would require additional control from the xl tool.

  The new parameter is a top-level parameter 'ept=<options>' with a
  sub-boolean 'pml_enable'. PML is disabled by default, and 'ept=pml'
  enables it.
- PML enable/disable for a particular domain

  Enabling PML means:

    - allocate the PML buffer and set the PML base address
    - initialize the PML index to 511
    - turn on PML in the VMCS

  PML needs to be enabled for all vcpus of a domain: the PML buffer and
  PML index are per-vcpu, but the EPT table is shared by the vcpus, so
  enabling PML on only some vcpus of a domain won't work. PML is enabled
  for the domain when it is switched to log-dirty mode, and conversely
  is disabled when the domain is switched back to normal mode.

  Also, as commented by Tim Deegan, there have been cases where VMs are
  put into log-dirty mode before their vcpus are assigned, so we also
  need to handle enabling PML for new vcpus. When creating a vcpu, we
  check whether the domain is already in log-dirty mode; if it is, we
  enable PML for the vcpu, and failure to enable PML results in failure
  of the vcpu creation.

  After PML is enabled for the domain, we only need to clear the D-bit
  of the EPT entries of guest memory in order to log dirty pages,
  instead of setting the entries read-only. For partial log-dirty, we
  also manually set the D-bit for guest memory that is in normal mode,
  to avoid unnecessary GPA logging, as PML works globally on the entire
  EPT table. However, super pages are still write-protected even with
  PML, as we still need to split a super page into 4K pages in log-dirty
  mode.

- PML buffer flush

  Flushing the PML buffer means:

    - read out the PML index
    - go through all logged dirty GPAs, updating them in the log-dirty
      radix tree
    - reset the PML index to 511

  There are two places where we need to flush the PML buffer.
  The first place is the PML buffer full VMEXIT handler (obviously), and
  the second is paging_log_dirty_op (either peek or clean): vcpus run
  asynchronously while paging_log_dirty_op is called from userspace via
  hypercall, so dirty GPAs may be logged in vcpus' PML buffers even
  though the buffers are not full, and we should therefore flush all
  vcpus' PML buffers before reporting dirty GPAs to userspace.

  As suggested by Jan and Tim, we flush a vcpu's PML buffer on the PML
  buffer full VMEXIT, and in the peek/clean path we explicitly flush the
  PML buffers of all the domain's vcpus.

- Video RAM tracking (and partial dirty logging for guest memory ranges)

  Video RAM is unconditionally in dirty logging mode during the guest's
  run-time, and it covers only a partial memory range of the guest.
  However, PML operates on the whole guest memory (the whole valid EPT
  table, more precisely), so we need to decide whether to use PML when
  only partial guest memory ranges are in dirty logging mode.

  Currently, PML is used as long as any guest memory is in dirty logging
  mode, whether globally or partially. When flushing the PML buffer, we
  call p2m_type_change_one to change the guest memory from log-dirty
  mode to normal mode, and only guest memory that was successfully
  changed from log-dirty mode to normal mode is updated in the log-dirty
  radix tree. This is required by the current video RAM tracking
  implementation.

======================== specjbb performance ===========================

I measured specjbb performance in the guest while the guest is in video
RAM tracking mode (the most usual case, I think), and while it is in
global log-dirty mode (I made a change to the xl tool to keep the guest
in global log-dirty mode indefinitely; see below). The numbers show that
PML does improve specjbb performance in the guest while the guest is in
log-dirty mode, and the more frequently dirty pages are queried, the
greater the performance gain.
So while PML probably can't speed up the live migration process
directly, it will be beneficial for use cases such as monitoring how
fast the guest dirties its memory.

- video RAM tracking:

            WP      PML
        122805   123887
        120792   123249
        118577   123348
        121856   125195
        121286   122056
        120139   123037
  avg   120909   123462
          100%  102.11%

  performance gain: 2.11%

- global log-dirty:

            WP      PML
         72862    79511
         73466    81173
         72989    81177
         73138    81777
         72811    80257
         72486    80413
  avg    72959    80718
          100%  110.63%

  performance gain: 10.63%

Test machine: Broadwell server with 16 CPUs (1.6GHz) + 4G memory.
Xen hypervisor: latest upstream Xen.
dom0 kernel: 3.16.0.
guest: 4 vcpus + 1G memory.
guest OS: ubuntu 14.04 with 3.13.0-24-generic kernel.

Note: the global log-dirty data was measured with the change below,
running 'xl migrate <dom> localhost'.

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..88a10f1 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -335,7 +335,12 @@ static int analysis_phase(xc_interface *xch, uint32_t domid, struct save_ctx *ct
     start = llgettimeofday();
 
+#define PML_TEST
+#ifdef PML_TEST
+    for ( j = 0; true; j++ )
+#else
     for ( j = 0; j < runs; j++ )
+#endif
     {
         int i;

Kai Huang (10):
  VMX: Enable EPT A/D bit support
  VMX: New parameter to control PML enabling
  VMX: Add PML definition and feature detection.
  VMX: New data structure member to support PML
  VMX: add help functions to support PML
  VMX: handle PML buffer full VMEXIT
  VMX: handle PML enabling in vmx_vcpu_initialise
  VMX: disable PML in vmx_vcpu_destroy
  log-dirty: Refine common code to support PML
  p2m/ept: Enable PML in p2m-ept for log-dirty.
 xen/arch/x86/hvm/vmx/vmcs.c        | 241 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |  37 ++++++
 xen/arch/x86/mm/hap/hap.c          |  31 ++++-
 xen/arch/x86/mm/p2m-ept.c          |  81 ++++++++++++-
 xen/arch/x86/mm/p2m.c              |  36 ++++++
 xen/arch/x86/mm/paging.c           |  15 ++-
 xen/arch/x86/mm/shadow/common.c    |   2 +-
 xen/include/asm-x86/domain.h       |   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  25 +++-
 xen/include/asm-x86/hvm/vmx/vmx.h  |   6 +-
 xen/include/asm-x86/p2m.h          |  11 ++
 xen/include/asm-x86/paging.h       |   3 +-
 12 files changed, 474 insertions(+), 15 deletions(-)

-- 
2.1.0

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel