
[Xen-devel] PML (Page Modification Logging) design for Xen

Hi all,

PML (Page Modification Logging) is a new feature on Intel's Broadwell server 
platform intended to reduce the overhead of dirty logging. Below is the design 
for Xen. Please review and give comments.


Currently, dirty logging is done via write protection: guest memory that we 
want to log is set read-only, so that when the guest writes to that memory a 
write fault (an EPT violation, when EPT is used) occurs, in which we log the 
dirty GFN. This mechanism works, but at the cost of one write fault for each 
page the guest writes to.

PML Introduction

PML is an efficient, hardware-assisted dirty logging mechanism built on top of 
EPT. Briefly, the CPU automatically logs the dirty GPA to a 4K PML buffer when 
it changes an EPT entry's D-bit from 0 to 1. To accomplish this, a PML buffer 
base address (a machine address), a PML index, and a new PML buffer full VMEXIT 
were added to the VMCS. The buffer holds 512 GPAs of 8 bytes each; the PML 
index is initially set to 511 to indicate that the buffer is empty, and the CPU 
decrements it by 1 after logging each GPA. Before logging a GPA, the CPU checks 
the PML index to see whether the buffer is already full; if so, a PML buffer 
full VMEXIT occurs, and the VMM should flush the logged GPAs (into the data 
structure that keeps dirty GPAs) and reset the PML index so that further GPAs 
can be logged.
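To make the index arithmetic concrete, here is a minimal software model of the logging step described above. It is only an illustration: the real buffer is written by the CPU, and the names pml_model and pml_log_gpa are hypothetical. The model assumes, as the paragraph says, that the index starts at 511, is decremented after each write, and that an index outside 0..511 (after wrapping below 0) means the buffer is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only: on real hardware the CPU writes the buffer. */
#define PML_ENTRIES     512              /* 4K buffer / 8 bytes per GPA */
#define PML_INDEX_EMPTY (PML_ENTRIES - 1)

struct pml_model {
    uint64_t buf[PML_ENTRIES];           /* logged GPAs (page-aligned) */
    uint16_t index;                      /* 16-bit PML index field */
};

/* Returns false when a "PML buffer full" VMEXIT would happen instead of
 * logging the GPA. */
static bool pml_log_gpa(struct pml_model *p, uint64_t gpa)
{
    if (p->index >= PML_ENTRIES)         /* index wrapped past 0: full */
        return false;
    p->buf[p->index] = gpa & ~0xfffULL;  /* log the page-aligned GPA */
    p->index--;                          /* 0 - 1 wraps to 0xFFFF */
    return true;
}
```

With an empty buffer, exactly 512 GPAs are logged (from buf[511] down to buf[0]); the next attempt reports "full" rather than overwriting anything.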

The specification of PML can be found at:

With PML, we no longer need write protection; we just clear the D-bit of the 
EPT entries of guest memory to do dirty logging, at the cost of one additional 
PML buffer full VMEXIT per 512 dirty GPAs. In theory, this reduces hypervisor 
overhead while the guest is in dirty logging mode, so more CPU cycles can be 
given to the guest, and benchmarks in the guest are expected to perform better 
than without PML.
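As a rough back-of-the-envelope comparison, the sketch below counts exits for a round in which N distinct pages are dirtied once each. It assumes the buffer starts empty and that no other VMEXIT flushes it early, so it models only the PML buffer full exits. Because the full check happens before logging, the first such exit occurs on the 513th GPA. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PML_ENTRIES 512

/* Write protection: one write fault per distinct page dirtied. */
static uint64_t wp_exits(uint64_t dirty_pages)
{
    return dirty_pages;
}

/* PML: a "buffer full" VMEXIT fires when logging is attempted with the
 * buffer already holding 512 GPAs, i.e. on GPA 513, 1025, ...  Entries
 * left in the buffer are flushed later, on some other VMEXIT. */
static uint64_t pml_full_exits(uint64_t dirty_pages)
{
    return dirty_pages == 0 ? 0 : (dirty_pages - 1) / PML_ENTRIES;
}
```

For example, dirtying 1,000,000 pages costs 1,000,000 write faults with write protection, but only 1953 PML buffer full VMEXITs under these assumptions.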


- PML feature is used globally

A new Xen boot parameter, say 'opt_enable_pml', will be introduced to control 
PML feature detection; the PML feature will only be detected if opt_enable_pml 
= 1. Once the PML feature is detected, it will be used for dirty logging for all 
domains. We currently don't support enabling PML on a per-domain basis, as that 
would require additional control from the xl tool.

- PML enable/disable for a particular domain

PML needs to be enabled (allocate the PML buffer, initialize the PML index and 
PML base address, turn PML on in the VMCS, etc.) for all vcpus of the domain, 
because the PML buffer and PML index are per-vcpu while the EPT table may be 
shared by vcpus. Enabling PML on only some of the domain's vcpus won't work, 
since writes from vcpus without PML would set D-bits without being logged. 
Also, PML will only be enabled for the domain when it is switched to dirty 
logging mode, and it will be disabled when the domain is switched back to 
normal mode. As it seems the number of vcpus won't change dynamically while the 
guest is running (correct me if I am wrong here), we don't have to consider 
enabling PML for newly created vcpus when the guest is in dirty logging mode.
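The all-or-nothing rule can be sketched as a loop over vcpus with rollback on failure. The structures and function names below are hypothetical simplifications, not Xen's real ones. A real implementation would allocate a 4K-aligned page (calloc gives no such guarantee), write its address and the index into each vcpu's VMCS, and do this with the domain paused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PML_ENTRIES 512

/* Hypothetical, simplified per-vcpu and per-domain state. */
struct vcpu {
    uint64_t *pml_buf;   /* stands in for the page whose address goes in the VMCS */
    uint16_t pml_index;
    bool pml_on;         /* stands in for the "enable PML" VMCS control */
};

struct domain {
    struct vcpu *vcpus;
    unsigned int nr_vcpus;
    bool pml_enabled;
};

static void vcpu_disable_pml(struct vcpu *v)
{
    v->pml_on = false;
    free(v->pml_buf);
    v->pml_buf = NULL;
}

static bool vcpu_enable_pml(struct vcpu *v)
{
    v->pml_buf = calloc(PML_ENTRIES, sizeof(uint64_t));
    if (!v->pml_buf)
        return false;
    v->pml_index = PML_ENTRIES - 1;       /* empty buffer */
    v->pml_on = true;
    return true;
}

/* Entering dirty logging mode: enable PML on every vcpu or on none,
 * because the vcpus share one EPT table. */
static bool domain_enable_pml(struct domain *d)
{
    for (unsigned int i = 0; i < d->nr_vcpus; i++) {
        if (!vcpu_enable_pml(&d->vcpus[i])) {
            while (i--)                   /* roll back vcpus already enabled */
                vcpu_disable_pml(&d->vcpus[i]);
            return false;
        }
    }
    d->pml_enabled = true;
    return true;
}

/* Leaving dirty logging mode. */
static void domain_disable_pml(struct domain *d)
{
    for (unsigned int i = 0; i < d->nr_vcpus; i++)
        vcpu_disable_pml(&d->vcpus[i]);
    d->pml_enabled = false;
}
```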

After PML is enabled for the domain, we only need to clear the D-bit of EPT 
entries for guest memory in dirty logging mode. We achieve this by checking 
whether PML is enabled for the domain when p2m_ram_rw is changed to 
p2m_ram_logdirty, and updating the EPT entry accordingly. Super pages, however, 
are still write-protected with PML, as we still need to split them into 4K 
pages in dirty logging mode.
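The per-entry decision could look roughly like this, using the EPT bit layout from the Intel SDM (bit 1 = write, bit 9 = dirty when EPT A/D flags are enabled). The helper name and the page_order convention (0 for 4K, 9 for 2M, 18 for 1G) are assumptions for illustration. The sketch leaves out the EPT TLB invalidation that must follow such changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EPT leaf-entry bits (Intel SDM); bit 9 requires EPT A/D flags enabled. */
#define EPTE_W     (1ULL << 1)
#define EPTE_DIRTY (1ULL << 9)

/* Hypothetical helper: adjust a present leaf entry whose p2m type changes
 * from p2m_ram_rw to p2m_ram_logdirty. */
static uint64_t ept_set_logdirty(uint64_t e, unsigned int page_order,
                                 bool pml_enabled)
{
    if (pml_enabled && page_order == 0)
        return e & ~EPTE_DIRTY;  /* stay writable; next write sets D and logs the GPA */
    return e & ~EPTE_W;          /* super page or no PML: write-protect as today */
}
```

Super pages keep taking the write-protection path, so the existing fault handler can still split them into 4K pages.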

- PML buffer flush

There are two places where we need to flush the PML buffer. The first is, 
obviously, the PML buffer full VMEXIT handler. The second is 
paging_log_dirty_op (either peek or clean): vcpus run asynchronously with 
respect to paging_log_dirty_op, which is called from userspace via a hypercall, 
so vcpus' PML buffers may contain logged dirty GPAs without being full. 
Therefore we should flush all vcpus' PML buffers before reporting dirty GPAs to 
userspace.
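A flush has to work out which entries are valid from the current index: buf[index + 1] through buf[511], or all 512 entries if the index has wrapped to 0xFFFF. The sketch below moves those GFNs into a plain bitmap, which stands in for Xen's log-dirty radix tree, then resets the index. The struct, the bitmap and the 16 MiB guest size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PML_ENTRIES 512
#define PAGE_SHIFT  12
#define MAX_GFN     4096   /* hypothetical 16 MiB guest for this sketch */

struct vcpu_pml {
    uint64_t buf[PML_ENTRIES];
    uint16_t index;        /* 511 when empty, wraps to 0xFFFF when full */
};

/* A plain bitmap standing in for the log-dirty radix tree. */
static uint8_t dirty_bitmap[MAX_GFN / 8];

static void mark_dirty(uint64_t gfn)
{
    if (gfn < MAX_GFN)
        dirty_bitmap[gfn / 8] |= (uint8_t)(1u << (gfn % 8));
}

/* Drain the valid entries into the bitmap, reset the index, and return the
 * number of entries flushed. */
static unsigned int pml_flush(struct vcpu_pml *p)
{
    unsigned int first = (p->index >= PML_ENTRIES) ? 0 : p->index + 1u;
    unsigned int n = PML_ENTRIES - first;

    for (unsigned int i = first; i < PML_ENTRIES; i++)
        mark_dirty(p->buf[i] >> PAGE_SHIFT);
    p->index = PML_ENTRIES - 1;
    return n;
}
```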

We handle both cases by flushing the PML buffer at the beginning of every 
VMEXIT. This solves the first case directly, and it also solves the second, 
because domain_pause is called prior to paging_log_dirty_op; it kicks vcpus 
that are in guest mode out of guest mode by sending them an IPI, which causes 
a VMEXIT.

This also keeps the log-dirty radix tree more up to date, as the PML buffer is 
flushed on every VMEXIT rather than only on PML buffer full VMEXITs.
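The "flush first, then dispatch" ordering can be sketched as follows. The flush routine is a stub, and the vcpu struct and handler name are hypothetical. Basic exit reason 62 is "page-modification log full" per the Intel SDM. Because the buffer is drained before the switch, the PML-full case needs no further work, and every other exit also drains it. That includes the external-interrupt exit caused by the IPI from domain_pause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_PML_FULL 62   /* basic exit reason: page-modification log full */
#define PML_ENTRIES 512

/* Hypothetical, simplified vcpu state. */
struct vcpu {
    bool pml_on;
    uint16_t pml_index;
    unsigned int flushes;         /* counts flushes, for illustration only */
};

/* Stub standing in for the real flush (drain buffer into the log-dirty tree). */
static void vcpu_flush_pml(struct vcpu *v)
{
    v->pml_index = PML_ENTRIES - 1;
    v->flushes++;
}

/* Hypothetical top of the VMEXIT handler: flush first, whatever the reason. */
static void vmexit_handler(struct vcpu *v, uint32_t exit_reason)
{
    if (v->pml_on)
        vcpu_flush_pml(v);

    switch (exit_reason & 0xffff) {   /* bits 15:0 hold the basic exit reason */
    case EXIT_REASON_PML_FULL:
        break;                        /* already drained above */
    default:
        /* ... handle other exits, e.g. the external interrupt from the
         * domain_pause() IPI ... */
        break;
    }
}
```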

- Video RAM tracking (and partial dirty logging for guest memory range)

Video RAM is in dirty logging mode unconditionally while the guest is 
running, and it covers only part of the guest's memory. However, PML operates 
on the whole guest memory (more precisely, the whole valid EPT table), so we 
need to decide whether to use PML when only some guest memory ranges are in 
dirty logging mode.

Currently, PML will be used as long as any guest memory is in dirty logging 
mode, whether globally or partially. In the partial case, we need to check 
whether each GPA logged in the PML buffer falls within a dirty logging range.
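The range check for partial dirty logging might look like the sketch below, which would be applied to each GPA drained from the PML buffer. The logdirty_range struct and gpa_should_log function are hypothetical. A video RAM region is just one such range. The subtraction form avoids overflow when start_gfn + nr_gfns would wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical description of a GFN range in dirty logging mode. */
struct logdirty_range {
    uint64_t start_gfn;
    uint64_t nr_gfns;
};

/* Global mode logs everything; partial mode only records GPAs that fall
 * in one of the tracked ranges. */
static bool gpa_should_log(uint64_t gpa, bool global_logdirty,
                           const struct logdirty_range *r, size_t nr)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;

    if (global_logdirty)
        return true;
    for (size_t i = 0; i < nr; i++)
        if (gfn >= r[i].start_gfn && gfn - r[i].start_gfn < r[i].nr_gfns)
            return true;
    return false;
}
```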


Xen-devel mailing list


