Xen project Mailing List

Re: [Xen-devel] PML (Page Modification Logging) design for Xen

From: Kai Huang <kai.huang@xxxxxxxxxxxxxxx>

Date: Thu, 12 Feb 2015 13:16:00 +0800

Cc: andrew.cooper3@xxxxxxxxxx, kevin.tian@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx, keir@xxxxxxx, tim@xxxxxxx

Delivery-date: Thu, 12 Feb 2015 05:24:42 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 02/12/2015 10:49 AM, Kai Huang wrote:


On 02/11/2015 09:06 PM, Jan Beulich wrote:

On 11.02.15 at 09:28, <kai.huang@xxxxxxxxxxxxxxx> wrote:
- PML enable/disable for a particular domain
PML needs to be enabled (allocate the PML buffer, initialize the PML index and PML base address, turn PML on in the VMCS, etc.) for all vcpus of the domain, as the PML buffer and PML index are per-vcpu, but the EPT table may be shared by vcpus. Enabling PML on only some of the domain's vcpus won't work. Also, PML will only be enabled for the domain when it is switched to dirty logging mode, and it will be disabled when the domain is switched back to normal mode. As it looks like the number of vcpus won't change dynamically while the guest is running (correct me if I am wrong here), we don't have to consider enabling PML for newly created vcpus when the guest is in dirty logging mode.
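The all-or-nothing requirement above can be sketched in C. This is a minimal model under stated assumptions: `struct vcpu_pml`, `struct domain_pml`, `vcpu_enable_pml()` and `domain_enable_pml()` are hypothetical names, not Xen code, and writing the buffer address, index and enable control into the VMCS is reduced to plain struct fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PML_ENTRIES 512            /* one 4K page of 64-bit GPA entries */

/* Hypothetical per-vcpu PML state (not Xen's real structures). */
struct vcpu_pml {
    uint64_t *buf;                 /* stands in for the page whose address goes in the VMCS */
    uint16_t  index;               /* PML index */
    bool      enabled;             /* stands in for the VMCS "enable PML" control */
};

/* Hypothetical domain: all vcpus share one set of EPT tables. */
struct domain_pml {
    struct vcpu_pml *vcpus;
    unsigned int     nr_vcpus;
    bool             pml_enabled;
};

static int vcpu_enable_pml(struct vcpu_pml *v)
{
    assert(!v->enabled);           /* must not be enabled twice */
    v->buf = calloc(PML_ENTRIES, sizeof(*v->buf));
    if (!v->buf)
        return -1;
    v->index = PML_ENTRIES - 1;    /* the CPU logs downwards from the last entry */
    v->enabled = true;
    return 0;
}

static void vcpu_disable_pml(struct vcpu_pml *v)
{
    v->enabled = false;
    free(v->buf);
    v->buf = NULL;
}

/* Enable PML on every vcpu or on none: the EPT D bits are shared, so a
 * vcpu without PML would dirty pages without logging them anywhere. */
int domain_enable_pml(struct domain_pml *d)
{
    for (unsigned int i = 0; i < d->nr_vcpus; i++) {
        if (vcpu_enable_pml(&d->vcpus[i]) != 0) {
            while (i-- > 0)        /* roll back the vcpus already enabled */
                vcpu_disable_pml(&d->vcpus[i]);
            return -1;
        }
    }
    d->pml_enabled = true;
    return 0;
}

void domain_disable_pml(struct domain_pml *d)
{
    for (unsigned int i = 0; i < d->nr_vcpus; i++)
        vcpu_disable_pml(&d->vcpus[i]);
    d->pml_enabled = false;
}
```

Rolling back on the first failure keeps the domain in one of two consistent states: either every vcpu logs to its own buffer or none does.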
After PML is enabled for the domain, we only need to clear the EPT entry's D-bit for guest memory in dirty logging mode. We achieve this by checking whether PML is enabled for the domain when p2m_ram_rw is changed to p2m_ram_logdirty, and updating the EPT entry accordingly. However, for super pages, we still write-protect them in the PML case, as we still need to split super pages into 4K pages in dirty logging mode.
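The per-entry decision described above, sketched as a hypothetical helper. `struct ept_leaf`, `enum p2m_kind` and `ept_set_logdirty()` are illustrative assumptions, far simpler than Xen's real EPT entry and p2m type-change code; only the rw to logdirty transition is modelled, and EPT accessed/dirty flags are assumed to be enabled, since PML logs when the CPU sets a D flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two p2m types involved. */
enum p2m_kind { ram_rw, ram_logdirty };

/* Hypothetical, simplified EPT leaf entry (not Xen's ept_entry_t). */
struct ept_leaf {
    bool          w;               /* write permission */
    bool          d;               /* dirty (D) flag */
    unsigned int  order;           /* 0 = 4K, 9 = 2M, 18 = 1G */
    enum p2m_kind type;
};

/* Switch one rw entry to log-dirty.  With PML, a 4K entry stays writable
 * and only loses its D bit, so the next guest write sets D again and the
 * CPU logs the GPA without a VM exit.  Super pages (and every page when
 * PML is off) are write-protected, so the resulting EPT violation lets
 * the p2m code split them into 4K pages. */
void ept_set_logdirty(struct ept_leaf *e, bool pml_enabled)
{
    assert(e->order == 0 || e->order == 9 || e->order == 18);
    if (e->type != ram_rw)
        return;
    e->type = ram_logdirty;
    if (pml_enabled && e->order == 0)
        e->d = false;
    else
        e->w = false;
}
```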
While it doesn't matter much for our immediate needs, the
documentation isn't really clear about the behavior when a 2M or
1G page gets its D bit set: Wouldn't it be rather useful to the
consumer to know of that fact (e.g. by setting some of the lower
bits of the PML entry to indicate so)?

This is a good point. The documentation only tells us the GPA will be logged with the last 12 bits cleared. Whether hardware just clears the last 12 bits or performs 2M alignment (in the case of a 2M page) is not certain. I will confirm this with the hardware guys. But as you said, it's not related to our immediate needs.

I forgot to say: to me it is currently certain that the lower 12 bits are cleared, as the specification says the GPA is written to the log 4K-aligned. But it should be possible to push the hardware guys to change this if necessary, though I am not 100% sure.
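A small worked example of the 4K alignment, assuming the hardware only clears bits 11:0 as the specification wording suggests: a write anywhere in a 2M page would then be logged as the 4K frame actually touched, not as the 2M page base, which is why the consumer cannot tell from the entry alone that a large page was dirtied. `pml_entry_for()` and `pml_entry_to_gfn()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Assumed entry format: the written GPA with bits 11:0 cleared. */
static inline uint64_t pml_entry_for(uint64_t gpa)
{
    return gpa & ~((UINT64_C(1) << PAGE_SHIFT) - 1);
}

/* The consumer turns an entry back into a guest frame number. */
static inline uint64_t pml_entry_to_gfn(uint64_t entry)
{
    assert((entry & 0xfff) == 0);  /* entries are 4K-aligned */
    return entry >> PAGE_SHIFT;
}
```

For a write to GPA 0x12345678, the entry is 0x12345000 (gfn 0x12345), while the enclosing 2M page starts at 0x12200000.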

Thanks, -Kai

- PML buffer flush
There are two places where we need to flush the PML buffer. The first is the PML buffer full VMEXIT handler (obviously), and the second is paging_log_dirty_op (either peek or clean): vcpus run asynchronously while paging_log_dirty_op is called from userspace via hypercall, so there may be dirty GPAs logged in vcpus' PML buffers that are not yet full. Therefore we'd better flush all vcpus' PML buffers before reporting dirty GPAs to userspace.
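A flush can be sketched as draining the valid entries into the dirty log and resetting the index. The index semantics follow the Intel SDM's description as understood here (the index starts at 511, the CPU writes at the index and then decrements it, and an index outside 0..511 means the buffer is full). Everything else is hypothetical: `struct vcpu_pml`, `cpu_log_write()`, `pml_flush()`, and the flat bitmap standing in for the log-dirty radix tree.

```c
#include <assert.h>
#include <stdint.h>

#define PML_ENTRIES 512
#define PAGE_SHIFT  12
#define MAX_GFNS    4096           /* size of this toy dirty bitmap */

/* Hypothetical per-vcpu PML state. */
struct vcpu_pml {
    uint64_t buf[PML_ENTRIES];
    uint16_t index;                /* next entry to write; starts at 511 */
};

static uint8_t dirty_bitmap[MAX_GFNS / 8];

static void mark_gfn_dirty(uint64_t gfn)
{
    assert(gfn < MAX_GFNS);
    dirty_bitmap[gfn / 8] |= (uint8_t)(1u << (gfn % 8));
}

int gfn_is_dirty(uint64_t gfn)
{
    assert(gfn < MAX_GFNS);
    return (dirty_bitmap[gfn / 8] >> (gfn % 8)) & 1;
}

/* Simulates the CPU: log the 4K-aligned GPA, then decrement the index.
 * After entry 0 is written the index wraps to 0xFFFF; real hardware would
 * take a PML-full VM exit on the next attempt instead of writing. */
void cpu_log_write(struct vcpu_pml *v, uint64_t gpa)
{
    assert(v->index < PML_ENTRIES);
    v->buf[v->index] = gpa & ~UINT64_C(0xfff);
    v->index--;
}

/* Drain entries (index + 1) .. 511 into the dirty log, or all 512 if the
 * index is out of range (buffer full), then reset the index. */
void pml_flush(struct vcpu_pml *v)
{
    unsigned int first = (v->index >= PML_ENTRIES) ? 0 : v->index + 1u;

    for (unsigned int i = first; i < PML_ENTRIES; i++)
        mark_gfn_dirty(v->buf[i] >> PAGE_SHIFT);
    v->index = PML_ENTRIES - 1;
}
```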
We handle both cases by flushing the PML buffer at the beginning of every VMEXIT. This solves the first case, and it also solves the second: before paging_log_dirty_op does its work, domain_pause is called, which kicks vcpus that are in guest mode out of it by sending them an IPI, which causes a VMEXIT.
This also keeps the log-dirty radix tree more up to date, as the PML buffer is flushed on every VMEXIT rather than only on the PML buffer full VMEXIT.
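The flush-on-every-VMEXIT design and its interaction with domain_pause can be modelled as below. This is a sketch only: `struct vcpu`, the `exit_reason` values, `vmexit_handler()` and `model_domain_pause()` are hypothetical, the PML buffer is reduced to a count of pending entries, and the pause model only captures the IPI-induced exit, not Xen's real scheduling logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical exit reasons (not the architectural encodings). */
enum exit_reason { EXIT_EXTERNAL_INTERRUPT, EXIT_PML_FULL, EXIT_OTHER };

struct vcpu {
    bool in_guest;                 /* currently running guest code */
    int  pml_pending;              /* entries logged but not yet flushed */
    int  flushed_total;            /* entries moved into the log-dirty tree */
};

static void pml_flush(struct vcpu *v)
{
    assert(v->pml_pending >= 0);
    v->flushed_total += v->pml_pending;
    v->pml_pending = 0;
}

/* Flush unconditionally before dispatching, so the PML-full exit
 * needs no handling of its own. */
void vmexit_handler(struct vcpu *v, enum exit_reason reason)
{
    v->in_guest = false;
    pml_flush(v);
    if (reason == EXIT_PML_FULL)
        return;                    /* buffer was just drained; nothing else to do */
    /* ... other exit reasons would be dispatched here ... */
}

/* Model of domain_pause(): each vcpu still in guest mode gets an IPI,
 * which arrives as an external-interrupt exit and therefore flushes. */
void model_domain_pause(struct vcpu *vcpus, int n)
{
    for (int i = 0; i < n; i++)
        if (vcpus[i].in_guest)
            vmexit_handler(&vcpus[i], EXIT_EXTERNAL_INTERRUPT);
}
```

A vcpu that was already out of guest mode gets no IPI in this model, but its last exit already drained its buffer, so nothing is missed.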
Is that really efficient? Flushing the buffer only as needed doesn't
seem to be a major problem (apart from the usual preemption issue
when dealing with guests with very many vCPU-s, but you certainly
recall that at this point HVM is still limited to 128).

Apart from these two remarks, the design looks okay to me.

While keeping the log-dirty radix tree more up to date is probably irrelevant, I do think we'd better flush PML buffers in paging_log_dirty_op (both peek and clean) before reporting dirty pages to userspace, in which case I think flushing the PML buffer at the beginning of VMEXIT is a good idea, as domain_pause does the job automatically. I am not sure how many cycles flushing the PML buffer will cost, but I think it should be relatively small compared to the VMEXIT itself, so it can be ignored.
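The ordering argued for here (flush first, then report, and for clean also reset the log) can be modelled in C. `struct toy_domain`, `log_dirty_op()` and `flush_all_vcpus()` are hypothetical; in the design above the flush happens as a side effect of domain_pause, which this sketch reduces to a direct call.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_VCPUS 2
#define NR_GFNS  64

/* Hypothetical toy domain: one pending PML entry per vcpu plus a
 * log-dirty bitmap (not Xen's struct domain). */
struct toy_domain {
    int  pml_gfn[NR_VCPUS];        /* pending logged gfn per vcpu, -1 if none */
    bool dirty[NR_GFNS];
};

enum log_dirty_op { OP_PEEK, OP_CLEAN };

/* What the IPI-induced exits from domain_pause would achieve. */
static void flush_all_vcpus(struct toy_domain *d)
{
    for (int i = 0; i < NR_VCPUS; i++) {
        if (d->pml_gfn[i] >= 0) {
            assert(d->pml_gfn[i] < NR_GFNS);
            d->dirty[d->pml_gfn[i]] = true;
            d->pml_gfn[i] = -1;
        }
    }
}

/* PEEK copies the dirty log out; CLEAN also resets it, so the next
 * round only reports pages written after this call. */
void log_dirty_op(struct toy_domain *d, enum log_dirty_op op, bool out[NR_GFNS])
{
    flush_all_vcpus(d);            /* otherwise pending entries would be missed */
    memcpy(out, d->dirty, sizeof(d->dirty));
    if (op == OP_CLEAN)
        memset(d->dirty, 0, sizeof(d->dirty));
}
```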

A more optimized approach would probably be to flush the PML buffer only on external interrupt VMEXITs, which is what domain_pause actually triggers, rather than at the beginning of every VMEXIT. But as long as the overhead of flushing the PML buffer is negligible, this optimization is also unnecessary.
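The optimized policy can be compared with flush-on-every-exit over a trace of exits. The exit reasons are a hypothetical enum rather than the architectural encodings, and `count_flushes()` only counts flush decisions; it does not model their cost.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical exit reasons (not the architectural encodings). */
enum exit_reason { EXIT_EXTERNAL_INTERRUPT, EXIT_PML_FULL, EXIT_EPT_VIOLATION, EXIT_CPUID };

/* Selective policy: flush on a full buffer, or on the external-interrupt
 * exit that domain_pause's IPI produces. */
static bool exit_needs_pml_flush(enum exit_reason r)
{
    return r == EXIT_PML_FULL || r == EXIT_EXTERNAL_INTERRUPT;
}

/* Count how many flushes a policy performs over a trace of exits. */
int count_flushes(const enum exit_reason *trace, int n, bool flush_on_every_exit)
{
    assert(n >= 0);
    int flushes = 0;
    for (int i = 0; i < n; i++)
        if (flush_on_every_exit || exit_needs_pml_flush(trace[i]))
            flushes++;
    return flushes;
}
```

One consequence the model makes visible: a vcpu that is already out of guest mode when the domain is paused (for example, blocked after a HLT exit) receives no IPI, so under the selective policy its buffer may still hold unflushed entries, and paging_log_dirty_op would then have to flush such vcpus explicitly. Flushing on every exit avoids this, because that vcpu's last exit already drained its buffer.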


Thanks,
-Kai


Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel



