Re: How does shadow page table work during migration?
On 19/02/2021 16:10, Kevin Negy wrote:
> Hello,
>
> I'm trying to understand how the shadow page table works in Xen,
> specifically during live migration. My understanding is that after
> shadow paging is enabled (sh_enable_log_dirty() in
> xen/arch/x86/mm/shadow/common.c), a shadow page table is created,
> which is a complete copy of the current guest page table. Then the CR3
> register is switched to use this shadow page table as the active table
> while the guest page table is stored elsewhere. The guest page table
> itself (and not the individual entries in the page table) is marked as
> read only so that any guest memory access that requires the page table
> will result in a page fault. These page faults happen and are trapped
> to the Xen hypervisor. Xen will then update the shadow page table to
> match what the guest sees on its page tables.
>
> Is this understanding correct?
>
> If so, here is where I get confused. During the migration pre-copy
> phase, each pre-copy iteration reads the dirty bitmap
> (paging_log_dirty_op() in xen/arch/x86/mm/paging.c) and cleans it.
> This process seems to destroy all the shadow page tables of the domain
> with the call to shadow_blow_tables() in sh_clean_dirty_bitmap().
>
> How is the dirty bitmap related to shadow page tables? Why destroy the
> entire shadow page table if it is the only legitimate page table in
> CR3 for the domain?

Hello,

Different types of domains use shadow pagetables in different ways, and the interaction with migration is also type-dependent.

HVM guests use shadow (or HAP) as a fixed property from when they are created. Migrating an HVM domain does not dynamically affect whether shadow is active.

PV guests do nothing by default, but do turn shadow on dynamically for migration purposes.

Whenever shadow is active, guests do not have write access to their pagetables. All updates are emulated if necessary, and "the shadow pagetables" are managed entirely by Xen behind the scenes.

Next is the shadow memory pool. Guests can have an unbounded quantity of pagetables, and certain pagetable structures take more memory allocations to shadow correctly than the quantity of RAM expended by the guest constructing the structure in the first place. Obviously, Xen can't be in a position where it is forced to expend more memory on shadow pagetables than the RAM allocated to the guest itself.

What we do is have a fixed-size memory pool (choosable when you create the domain - see the shadow_memory VM parameter) and recycle shadows on a least-recently-used basis.

In practice, this means that Xen never has all of the guest pagetables shadowed at once. When a guest moves off the pagetables which are currently shadowed, a pagefault occurs and Xen shadows the new address by recycling a pagetable which hasn't been used for a while.

The shadow_blow_tables() call is "please recycle everything". It is used to throw away all shadow pagetables, which in turn causes the shadows to be recreated from scratch as the guest continues to run.

Next, to the logdirty bitmap. The logdirty bitmap itself is fairly easy - it is one bit per 4k page (of guest physical address space), indicating whether that page has been written to since the last time we checked.

What is complicated is tracking writes, and to understand why, it is actually easier to consider the HVM HAP (i.e. non-shadow) case. Here, we have a single, Xen-maintained set of EPT or NPT pagetables, which map the guest physical address space.
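Before getting to the write-tracking, the bitmap half really is as simple as it sounds. Conceptually it amounts to something like the following sketch (illustrative only - the real code in xen/arch/x86/mm/paging.c sits behind locks and a multi-level structure, and none of these names are real):

    #include <stdint.h>
    #include <string.h>

    #define NR_FRAMES (1u << 20)                  /* e.g. a 4G guest */

    static uint64_t dirty_bitmap[NR_FRAMES / 64]; /* 1 bit per 4k frame */

    /* Called whenever Xen notices a write to guest frame 'pfn'. */
    static void mark_dirty(unsigned long pfn)
    {
        dirty_bitmap[pfn / 64] |= UINT64_C(1) << (pfn % 64);
    }

    /* What a pre-copy iteration's "read and clean" boils down to:
     * hand the current bitmap to the toolstack, then clear it so the
     * next iteration only sees pages dirtied since this one. */
    static void read_and_clean(uint64_t *snapshot)
    {
        memcpy(snapshot, dirty_bitmap, sizeof(dirty_bitmap));
        memset(dirty_bitmap, 0, sizeof(dirty_bitmap));
    }

The memset() is the "clean" part of paging_log_dirty_op() you are looking at, and is why each pre-copy round only resends pages written since the previous round.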
When we turn on logdirty, we pause the VM temporarily and mark all guest RAM as read-only. (Actually, we have a lazy-propagation mechanism for this read-only-ness, so we don't spend seconds of wallclock time with large VMs paused while we make the change.) Then, as the guest continues to execute, it exits to Xen whenever a write hits a read-only mapping. Xen responds by marking that frame in the logdirty bitmap, remapping it read-write, and letting the guest continue.

Shadow pagetables are more complicated. With HAP, hardware helps us maintain the guest virtual and guest physical address spaces in logically separate ways, which eventually become combined in the TLBs. With shadow, Xen has to do the combination of address spaces itself - the shadow pagetables map guest virtual to host physical addresses.

Suddenly, "mark all guest RAM as read-only" isn't trivial. The logical operation you need is: for the shadows we have, uncombine the two logical address spaces, and for the subset which map guest RAM, change from read-write to read-only, then recombine. The uncombine part is actually racy, and involves reversing a one-way mapping, so is exceedingly expensive. It is *far* easier to just throw everything away and re-shadow from scratch when we want to start tracking writes.

Anyway - I hope this is informative. It is accurate to the best of my knowledge, but it is also written off the top of my head. In some copious free time, I should see about putting some Sphinx docs together for it.

~Andrew
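P.S. If it helps to see the shape of the HAP mechanism in code, it boils down to roughly the following. This is a toy sketch with made-up names and a made-up "p2m" - it is not the actual Xen implementation:

    #include <stdbool.h>
    #include <stdint.h>

    #define NR_FRAMES 1024

    static bool     frame_writable[NR_FRAMES];   /* toy EPT/NPT entry */
    static uint64_t dirty[NR_FRAMES / 64];       /* 1 bit per frame   */

    /* Enabling logdirty: revoke write access to every guest frame
     * (in reality done lazily, and followed by a TLB flush). */
    static void logdirty_enable(void)
    {
        for ( unsigned int gfn = 0; gfn < NR_FRAMES; gfn++ )
            frame_writable[gfn] = false;
    }

    /* What the write-fault handler amounts to: record the frame,
     * restore write access, and let the guest retry the instruction. */
    static void logdirty_write_fault(unsigned int gfn)
    {
        dirty[gfn / 64] |= UINT64_C(1) << (gfn % 64);
        frame_writable[gfn] = true;
    }

The shadow case needs the same two halves, but "revoke write access to every guest frame" is exactly the uncombine/recombine problem described above - hence blowing the tables and re-shadowing from scratch instead.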