Re: [Xen-devel] [PATCH RFC 00/20] Add postcopy live migration support
On Wed, Mar 29, 2017 at 11:50:52PM +0100, Andrew Cooper wrote:
> On 27/03/2017 10:06, Joshua Otto wrote:
> > Hi,
> >
> > We're a team of three fourth-year undergraduate software engineering
> > students at the University of Waterloo in Canada. In late 2015 we
> > posted on the list [1] to ask for a project to undertake for our
> > program's capstone design project, and Andrew Cooper pointed us in
> > the direction of the live migration implementation as an area that
> > could use some attention. We were particularly interested in
> > post-copy live migration (as evaluated by [2] and discussed on the
> > list at [3]), and have been working on an implementation of this
> > on-and-off since then.
> >
> > We now have a working implementation of this scheme, and are
> > submitting it for comment. The changes are also available as the
> > 'postcopy' branch of the GitHub repository at [4].
> >
> > As a brief overview of our approach:
> > - We introduce a mechanism by which libxl can indicate to the libxc
> >   stream helper process that the iterative migration precopy loop
> >   should be terminated and postcopy should begin.
> > - At this point, we suspend the domain, collect the final set of
> >   dirty pfns and write these pfns (and _not_ their contents) into
> >   the stream.
> > - At the destination, the xc restore logic registers itself as a
> >   pager for the migrating domain, 'evicts' all of the pfns indicated
> >   by the sender as outstanding, and then resumes the domain at the
> >   destination.
> > - As the domain executes, the migration sender continues to push the
> >   remaining outstanding pages to the receiver in the background. The
> >   receiver monitors both the stream for incoming page data and the
> >   paging ring event channel for page faults triggered by the guest.
> >   Page faults are forwarded on the back-channel migration stream to
> >   the migration sender, which prioritizes these pages for
> >   transmission.
> >
> > By leveraging the existing paging API, we are able to implement the
> > postcopy scheme without any hypervisor modifications - all of our
> > changes are confined to the userspace toolstack. However, we inherit
> > from the paging API the requirement that the domains be HVM and that
> > the host have HAP/EPT support.
>
> Wow. Considering that the paging API has had no in-tree consumers (and
> its out-of-tree consumer folded), I am astounded that it hasn't
> bitrotten.

Well, there's tools/xenpaging, which was a helpful reference when
putting this together. The user-space pager actually has rotted a bit
(I'm fairly certain the VM event ring protocol has changed subtly under
its feet), so I also needed to consult tools/xen-access to get things
right.
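In case it helps make the overview above more concrete, the eviction
step at the destination boils down to something like the sketch below.
This is a simplified illustration rather than code lifted from the
series - the helper name and the pfn list are made up, and the vm_event
ring setup and error reporting are omitted - but it uses the same libxc
paging calls that tools/xenpaging does, minus saving page contents
(the outstanding pages have nothing to save yet):

    #include <stddef.h>
    #include <stdint.h>
    #include <xenctrl.h>

    /*
     * Sketch only: drop the backing pages for every pfn the sender
     * reported as still outstanding, so that guest accesses to them
     * raise paging events for the restore helper to service.
     */
    static int evict_outstanding_pfns(xc_interface *xch, uint32_t domid,
                                      const uint64_t *pfns, size_t nr)
    {
        size_t i;

        for ( i = 0; i < nr; i++ )
        {
            /* Mark the gfn as a paging candidate... */
            if ( xc_mem_paging_nominate(xch, domid, pfns[i]) )
                return -1;

            /* ...and release its backing frame.  One hypercall per
             * page - this is the missing batching discussed below. */
            if ( xc_mem_paging_evict(xch, domid, pfns[i]) )
                return -1;
        }

        return 0;
    }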
> > We haven't yet had the opportunity to perform a quantitative
> > evaluation of the performance trade-offs between the traditional
> > pre-copy and our post-copy strategies, but intend to. Informally,
> > we've been testing our implementation by migrating a domain running
> > the x86 memtest program (which is obviously a tremendously
> > write-heavy workload), and have observed a substantial reduction in
> > total time required for migration completion (at the expense of a
> > visually obvious 'slowdown' in the execution of the program).
>
> Do you have any numbers, even for this informal testing?

We have a much more ambitious test matrix planned, but sure, here's an
early encouraging set of measurements. For a domain with 2GB of memory
and a 256MB writable working set (the application driving the writes
being fio submitting writes against a ramdisk), we measured these
times:

                      | Pre-copy +          | 1 precopy iteration +
                      | Stop-and-copy (s)   | postcopy (s)
  --------------------+---------------------+-----------------------
  Precopy Duration:   | 66.97               | 44.44
  Suspend Duration:   | 6.807               | 3.23
  Postcopy Duration:  | N/A                 | 4.83

However... that 3.23s suspend for the hybrid migration seems too high,
doesn't it? There's currently a serious performance bug that we're
still trying to work out in the case of pure-postcopy migrations with
no leading precopy. Attempting a pure postcopy migration when running
the experiment above yields:

                      | Pure postcopy (s)
  --------------------+-------------------
  Precopy Duration:   | 0
  Suspend Duration:   | 21.93
  Postcopy Duration:  | 44.22

Although the postcopy scheme clearly works, it takes 21.93s (!) to
unpause the guest at the destination. The eviction of the unmigrated
pages takes a second or two because of the lack of batching support
(still bad, but not this bad) - the holdup is somewhere in the domain
creation sequence between domcreate_stream_done() and
domcreate_complete().

I suspect that this is the result of a bad interaction between QEMU's
startup sequence (its foreign memory mapping behaviour in particular)
and the postcopy paging. Specifically: the paging ring has room for
only 8 requests at a time. When QEMU attempts to map a large range, the
range gets postcopy-faulted over synchronously in batches of 8 pages at
a time, and each such batch implies a synchronous copy of its pages
over the network (plus the 100us xenforeignmemory_map() retry timer)
before the next batch can begin.

If I am able to confirm that this is the case, a sensible solution
would seem to be supporting paging range-population requests (i.e. a
new paging ring request type for a _range_ of gfns). In the meantime,
you should expect to observe this effect as well in experiments. It
appears to be largely (but not completely) mitigated by performing a
single pre-copy iteration first.
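To make the range-population idea a little more concrete, the rough
shape I have in mind is sketched below. To be clear, nothing like this
exists in the vm_event/paging ABI today - each paging request on the
ring names a single gfn - so the struct and its field names are purely
illustrative:

    #include <stdint.h>

    /*
     * Hypothetical range-population request (illustration only, not an
     * existing Xen interface).  Instead of one ring slot per faulting
     * gfn, the pager could ask the sender for a whole run of gfns at
     * once, so a large foreign mapping by QEMU wouldn't have to
     * trickle through the 8-slot ring in synchronous batches.
     */
    struct vm_event_paging_range {
        uint64_t gfn_start;  /* first gfn of the faulting mapping      */
        uint32_t nr_frames;  /* number of contiguous gfns to populate  */
        uint32_t flags;
    };

The win would simply be amortising the network round trip and the
xenforeignmemory_map() retry backoff over a whole mapping rather than
paying them every 8 pages.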
> > We've also noticed that, when performing a postcopy without any
> > leading precopy iterations, the time required at the destination to
> > 'evict' all of the outstanding pages is substantial - possibly
> > because there is no batching mechanism by which pages can be
> > evicted - so this area in particular might require further
> > attention.
> >
> > We're really interested in any feedback you might have!
>
> Do you have a design document for this? The spec modifications and
> code comments are great, but there is no substitute (as far as
> understanding goes) for a description in terms of the algorithm and
> design choices.

As I replied to Wei, not yet, but we'd happily prepare one for v2.

Thanks!

Josh

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel