Re: [Xen-devel] [PATCH RFC 00/20] Add postcopy live migration support
On Tue, Mar 28, 2017 at 03:41:02PM +0100, Wei Liu wrote:
> Hi Harley, Chester and Joshua
>
> This is really nice work. I took a brief look at all the patches, they
> look really high quality.

Thank you!

> We're currently approaching freeze for a Xen release. We've got a lot on
> our plate. I think maintainers will get to this series at some point.

Understood. We're currently approaching our final exams, so that's probably
for the best :)

> From the look of things some patches can go in because they're generally
> useful.
>
> On Mon, Mar 27, 2017 at 05:06:12AM -0400, Joshua Otto wrote:
> > Hi,
> >
> > We're a team of three fourth-year undergraduate software engineering
> > students at the University of Waterloo in Canada. In late 2015 we posted
> > on the list [1] to ask for a project to undertake for our program's
> > capstone design project, and Andrew Cooper pointed us in the direction
> > of the live migration implementation as an area that could use some
> > attention. We were particularly interested in post-copy live migration
> > (as evaluated by [2] and discussed on the list at [3]), and have been
> > working on an implementation of this on-and-off since then.
> >
> > We now have a working implementation of this scheme, and are submitting
> > it for comment. The changes are also available as the 'postcopy' branch
> > of the GitHub repository at [4].
> >
> > As a brief overview of our approach:
> > - We introduce a mechanism by which libxl can indicate to the libxc
> >   stream helper process that the iterative migration precopy loop
> >   should be terminated and postcopy should begin.
> > - At this point, we suspend the domain, collect the final set of dirty
> >   pfns, and write these pfns (and _not_ their contents) into the stream.
> > - At the destination, the xc restore logic registers itself as a pager
> >   for the migrating domain, 'evicts' all of the pfns indicated by the
> >   sender as outstanding, and then resumes the domain at the destination.
> > - As the domain executes, the migration sender continues to push the
> >   remaining outstanding pages to the receiver in the background. The
> >   receiver monitors both the stream for incoming page data and the
> >   paging ring event channel for page faults triggered by the guest.
> >   Page faults are forwarded on the back-channel migration stream to the
> >   migration sender, which prioritizes these pages for transmission.
> >
> > By leveraging the existing paging API, we are able to implement the
> > postcopy scheme without any hypervisor modifications - all of our
> > changes are confined to the userspace toolstack. However, we inherit
> > from the paging API the requirement that the domains be HVM and that
> > the host have HAP/EPT support.
>
> Please consider writing a design document for this feature and sticking
> it at the beginning of your series in the future. You can find examples
> under docs/designs.

Absolutely, I'll submit one with v2.

> The restriction is a bit unfortunate, but we shouldn't block useful work
> because it's incomplete. We just need to make sure that, should someone
> decide to implement similar functionality for PV guests, they are able
> to do so.
>
> You might want to check if shadow paging can be used with the paging API,
> such that you can widen support to all HVM guests.
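To make the receiver-side behaviour in the overview above a bit more
concrete: the restore-side postcopy loop is conceptually something like the
sketch below. This is simplified pseudo-C rather than the code from the
patches themselves, and every helper in it (pages_outstanding(),
next_paging_fault(), forward_fault_to_sender(), read_page_record(),
paging_load_and_resume(), mark_page_received()) is an illustrative
placeholder for the real libxc/vm_event plumbing in the series.

    /* Simplified sketch of the postcopy receiver loop - illustrative only. */
    #include <poll.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Placeholder helpers standing in for the real libxc/vm_event calls. */
    bool  pages_outstanding(void);
    bool  next_paging_fault(uint64_t *pfn);
    void  forward_fault_to_sender(int backchannel_fd, uint64_t pfn);
    void *read_page_record(int stream_fd, uint64_t *pfn);
    void  paging_load_and_resume(uint64_t pfn, void *contents);
    void  mark_page_received(uint64_t pfn);

    void postcopy_receive_loop(int stream_fd, int backchannel_fd, int evtchn_fd)
    {
        struct pollfd fds[2] = {
            { .fd = stream_fd, .events = POLLIN }, /* page data from sender */
            { .fd = evtchn_fd, .events = POLLIN }, /* paging ring kicks     */
        };

        while ( pages_outstanding() )
        {
            poll(fds, 2, -1);

            if ( fds[1].revents & POLLIN )
            {
                /* The guest faulted on one or more evicted pfns: forward
                 * them on the back-channel so the sender can prioritize
                 * them ahead of the background push. */
                uint64_t pfn;

                while ( next_paging_fault(&pfn) )
                    forward_fault_to_sender(backchannel_fd, pfn);
            }

            if ( fds[0].revents & POLLIN )
            {
                /* Page contents arrived - either a prioritized fault
                 * response or part of the background push.  Load the page
                 * through the paging interface so any blocked vcpu can be
                 * resumed, and account for it. */
                uint64_t pfn;
                void *contents = read_page_record(stream_fd, &pfn);

                paging_load_and_resume(pfn, contents);
                mark_page_received(pfn);
            }
        }
    }

The only essential property here is that pfns the guest has actually
faulted on jump ahead of the background push in the sender's transmission
order.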
> > We haven't yet had the opportunity to perform a quantitative evaluation
> > of the performance trade-offs between the traditional pre-copy and our
> > post-copy strategies, but intend to. Informally, we've been testing our
> > implementation by migrating a domain running the x86 memtest program
> > (which is obviously a tremendously write-heavy workload), and have
> > observed a substantial reduction in total time required for migration
> > completion (at the expense of a visually obvious 'slowdown' in the
> > execution of the program). We've also noticed that, when performing a
> > postcopy without any leading precopy iterations, the time required at
> > the destination to 'evict' all of the outstanding pages is substantial -
> > possibly because there is no batching mechanism by which pages can be
> > evicted - so this area in particular might require further attention.
>
> Please do post numbers when you have them. For now, please be patient
> and wait for people to comment.

Will do. As a general question for those following the thread, are there
any application workloads/benchmarks that people would find particularly
interesting? The experiment that we've planned but haven't had the time to
follow through on fully is to mount a ramdisk inside the guest and use
Axboe's fio to test all of the entries in the (read/write mix) x (working
set size) x (access pattern) matrix (see the P.S. below for a sketch of one
such cell).

Thank you again for your feedback!

Josh
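P.S. For concreteness, one cell of that (read/write mix) x (working set
size) x (access pattern) matrix might be expressed as an fio job file along
the lines of the one below. The parameters here are placeholders to
illustrate the shape of the experiment, not settings we've settled on:

    ; 70/30 random read/write mix, 256 MiB working set, 4 KiB accesses,
    ; run against a tmpfs ramdisk mounted inside the guest at /mnt/ramdisk
    [global]
    directory=/mnt/ramdisk
    ioengine=psync
    bs=4k
    time_based=1
    runtime=60

    [randrw-70-256m]
    rw=randrw
    rwmixread=70
    size=256m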