Re: [Xen-devel] Buggy interaction of live migration and p2m updates
On Thu, 2014-11-20 at 18:28 +0000, Andrew Cooper wrote:
> Realistically, this means no updates to the
> p2m at all, due to several potential race conditions.

From the rest of the mail it seems as if you are talking primarily about
changes to the p2m *structure*, i.e. which guest frames contain the p2m
pages, rather than changes to the p2m entries themselves. Is that
correct?

I don't see any (explicit) mention of the pfn_to_mfn_frame_list_list
here; where does that fit in?

> As far as these issues are concerned, there are two distinct p2m
> modifications which we care about:
> 1) p2m structure changes (rearranging the layout of the p2m)
> 2) p2m content changes (altering entries in the p2m)
>
> There is no possible way for the toolstack to prevent a domain from
> altering its p2m. At the moment, ballooning typically only occurs when
> requested by the toolstack, but the underlying operations
> (increase/decrease_reservation, mem_exchange, etc) can be used by the
> guest at any point. This includes Wei's guest memory fragmentation
> changes. Changes to the content of the p2m also occur for grant map and
> unmap operations.
>
>
> Currently in PV guests, the p2m is implemented using a 3-level tree,
> with its root in the guest's shared_info page. It provides a hard VM
> memory limit of 4TB for 32bit PV guests (which is far higher than the
> 128GB limit from the compat p2m mappings), or 512GB for 64bit PV guests.
>
> Juergen has a proposed new p2m interface using a virtual linear
> mapping. This is conceptually similar to the previous implementation
> (which is fine from the toolstack's point of view), but far less
> complicated from the guest's point of view, and removes the memory
> limits imposed by the p2m structure.
>
> The new virtual linear mapping suffers from the same interaction issues
> as the old 3-level tree did, but the introduction of the new interface
> affords us an opportunity to make all API modifications at once to
> reduce churn.
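As an aside, the 4TB and 512GB limits quoted above for the 3-level tree
can be reproduced with a little arithmetic. The sketch below is only
illustrative and rests on assumptions not stated in the mail: 4KiB
pages, every level of the tree being one page of entries, and an entry
being one guest word (4 bytes for 32bit, 8 bytes for 64bit). The
function name p2m_tree_limit is made up for this example.

```python
PAGE_SIZE = 4096  # assumed: 4KiB x86 pages


def p2m_tree_limit(entry_bytes, levels=3, page_size=PAGE_SIZE):
    """Bytes of guest memory addressable by a `levels`-deep p2m tree.

    Assumes each level is a single page holding page_size // entry_bytes
    entries, and that each leaf entry describes one page-sized frame.
    """
    entries_per_page = page_size // entry_bytes
    max_pfns = entries_per_page ** levels
    return max_pfns * page_size


limit_32bit = p2m_tree_limit(entry_bytes=4)  # 1024 entries per page
limit_64bit = p2m_tree_limit(entry_bytes=8)  # 512 entries per page

print(limit_32bit // 2**40, "TB for 32bit PV guests")
print(limit_64bit // 2**30, "GB for 64bit PV guests")
```

Under those assumptions a 64bit guest gets a lower limit than a 32bit
one simply because its entries are twice as wide, so each page of the
tree holds half as many of them.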
>
>
> During live migration, the toolstack maps the guest's p2m into a linear
> mapping in the toolstack's virtual address space. This is done once at
> the start of migration, and never subsequently altered. During live
> migration, the p2m is cross-verified with the m2p, and frames are sent
> using pfns as a reference, as they will be located in different frames
> on the receiving side.
>
> Should the guest change the p2m structure during live migration, the
> toolstack ends up with a stale p2m with a non-p2m frame in the middle,
> resulting in bogus cross-referencing. Should the guest change an entry
> in the p2m, the p2m frame itself will be resent as it would be marked as
> dirty in the logdirty bitmap, but the target pfn will remain unsent and
> probably stale on the receiving side.
>
>
> Another factor which needs to be taken into account is Remus/COLO, which
> run the domains under live migration conditions for the duration of
> their lifetime.
>
> During the live part of migration, the toolstack already has to be able
> to tolerate failures to normalise the pagetables, which result as a
> consequence of the pagetables being in active use. These failures are
> fatal on the final iteration after the guest has been paused, but the
> same logic could be extended to p2m/m2p issues, if needed.
>
>
> There are several potential solutions to these problems.
>
> 1) Freeze the guest's p2m during live migrate
>
> This is the simplest sounding option, but is quite problematic from the
> point of view of the guest. It is essentially a shared spinlock between
> the toolstack and the guest kernel. It would prevent any grant
> map/unmap operations from occurring, and might interact badly with
> certain p2m updates in the guest which would previously be expected to
> unconditionally succeed.
>
> Pros) (Can't think of any)
> Cons) Not easy to implement (even conceptually), requires invasive guest
> changes, will cripple Remus/COLO
>
>
> 2) Deep p2m dirty tracking
>
> In the case that a p2m frame is discovered dirty in the logdirty bitmap,
> we can be certain that a write has occurred to it, which in the common
> case means that the mapping has changed. The toolstack could maintain
> a non-live copy of the p2m which is updated as new frames are sent.
> When a dirty p2m frame is found, the live and non-live copies can be
> consulted to find which pfn mappings have changed, and locally mark all
> the altered pfns for retransmit.
>
> Pros) No guest changes required
> Cons) Toolstack needs to keep an additional copy of the guest's p2m on
> the sending side
>
> 3) Eagerly check for p2m structure changes.
>
> p2m structure changes are rare after boot, but not impossible. On each
> iteration of live migration, the toolstack can check for dirty
> higher-level p2m frames in the dirty bitmap. In the case that a
> structure update occurs, the toolstack can use information it already
> has to calculate a subset of pfns affected by the update, and mark them
> for resending. (This can currently be done to the frame granularity
> given the p2m frame list, but in combination with 2), could result in
> fewer pfns needing resending.)
>
> Pros) No guest changes required.
> Cons) Moderately high toolstack overhead, possibility of resending far
> more pfns than strictly required.
>
> 4) Request p2m structure change updates from the guest
>
> The guest could provide a "p2m generation count" to allow the toolstack
> to evaluate whether the structure had changed. This would allow the
> live part of migration to periodically re-evaluate whether it should
> remap the p2m to avoid stale mappings.
>
> Pros) Easy to implement alongside the virtual linear mapping support.
> Easy for toolstack and guest
> Cons) Only works with new virtual linear guests.
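To make option 2) above a little more concrete, here is a minimal sketch
of the comparison step it describes: for each p2m frame reported dirty,
compare the live p2m with the toolstack's non-live copy and collect the
pfns whose mapping differs. This is not existing toolstack code. The
function name, the flat-list representation of the p2m and the toy
values are all assumptions made for illustration.

```python
def changed_pfns(live_p2m, sent_p2m, dirty_p2m_frames, entries_per_frame):
    """Return the pfns whose mapping differs within dirty p2m frames.

    live_p2m / sent_p2m: lists indexed by pfn holding the mfn (hypothetical
    flat view of the guest's p2m and of the toolstack's non-live copy).
    dirty_p2m_frames: indices of p2m frames flagged in the logdirty bitmap.
    """
    changed = []
    for frame in sorted(dirty_p2m_frames):
        first = frame * entries_per_frame
        last = min(first + entries_per_frame, len(live_p2m))
        for pfn in range(first, last):
            if live_p2m[pfn] != sent_p2m[pfn]:
                changed.append(pfn)
    return changed


# Toy p2m: 8 pfns, 4 entries per p2m frame (real frames hold far more).
sent = [100, 101, 102, 103, 104, 105, 106, 107]
live = list(sent)
live[5] = 250  # guest remaps pfn 5, dirtying p2m frame 1
live[6] = 106  # guest rewrites pfn 6 with the same mfn: dirty, no change

resend = changed_pfns(live, sent, dirty_p2m_frames={1}, entries_per_frame=4)
print(resend)

# Once a pfn has actually been resent, the non-live copy is brought up to
# date so that later iterations compare against what the receiver holds.
for pfn in resend:
    sent[pfn] = live[pfn]
```

Only pfn 5 is selected in the toy run, which shows why the extra copy
can pay for itself: a dirty p2m frame alone says that something in it
was written, not which entries changed.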
>
>
> Proposed solution: A combination of 2, 3 and 4.
>
> For legacy 3-level p2m guests, the toolstack can detect p2m structure
> updates by tracking the p2m top and mid levels in the logdirty bitmap,
> and invalidating the modified subset of pfns. It has to eagerly check
> the p2m frame list list mfn entry in the shared info to see whether the
> guest has swapped onto a completely new p2m.
>
> For a virtual linear map, the intermediate levels are not available to
> track, but we can require that the guest increment a p2m generation
> count in the shared info. When the structure changes, the toolstack can
> remap the p2m and calculate the altered subset of pfns, and mark them
> for resend.
>
> The toolstack must also track changes in the p2m itself, and compare to
> a local copy showing the mapping at the time at which the pfn was last
> sent. This can be used to work out which p2m mappings have changed, and
> also be used to confirm whether the pfns on the receiving side are stale
> or not.
>
> I believe this covers all cases and race conditions. In the case that
> the p2m is updated before the m2p, the p2m frame will be marked dirty in
> the bitmap, and discoverable on the next iteration. At that point, if
> the p2m and m2p are inconsistent, the pfn will be deferred until the
> final iteration. If not, the frame is sent and everything is ok.
> In the case that the p2m is updated after the m2p, the p2m/m2p will be
> consistent when the dirty bitmap is acted on.
>
>
> Thoughts? (for anyone who has made it this far :) I think I have
> covered everything.)
>
> ~Andrew
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel