[Xen-devel] Migration memory corruption - PV backends need to quiesce
Hello,

After a long time fixing my own memory corruption bugs with migration v2, I have finally tracked down (what I really really hope is) the last of the corruption.

There appears to be a systematic problem affecting all PV drivers, whereby a non-quiescent backend can cause memory corruption in the VM.

Active grant mapped pages are only reflected in the dirty bitmap after the grant has been unmapped. Mapping the ring read-only would be catastrophic to performance, and remapping as read-only when logdirty is enabled is (as far as I understand) impossible, as Xen doesn't track the PTEs pointing at granted frames.

PV backend drivers hold their mappings of the rings (and persistently granted frames) open until the domain is destroyed, which is after the memory image has been sent. Therefore, any requests which are processed after the migration code has sent the ring frame on its first pass will not be reflected in the resumed domain, as this frame will never be marked as dirty in Xen.

Furthermore, as the migration code uses memcpy() on the frames, it is possible that a backend update intersects with the copy, and a corrupt descriptor appears on the resumed side.

In addition, after the domain has been paused, the backend might still process requests. The migration code expects the guest to be completely quiesced after it has been suspended, so will only check the dirty bitmap once. Any requests which get processed and completed after that check might still be missed by the migration code.

Using a heavily instrumented Xen and migration code, I am fairly sure I have confirmed that all pages corrupted on migration are a result of still-active grant maps, grant copies which complete after domain suspend, or writes to the xenstore ring, which xenstored has a magic mapping of and which will never be reflected in the dirty bitmap.

Overall, it would appear that there needs to be a hook for all PV drivers to force quiescence.
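To make the still-active grant map case concrete, here is a toy Python model of it. This is only a sketch and not Xen or libxc code: ToyHypervisor, backend_write(), migrate(), run_demo() and RING_PFN are all names made up for illustration, and the model assumes nothing beyond the behaviour described above (writes through a grant mapping are invisible to the dirty bitmap until the grant is unmapped, and the bitmap is checked once after the guest is suspended).

```python
RING_PFN = 3  # hypothetical frame number holding a PV ring


class ToyHypervisor:
    """Toy stand-in for guest memory plus a log-dirty bitmap (not Xen code)."""

    def __init__(self, nr_frames):
        self.frames = [0] * nr_frames  # one integer per "frame"
        self.dirty = set()             # stand-in for the dirty bitmap
        self.grant_maps = set()        # frames currently grant mapped

    def guest_write(self, pfn, value):
        self.frames[pfn] = value
        self.dirty.add(pfn)            # ordinary guest writes are tracked

    def grant_map(self, pfn):
        self.grant_maps.add(pfn)

    def backend_write(self, pfn, value):
        assert pfn in self.grant_maps
        self.frames[pfn] = value       # NOT tracked while the grant is mapped

    def grant_unmap(self, pfn):
        self.grant_maps.discard(pfn)
        self.dirty.add(pfn)            # only now does the frame show up as dirty

    def take_dirty(self):
        """Return the dirty set and clear it, like a read-and-clean operation."""
        dirty, self.dirty = self.dirty, set()
        return dirty


def migrate(hv, quiesce):
    """Toy two-pass migration; 'quiesce' models the proposed backend hook."""
    hv.take_dirty()                    # enable logdirty with a clean bitmap
    image = list(hv.frames)            # first pass: send every frame once

    # Activity after the first pass: a tracked guest write, and a backend
    # completing a request through its long-lived grant mapping.
    hv.guest_write(1, 0xAAAA)
    hv.backend_write(RING_PFN, 0xBEEF)

    # The guest is suspended at this point.
    if quiesce:
        for pfn in sorted(hv.grant_maps):
            hv.grant_unmap(pfn)        # assumed hook: backend drops its maps

    # Final pass: the dirty bitmap is checked exactly once.
    for pfn in hv.take_dirty():
        image[pfn] = hv.frames[pfn]
    return image


def run_demo(quiesce):
    hv = ToyHypervisor(nr_frames=8)
    hv.grant_map(RING_PFN)             # backend maps the ring at connect time
    image = migrate(hv, quiesce)
    return image, hv.frames


if __name__ == "__main__":
    for quiesce in (False, True):
        image, frames = run_demo(quiesce)
        print("quiesce:", quiesce, "image matches memory:", image == frames)
```

Without the hook, the guest's own write to frame 1 is picked up by the final pass but the ring frame in the sent image stays stale; with the hook, the unmap marks the ring frame dirty and the image matches memory. The model does not cover the memcpy() race or requests processed after the final bitmap check.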
In particular, a backend must guarantee to unmap all active grant maps (so the frames get properly reflected in the dirty bitmap), and never process subsequent requests (so no new frames appear dirty in the bitmap after the guest has been paused).

Thoughts/comments?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel