With xen-unstable changeset 10333:360f9dc71f51, live migration is not reliable. Migrating an active domain (I use a kernel build as my test load) back and forth between two machines results in either the build or the domain crashing. I tweaked xc_linux_save.c to enable the verify pass without emitting all the debugging messages, and the log shows that one or two pages fail the data match.
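To make the kind of check concrete, here is a minimal standalone sketch of a page-by-page verify. It is not the code in xc_linux_save.c: the function name verify_pages, the SKETCH_PAGE_SIZE constant, and the two flat buffers (one holding the page contents as they were sent, one holding the contents read again afterwards) are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed page size for the sketch; the real value comes from the platform. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical helper: compare two snapshots of n_pages pages each.
 * `sent` holds the page contents as they were transmitted, `reread`
 * holds the same pages read again later. Logs the index of every
 * page whose contents differ and returns the number of mismatches.
 */
static size_t verify_pages(const unsigned char *sent,
                           const unsigned char *reread,
                           size_t n_pages)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < n_pages; i++) {
        const unsigned char *a = sent + i * SKETCH_PAGE_SIZE;
        const unsigned char *b = reread + i * SKETCH_PAGE_SIZE;

        if (memcmp(a, b, SKETCH_PAGE_SIZE) != 0) {
            fprintf(stderr, "page %zu: data mismatch\n", i);
            mismatches++;
        }
    }
    return mismatches;
}
```

If nothing writes to the pages between the two reads, a check of this shape should return 0; any nonzero count means some page changed in between.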

I have yet to see the domain fail with non-live migration, but I sometimes see a data mismatch on a page during verification. That would indicate that either suspend doesn't mean what I think it does, or pages of a suspended VM are being altered when they shouldn't be.

So, I guess I'll start with the easy question: should non-live migration ever have a page fail to verify? If not, how can I identify the source of the problem?

The harder question: how can I identify the source of the corruption in live migration?


John Byrne

