Design session notes: guest unaware migration
Notes from the design session on guest unaware migration:

- Guest kernel crashes (relatively rare). Often detectable by the
  toolstack and thus reported to the admin, distros generally take
  patches quickly.
- Guest userspace issues (more common). Primarily seen around
  networking - e.g. iptables rules get cleaned up, and not re-injected.
  This can break e.g. Kubernetes networking. Some other examples around
  clustered services (though not clear if this is the guest being aware
  of the migration or just a result of the downtime). Generally
  impossible for the toolstack to detect, so admin normally unaware
  until users/monitoring complains.

It was also mentioned that NetBSD has issues with live migration around
suspend of the network interface.

Possible solutions

1. Do the migration in a way that the guest is entirely unaware of it

   Amazon produced a proposal for this non-cooperative migration:
   https://xenbits.xen.org/gitweb/?p=xen.git;a=blob_plain;f=docs/designs/non-cooperative-migration.md;hb=HEAD

   Believed to be some older patch series on this

   Some notes from VM forking work that might be relevant:
   - Some state was not saved as part of regular VM save, so resuming
     VM didn't work in some cases - likely will need to save this state
     if doing non-cooperative migration
   - Dumping / restoring qemu state worked for Windows, but for Linux
     needed a save, fork, restore, so appears to be some sort of
     dependency there

   There is an issue around domids - in the proposal these are
   randomised, but that still means certain destinations aren't
   possible (in Amazon's case they just find a compatible target, but
   this is not necessarily an option in server virt scenarios where the
   admin specifies where they want the VM migrated to). The domid is a
   15-bit integer, so if you have < 32k VMs you could allocate centrally
   across a pool of servers.

   Could use non-cooperative migration where possible, but not expect
   it to work everywhere (e.g. within a pool, but not cross-pool in a
   XenServer example).
   Alternative idea from Alejandro - could VMs be faked to always think
   they have a fixed domid (e.g. 1), then have dom0 know the actual one,
   with e.g. xenstore translating? Suggestion to talk to Juergen, he may
   have thoughts on this.

   Could we use a UUID instead of domid in the protocols? Large
   string/value that would be in lots of xenstore messages, could that
   cause problems?

   Does a VM need to know its domid (e.g. for giving to other guests to
   set up grants), or could it be hidden? Is this too much of a hack?

   If the guest is unaware, we still need to make sure the gratuitous
   ARP gets sent after migration.

   There are other use cases for non-cooperative migration, which would
   require not having anything custom in the VM.

2. Can we modify netfront so we don't generate the events (link down /
   interface removed - not clear which?) across a migration, thus
   userspace isn't aware even if the kernel is?

   Likely needs some code inspection to understand what's actually
   happening here as to any potential improvements.
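
To illustrate the "allocate domids centrally across a pool" idea from
solution 1, here is a minimal sketch. It is not taken from the Amazon
proposal or from any existing toolstack: the class name, the method
names and the use of a VM UUID as the key are all hypothetical. It
assumes the 15-bit domid space mentioned above (the range actually
usable in Xen is slightly smaller, since some values are reserved) and
that domid 0 is dom0 on every host. The point is only that a VM holding
a pool-wide unique domid can keep it on any host in the pool.

```python
# Hypothetical sketch of a pool-wide domid allocator. None of these
# names exist in Xen or in any toolstack; they are for illustration only.

DOMID_LIMIT = 1 << 15  # assumption: 15-bit domid space, as per the notes


class PoolDomidAllocator:
    """Hands out domids that are unique across a whole pool of hosts."""

    def __init__(self, reserved=(0,)):
        # domid 0 is dom0 on every host, so it is never given to a guest.
        self._in_use = set(reserved)
        self._owner = {}  # vm_uuid -> domid
        self._next = 1

    def allocate(self, vm_uuid):
        """Return the pool-wide domid for vm_uuid, allocating if needed."""
        if vm_uuid in self._owner:
            return self._owner[vm_uuid]
        # Walk the whole space at most once, wrapping around at the limit.
        for _ in range(DOMID_LIMIT):
            candidate = self._next
            self._next = (self._next + 1) % DOMID_LIMIT
            if candidate not in self._in_use:
                self._in_use.add(candidate)
                self._owner[vm_uuid] = candidate
                return candidate
        raise RuntimeError("no free domid left in the pool")

    def release(self, vm_uuid):
        """Free the domid when the VM is destroyed or leaves the pool."""
        domid = self._owner.pop(vm_uuid)
        self._in_use.discard(domid)


if __name__ == "__main__":
    alloc = PoolDomidAllocator()
    domid = alloc.allocate("vm-a")
    # A migration inside the pool asks again and gets the same domid.
    assert alloc.allocate("vm-a") == domid
```

This only covers the within-pool case. A cross-pool migration would
still need a compatible target or some form of domid translation, as
discussed above.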