On 15/07/15 08:45, Yang Hongyang wrote:
> In COLO mode, both VMs are running, and are considered in sync if the
> visible network traffic is identical.  After some time, they fall out of
> sync.
> At this point, the two VMs have definitely diverged.  Let's call the
> primary dirty bitmap set A, and the secondary dirty bitmap set B.
> Sets A and B are different.
> Under normal migration, the page data for set A will be sent from the
> primary to the secondary.
> However, the set difference B - A (let's call this C) is out-of-date on
> the secondary (with respect to the primary) and will not be sent by the
> primary, as that memory was not dirtied by the primary.  The secondary
> needs the page data for C to reconstruct an exact copy of the primary at
> the checkpoint.
> The secondary cannot calculate C as it doesn't know A.  Instead, the
> secondary must send B to the primary, at which point the primary
> calculates the union of A and B (let's call this D), which is all the
> pages dirtied by either the primary or the secondary, and sends all page
> data covered by D.
> In the general case, D is a superset of both A and B.  Without the
> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> copy of the primary.
> We transfer the dirty bitmap on the libxc side, so we need to introduce a
> back channel to libxc.
> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> commit message:
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> CC: Wei Liu <wei.liu2@xxxxxxxxxx>
>  tools/libxc/include/xenguest.h   |  8 ++++----
>  tools/libxc/xc_domain_restore.c  |  4 ++--
>  tools/libxc/xc_domain_save.c     |  4 ++--
>  tools/libxc/xc_sr_restore.c      |  2 +-
>  tools/libxc/xc_sr_save.c         |  2 +-
>  tools/libxl/libxl_save_callout.c | 39 ++++++++++++++++++++++++++-------------
>  tools/libxl/libxl_save_helper.c  |  8 ++++++--
>  7 files changed, 42 insertions(+), 25 deletions(-)

You have not patched xc_nomigrate.c, which means this will break the ARM
build.  (I fell into the same trap, requiring c/s f50fe3a5 as a fixup).

Having said that, I plan to throw together some cleanup patches removing
files like xc_domain_{save,restore}.c and dropping most of the
parameters from the parameter list, as they are superfluous.

I will try to get my cleanup done shortly, which should make this prereq
series easier, although I am focusing on some hypervisor side fixes
right at the moment.
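
As an aside, the set arithmetic in the commit message can be sketched in a
few lines of C.  This is only an illustration under stated assumptions: the
bitmaps are plain arrays of 64-bit words, and bitmap_union(), bitmap_diff()
and bitmap_test() are hypothetical helpers, not libxc functions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; none of these are libxc APIs.
 * Each bitmap is an array of nr_words 64-bit words, one bit per page.
 */

/* D = A | B: every page dirtied by either the primary or the secondary. */
static void bitmap_union(uint64_t *d, const uint64_t *a, const uint64_t *b,
                         size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++)
        d[i] = a[i] | b[i];
}

/* C = B - A: pages dirtied only by the secondary. */
static void bitmap_diff(uint64_t *c, const uint64_t *b, const uint64_t *a,
                        size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++)
        c[i] = b[i] & ~a[i];
}

/* Return 1 if the given page bit is set in the bitmap, 0 otherwise. */
static int bitmap_test(const uint64_t *bm, size_t bit)
{
    return (int)((bm[bit / 64] >> (bit % 64)) & 1);
}
```

With A = {0, 1, 3} and B = {1, 2, 4}, C comes out as {2, 4}: pages the
primary would never send on its own.  D = {0, 1, 2, 3, 4} covers them, which
is why the primary needs B from the secondary before it can build the
checkpoint.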


