
Re: [Xen-devel] [PATCH v3 COLOPre 16/26] tools/libx{l, c}: add back channel to libxc

On 07/01/2015 07:01 PM, Andrew Cooper wrote:
On 01/07/15 11:42, Ian Campbell wrote:
On Wed, 2015-07-01 at 10:38 +0800, Yang Hongyang wrote:
On 06/30/2015 06:10 PM, Ian Campbell wrote:
On Thu, 2015-06-25 at 14:25 +0800, Yang Hongyang wrote:
We need to send secondary's dirty page pfns back to primary.
In v2 Ian asked (<21888.2988.774072.32946@xxxxxxxxxxxxxxxxxxxxxxxx>):

          In the pdf
          linked from the wiki page
          it says that the secondary keeps a copy of the original contents of
          its dirty pages.  So I don't understand why you need to send the dirty
          bitmap to the primary.

Which I don't see an answer for in my archive. Have I missed (or
misplaced) the answer?
Sorry, seems that I misplaced the answer to:
[PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support
return data from xl to xc

    > Thanks for this.  I would have some comments on the details, but first
    > I want to properly understand your use case.  So while I'm the author
    > and maintainer of this save helper, I won't review this in detail just
    > yet.  I'm following the thread about what this is for...

      We need to send the secondary's dirty page pfns back to the primary. The
      primary will then send the pages dirtied on either the primary or the
      secondary to the secondary. In this way the secondary's memory will be
      consistent with the primary's.

      As we discussed in [PATCH v2 COLOPre 04/13] tools/libxc: export 
      If we move this operation to libxc layer, this patch could be dropped.
This doesn't seem to be a response to Ian's question which I quoted

The crux of the question is that the design contained in those links
does not appear to require a back channel, because it does not require a
dirty bitmap to go from secondary to primary. Asserting a need to do so
does not answer the question.

It very definitely does require a dirty bitmap moving from the secondary
to the primary.

Let me try explaining it in a different way.

In COLO mode, both VMs are running, and are considered in sync if the
visible network traffic is identical.  After some time, they fall out of

At this point, the two VMs have definitely diverged.  Let's call the
primary dirty bitmap set A, and the secondary dirty bitmap set B.

Sets A and B are different.

Under normal migration, the page data for set A will be sent from the
primary to the secondary.

However, the set difference B - A (let's call this C) is out-of-date on
the secondary (with respect to the primary) and will not be sent by the
primary, as it was not memory dirtied by the primary.  The secondary
needs the page data for C to reconstruct an exact copy of the primary at
the checkpoint.

The secondary cannot calculate C as it doesn't know A.  Instead, the
secondary must send B to the primary, at which point the primary
calculates the union of A and B (let's call this D), which is all the
pages dirtied by either the primary or the secondary, and sends all page
data covered by D.

In the general case, D is a superset of both A and B.  Without the
backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
copy of the primary.
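
As a rough illustration of the set arithmetic above, here is a small sketch
in Python. It is not the libxc implementation: Python sets stand in for the
dirty bitmaps, a dict stands in for guest memory, and the helper name
apply_checkpoint and the example pfn values are made up for this example.

```python
def apply_checkpoint(primary_mem, secondary_mem, pfns):
    """Copy the given pages from primary to secondary (hypothetical helper).

    Returns a new dict so the two outcomes below can be compared side by side.
    """
    updated = dict(secondary_mem)
    for pfn in pfns:
        updated[pfn] = primary_mem[pfn]
    return updated


# Both VMs start from identical memory at the previous checkpoint.
base = {pfn: "old" for pfn in range(6)}
primary_mem = dict(base)
secondary_mem = dict(base)

# Example dirty sets (assumed values): pfns written since the last checkpoint.
A = {0, 1, 2}  # dirtied by the primary
B = {2, 3}     # dirtied by the secondary

for pfn in A:
    primary_mem[pfn] = "primary-new"
for pfn in B:
    secondary_mem[pfn] = "secondary-new"

C = B - A  # pages stale on the secondary that the primary never dirtied
D = A | B  # what the primary has to send once it has received B

# Sending only A leaves the pages in C holding the secondary's own writes.
only_a = apply_checkpoint(primary_mem, secondary_mem, A)

# Sending D makes the secondary an exact copy of the primary.
with_d = apply_checkpoint(primary_mem, secondary_mem, D)
```

With these values C contains only pfn 3. After sending just A, pfn 3 on the
secondary still holds the secondary's write, whereas after sending D the
secondary's memory matches the primary's page for page.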

Thank you Andy! The explanation is clear enough. Do you mind if I copy your
comments into the code comment or commit message, with your sob?


P.S. I have suggested an investigation of the CoW support in Xen as a
potential optimisation, as this could be used to prevent the secondary
losing C, but this is very definitely future work and not appropriate at
this point in COLO.

