[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

On 11/05/15 07:28, Hongyang Yang wrote:
> On 05/09/2015 02:12 AM, Andrew Cooper wrote:
>> On 08/05/15 10:33, Yang Hongyang wrote:
>>> This patchset implement the Remus support for Migration v2 but without
>>> memory compressing.
>>> The series can be found on github:
>>> https://github.com/macrosheep/xen/tree/Remus-newmig-v2
>>> PATCH 1-7: Some refactor and prepare work.
>>> PATCH 8-9: The main Remus loop implement.
>>> PATCH 10: Fix for Remus.
>> I have reviewed the other half of the series now, and have some design
>> to discuss.  (I was hoping to get this email sent in reply to v1, but
>> never mind).  This largely concerns patch 7 and onwards.
>> Migration v2 has substantially more structure than legacy did.  Once
>> issue so far is that your series relies on using more than one END
>> record, which is not supported in the spec.  (Of course - the spec is
>> fine to be extended in forward-compatible ways.)
> I use END record as a info that indicate the end of the stream.

I suspected this, but is not a backwards compatible use of the migration
v2 stream.  There must only be a single END record, and it must be the
very last record the save side produces.

> I saw
> that you add a checkpoint record in your v2 series of Remus related
> patches,
> I can use that record to indicate the end of the checkpointed stream, but
> I think the record better to be called as end-of-checkpoint?

It is the logical end of the libxc bits of a checkpoint, but not of the
(soon to exist) libxl bits.

>> To fix the qemu layering issues I need to have some explicit negotiation
>> between libxc and libxl about sharing ownership of the input fd.  This
>> is going to require a new record in the format, and I currently drafting
>> a patch or two which should help in this regard.
>> My view for the eventual stream looks something like this (time going
>> downwards):
>> libxc writes:                   libxl writes:
>> Image Header
>> Domain Header
>> start_of_stream()
>> start_of_checkpoint()
> <live memory>
> ctx->save.callbacks->suspend()
> this callback suspend the primary guest and then calls Remus devices
> postsuspend callbacks to buffer the network pkts etc.

Sorry yes - I omitted this call in the example for brevity, but was not
intending to omit it from the code.

> <last iter of memory>
>> end_of_checkpoint()
>> Checkpoint record
> ctx->save.callbacks->postcopy()
> this callback should not be omitted, it do some necessary work before
> resume
> primary (such as call Remus devices preresume callbacks to ensure the
> disk
> data is consistent) and then resume the primary guest. I think this
> callback should be renamed to ctx->save.callbacks->resume().

That looks to be a useful cleanup (and answers one of my questions of
what exactly postcopy was)

>>              ctx->save.callbacks->checkpoint()
>>                                  libxl qemu record
> Maybe we should add another callback to send qemu record instead of
> using checkpoint callback. We can call it
> ctx->save.callbacks->save_qemu()

This is another layering violation.  libxc should not prescribe what
libxl might or might not do.  One example we are experimenting with in
XenServer at the moment is support for multiple emulators attached to a
single domain, which would necessitate two LIBXL_EMULATOR records to be
sent per checkpoint.  libxl might also want to send an updated json blob
or such.

> Then in checkpoint callback, we only call remus devices commit callbacks(
> which will release the network buffer etc...) then decide whether we
> need to
> do another checkpoint or quit checkpointed stream.
> With Remus, checkpoint callback only wait for 200ms(can be specified
> by -i)
> then return.
> With COLO, checkpoint callback will ask COLO proxy if we need to do a
> checkpoint, will return when COLO proxy module indicate a checkpoint
> is needed.

That sounds like COLO wants a should_checkpoint() callback which
separates the decision to make a checkpoint from the logic of
implementing a checkpoint.

>>                                  ...
>>                                  libxl end-of-checkpoint record
>>              ctx->save.callbacks->checkpoint() returns
>> start_of_checkpoint()
> ctx->save.callbacks->suspend()
>> <memory>
>> end_of_checkpoint()
>> Checkpoint record
>> etc...
>> This will eventually allow both libxc and libxl to send checkpoint data
>> (and by the looks of it, remove the need for postcopy()).  With this
>> libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
>> current qemu situation, but I would prefer not to be also retrofitting
>> libxc checkpoint records when doing the libxl/migv2 work.
>> Does this look plausible in for Remus (and eventually COLO) support?
> With comments above, I would suggest the save flow as below:
> libxc writes:                   libxl writes:
> live migration:
> Image Header
> Domain Header
> start_of_stream()
> start_of_checkpoint()
> <live memory>
> ctx->save.callbacks->suspend()
> <last iter memory>
> end_of_checkpoint()
> if ( checkpointd )
>   End of Checkpoint record
>   /*If resotre side receives this record, input fd should be handed to
> libxl*/
> else
>   goto end
> loop of checkpointed stream:
> ctx->save.callbacks->resume()
> ctx->save.callbacks->save_qemu()
>                                 libxl qemu record
>                                 ...
>                                 libxl end-of-checkpoint record
> /*If resotre side receives this record, input fd should be handed to
> libxc*/
> ctx->save.callbacks->save_qemu() returns
> ctx->save.callbacks->checkpoint()
> start_of_checkpoint()
> ctx->save.callbacks->suspend()
> <memory>
> end_of_checkpoint()
> End of Checkpoint record
> goto 'loop of checkpointed stream'
> end:
> END record
> /*If resotre side receives this record, input fd should be handed to
> libxl*/
> In order to keep it simple, we can keep the current
> ctx->save.callbacks->checkpoint() as it is, which do the save_qemu
> thing, call
> Remus devices commit callbacks and then decide whether we need a
> checkpoint. We
> can also combine the ctx->save.callbacks->resume() with
> ctx->save.callbacks->checkpoint(), with only one checkpoint()
> callback, we do
> the following things:
>  - Call Remus devices preresume callbacks
>  - Resume the primary
>  - Save qemu records
>  - Call Remus devices commit callbacks
>  - Decide whether we need a checkpoint
> Overall, there are 3 options for the save flow:
> 1. keep the current callbacks, rename postcopy() to resume()
> 2. split the checkpoint() callback to save_qemu() and checkpoint()
> 3. combine the current postcopy() and checkpoint()
> Which one do you think is the best?

I have a 4th alternative in mind, but would like your feedback from my
comments in this email first.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.