
Re: [Xen-devel] [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore



On Fri, 2015-05-15 at 17:34 +0800, Yang Hongyang wrote:
> 
> On 05/15/2015 05:27 PM, Ian Campbell wrote:
> > On Fri, 2015-05-15 at 17:19 +0800, Yang Hongyang wrote:
> >>
> >> On 05/15/2015 05:09 PM, Ian Campbell wrote:
> >>> On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:
> >>>>
> >>>> On 05/14/2015 09:05 PM, Ian Campbell wrote:
> >>>>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
> >>>>>> With Remus, the restore flow should be:
> >>>>>> the first full migration stream -> { periodic checkpoint streams }
> >>>>>>
> >>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> >>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>>>>> CC: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> >>>>>> CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> >>>>>> CC: Wei Liu <wei.liu2@xxxxxxxxxx>
> >>>>>> ---
> >>>>>>     tools/libxc/xc_sr_common.h  |  14 ++++++
> >>>>>>     tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
> >>>>>>     2 files changed, 117 insertions(+), 10 deletions(-)
> >>>>>>
> >>>>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> >>>>>> index f8121e7..3bf27f1 100644
> >>>>>> --- a/tools/libxc/xc_sr_common.h
> >>>>>> +++ b/tools/libxc/xc_sr_common.h
> >>>>>> @@ -208,6 +208,20 @@ struct xc_sr_context
> >>>>>>                 /* Plain VM, or checkpoints over time. */
> >>>>>>                 bool checkpointed;
> >>>>>>
> >>>>>> +            /* Currently buffering records between checkpoints */
> >>>>>> +            bool buffer_all_records;
> >>>>>> +
> >>>>>> +/*
> >>>>>> + * With Remus, we buffer the records sent by the primary at each
> >>>>>> + * checkpoint, so that if the primary fails we can recover from the
> >>>>>> + * last checkpoint state.
> >>>>>> + * This should be enough because the primary only sends dirty pages
> >>>>>> + * at each checkpoint.
> >>>>>
> >>>>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
> >>>>> enough, unless there is something on the sending side arranging it to be
> >>>>> so?
> >>>>
> >>>> There are only a few records at each checkpoint in my tests, mostly
> >>>> under 10, probably because I don't do many operations in the guest. I
> >>>> thought this limit could be adjusted later based on further testing.
> >>>
> >>> For some reason I thought these buffers included the page data; is that
> >>> not true? I was expecting the bulk of the records to be dirty page data.
> >>
> >> The page data is not stored in this buffer, but its pointer is stored
> >> here (rec->data). The buffer holds the struct xc_sr_record entries
> >> themselves.
> >
> > OK, so there are (approximately) as many xc_sr_records as there are
> > buffered dirty pages? I'd expect this to easily reach 1024 in some
> > circumstances (e.g. run a fork bomb in the domain or something).
> 
> No, a record may contain up to 1024 pages, so the number of records is
> less than the number of dirty pages.

OK, so 1024 records equates to ... <sounds of maths> ... around 4GB of
actual data at most (though I suppose not all records will use the full
1024 pages).
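
For reference, a minimal sketch of the record layout and the arithmetic
involved (the struct is as I understand tools/libxc/xc_sr_common.h to
define it; the size estimate assumes 4KiB pages and full 1024-page
PAGE_DATA batches):

    /* Sketch of the stream record as held in the buffer. */
    struct xc_sr_record
    {
        uint32_t type;    /* REC_TYPE_* */
        uint32_t length;  /* Length of data, in bytes. */
        void *data;       /* Payload (e.g. a PAGE_DATA batch), allocated
                           * separately and only pointed to from here. */
    };

    /*
     * Upper bound on buffered page data:
     *   1024 records x 1024 pages/record x 4KiB/page = 4GiB
     * while the record array itself is only
     * 1024 * sizeof(struct xc_sr_record), i.e. a few KiB.
     */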

I suppose a guest would have to be working quite hard to dirty that much
without retriggering a checkpoint (even with COLO's more relaxed approach
to resync).

In any case, making the array grow is clearly a good thing to do and
you've already done it, so no need to keep thinking about this case ;-)
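
For the archive, a minimal sketch of the sort of realloc-based growable
record buffer meant here (all names and the growth step are illustrative,
not taken from the patch):

    #include <stdint.h>
    #include <stdlib.h>

    struct record            /* stand-in for struct xc_sr_record */
    {
        uint32_t type;
        uint32_t length;
        void *data;
    };

    struct record_buffer
    {
        struct record *records;
        unsigned int used;
        unsigned int allocated;
    };

    #define BUF_GROW_STEP 1024  /* grow in chunks, not per record */

    static int buffer_record(struct record_buffer *buf,
                             const struct record *rec)
    {
        if ( buf->used >= buf->allocated )
        {
            unsigned int new_alloc = buf->allocated + BUF_GROW_STEP;
            struct record *p = realloc(buf->records,
                                       new_alloc * sizeof(*buf->records));

            if ( !p )
                return -1;   /* out of memory; caller cleans up */

            buf->records = p;
            buf->allocated = new_alloc;
        }

        /*
         * Shallow copy: rec->data still points at the separately
         * allocated payload (e.g. buffered page data).
         */
        buf->records[buf->used++] = *rec;

        return 0;
    }

Presumably the buffered records then get processed in order once the next
checkpoint boundary arrives in the stream, and their payloads freed
afterwards.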



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

