Re: [Xen-devel] [PATCH Remus v5 2/2] libxc/restore: implement Remus checkpointed restore
On Fri, 2015-05-15 at 17:34 +0800, Yang Hongyang wrote:
>
> On 05/15/2015 05:27 PM, Ian Campbell wrote:
> > On Fri, 2015-05-15 at 17:19 +0800, Yang Hongyang wrote:
> >>
> >> On 05/15/2015 05:09 PM, Ian Campbell wrote:
> >>> On Fri, 2015-05-15 at 09:32 +0800, Yang Hongyang wrote:
> >>>>
> >>>> On 05/14/2015 09:05 PM, Ian Campbell wrote:
> >>>>> On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote:
> >>>>>> With Remus, the restore flow should be:
> >>>>>> the first full migration stream -> { periodically restore stream }
> >>>>>>
> >>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> >>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>>>>> CC: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> >>>>>> CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> >>>>>> CC: Wei Liu <wei.liu2@xxxxxxxxxx>
> >>>>>> ---
> >>>>>>  tools/libxc/xc_sr_common.h  |  14 ++++++
> >>>>>>  tools/libxc/xc_sr_restore.c | 113 ++++++++++++++++++++++++++++++++++++++++----
> >>>>>>  2 files changed, 117 insertions(+), 10 deletions(-)
> >>>>>>
> >>>>>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> >>>>>> index f8121e7..3bf27f1 100644
> >>>>>> --- a/tools/libxc/xc_sr_common.h
> >>>>>> +++ b/tools/libxc/xc_sr_common.h
> >>>>>> @@ -208,6 +208,20 @@ struct xc_sr_context
> >>>>>>          /* Plain VM, or checkpoints over time. */
> >>>>>>          bool checkpointed;
> >>>>>>
> >>>>>> +        /* Currently buffering records between a checkpoint */
> >>>>>> +        bool buffer_all_records;
> >>>>>> +
> >>>>>> +/*
> >>>>>> + * With Remus, we buffer the records sent by the primary at checkpoint,
> >>>>>> + * in case the primary will fail, we can recover from the last
> >>>>>> + * checkpoint state.
> >>>>>> + * This should be enough because primary only send dirty pages at
> >>>>>> + * checkpoint.
> >>>>>
> >>>>> I'm not sure how it then follows that 1024 buffers is guaranteed to be
> >>>>> enough, unless there is something on the sending side arranging it to
> >>>>> be so?
> >>>>
> >>>> There are only a few records at every checkpoint in my test, mostly
> >>>> under 10, probably because I don't do many operations in the Guest. I
> >>>> thought this limit could be adjusted later by further testing.
> >>>
> >>> For some reason I thought these buffers included the page data, is that
> >>> not true? I was expecting the bulk of the records to be dirty page data.
> >>
> >> The page data is not stored in this buffer, but its pointer is stored in
> >> this buffer (rec->data). This buffer is the bulk of the struct
> >> xc_sr_record.
> >
> > OK, so there are (approximately) as many xc_sr_records as there are
> > buffered dirty pages? I'd expect this would easily reach 1024 in some
> > circumstances (e.g. run a fork bomb in the domain or something).
>
> No, a record may contain up to 1024 pages, so the record count is less
> than the dirty page count.

OK, so 1024 records equates to ... <sounds of maths> ... around 4GB of
actual data at most (but I suppose not all recs will use the full 1024
pages). I suppose a guest would be working quite hard to dirty that much
without retriggering a checkpoint (even with COLO's more relaxed approach
to resync).

In any case, making the array grow is clearly a good thing to do and
you've already done it, so no need to keep thinking about this case ;-)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
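[Editor's note: the arithmetic above is 1024 records x 1024 pages per record x 4 KiB per page, i.e. 4 GiB at most. The sketch below illustrates the grow-on-demand record buffer the thread discusses; it is not the libxc patch itself, and names such as sketch_record, record_buffer and buffer_record are hypothetical stand-ins for the real structures in tools/libxc/xc_sr_restore.c.]

```c
/*
 * Minimal sketch, assuming a simple doubling strategy: buffer checkpoint
 * records in an array that grows past the initial 1024-entry guess.
 * Names are illustrative only and do not match the actual libxc code.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* A record header plus a pointer to its separately allocated payload. */
struct sketch_record {
    uint32_t type;
    uint32_t length;
    void *data;          /* dirty page data etc. lives behind this pointer */
};

struct record_buffer {
    struct sketch_record *records;
    unsigned int count;      /* records buffered since the last checkpoint */
    unsigned int allocated;  /* current capacity of the array */
};

/* Append one record, doubling the array whenever it fills up. */
static int buffer_record(struct record_buffer *buf,
                         const struct sketch_record *rec)
{
    if (buf->count == buf->allocated)
    {
        unsigned int new_alloc = buf->allocated ? buf->allocated * 2 : 1024;
        struct sketch_record *p = realloc(buf->records,
                                          new_alloc * sizeof(*p));

        if (!p)
            return -1;   /* caller decides how to fail the restore */

        buf->records = p;
        buf->allocated = new_alloc;
    }

    buf->records[buf->count++] = *rec;
    return 0;
}

int main(void)
{
    struct record_buffer buf = { 0 };
    struct sketch_record rec = { .type = 1, .length = 0, .data = NULL };
    unsigned int i;

    /* Buffer more than the initial 1024 entries to exercise the growth. */
    for (i = 0; i < 3000; i++)
        if (buffer_record(&buf, &rec))
            return 1;

    printf("buffered %u records, capacity %u\n", buf.count, buf.allocated);
    free(buf.records);
    return 0;
}
```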