
Re: [Xen-devel] [PATCH 6/7] remus: implement remus replicated checkpointing disk

To: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
From: Shriram Rajagopalan <rshriram@xxxxxxxxx>
Date: Thu, 3 Apr 2014 09:41:09 -0700
Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>, FNST-Wen Congyang <wency@xxxxxxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jiang Yunhong <yunhong.jiang@xxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, "<xen-devel@xxxxxxxxxxxxx>" <xen-devel@xxxxxxxxxxxxx>, Dong Eddie <eddie.dong@xxxxxxxxx>, FNST-Yang Hongyang <yanghy@xxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
Delivery-date: Thu, 03 Apr 2014 16:41:57 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

> @@ -1463,7 +1468,10 @@ static int libxl__remus_domain_resume_callback(void *data)
>     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>         return 0;
>
> -    /* REMUS TODO: Deal with disk. */
> +    /* Deal with disk. */
> +    if (libxl__remus_disk_preresume(dss->remus_state))
> +        return 0;
> +
>     return 1;
> }
>

Bug. I think I mentioned this last time: the disk needs to be resumed before
the domain is resumed. Just move the libxl__domain_resume() call below the
libxl__remus_disk_preresume() call added above.
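To make the requested ordering concrete, here is a minimal sketch of the callback with the two calls swapped. libxl__remus_disk_preresume() and libxl__domain_resume() are replaced by simplified stand-ins (hypothetical signatures, not the real libxl ones) that only record the call order, and suspend_state is a cut-down, hypothetical stand-in for libxl__domain_suspend_state.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for the libxl internals used by the callback
 * (hypothetical signatures). They record the call order and succeed. */
static char call_log[64];
static int disk_ready;

static int libxl__remus_disk_preresume(void *remus_state)
{
    (void)remus_state;
    disk_ready = 1;
    strcat(call_log, "disk;");
    return 0;
}

static int libxl__domain_resume(void *gc, int domid, int suspend_cancel)
{
    (void)gc;
    (void)domid;
    (void)suspend_cancel;
    assert(disk_ready);   /* the guest must not run before its disk is ready */
    strcat(call_log, "domain;");
    return 0;
}

/* Cut-down, hypothetical stand-in for libxl__domain_suspend_state. */
typedef struct {
    int domid;
    void *remus_state;
} suspend_state;

/* Returns 1 on success and 0 on failure, as in the quoted hunk. */
static int libxl__remus_domain_resume_callback(void *data)
{
    suspend_state *dss = data;

    /* Deal with disk first ... */
    if (libxl__remus_disk_preresume(dss->remus_state))
        return 0;

    /* ... and only then let the domain run again. */
    if (libxl__domain_resume(NULL, dss->domid, /* Fast Suspend */ 1))
        return 0;

    return 1;
}
```

If the disk preresume fails, the domain is never resumed, which is the behaviour we want for a checkpoint that could not be made consistent.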


> +typedef struct libxl__remus_disk_type {
> +    /* checkpointing */
> +    int (*postsuspend)(libxl__remus_disk *remus_disk);
> +    int (*preresume)(libxl__remus_disk *remus_disk);
> +    int (*commit)(libxl__remus_disk *remus_disk);
> +
> +    /*
> +     * Return value:
> +     *   1: the disk is not this type or the script is still running
> +     *   0: the disk is this type
> +     *  -1: error
> +     */
> +    int (*match)(libxl__domain_suspend_state *dss,
> +                 const libxl_device_disk *disk,
> +                 libxl_async_exec *async_exec,
> +                 void *disk_state);
> +
> +    /*
> +     * This is synchronous callback. Return value:
> +     *  0: setup is done
> +     * -1: error
> +     *
> +     */
> +    int (*setup)(libxl__remus_disk *remus_disk);
> +
> +    /*
> +     * Return value:
> +     *   1: the script is still running
> +     *   0: the script is done
> +     *  -1: error
> +     */
> +    int (*teardown)(libxl__remus_disk *remus_disk,
> +                    libxl_async_exec *async_exec);
> +
> +    /* the size of the private data */
> +    int size;
> +} libxl__remus_disk_type;
> +


This vtable approach is neat. I am fine with the current disk
checkpoint approach you have taken.

Something that might be worth thinking about:
the old Remus code used this approach for both disk and network buffering.
Given that this code is going in a similar direction, I suggest hoisting
this structure up to an abstract buffer type with setup, teardown,
postsuspend, preresume and commit callbacks.

For disks, semantically,
setup [..]
teardown [..]
postsuspend [start flushing buffered writes to backup host]
preresume [wait until all writes have been flushed to backup host]
commit [no-op]

For network devices, semantically,
setup [..]
teardown [..]
postsuspend [no-op]
preresume [start_new_epoch - libnl call]
commit [release_prev_epoch - libnl call]
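A rough sketch of what such an abstract buffer type could look like, with one ops table per device kind encoding the semantics listed above. All names here (remus_buffer, remus_buffer_ops, disk_ops, net_ops and the helpers) are hypothetical; setup/teardown are left as no-ops because they are elided in the outline, and the libnl epoch calls are only recorded, not performed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical abstract Remus buffer type (illustrative, not libxl API).
 * Every callback returns 0 on success, non-zero on error. */
typedef struct remus_buffer remus_buffer;

typedef struct remus_buffer_ops {
    const char *kind;
    int (*setup)(remus_buffer *buf);
    int (*teardown)(remus_buffer *buf);
    int (*postsuspend)(remus_buffer *buf);
    int (*preresume)(remus_buffer *buf);
    int (*commit)(remus_buffer *buf);
} remus_buffer_ops;

struct remus_buffer {
    const remus_buffer_ops *ops;
    const char *last_action;   /* records what the last hook did */
};

/* Shared no-op; setup/teardown details are elided ("[..]") in the outline. */
static int remus_noop(remus_buffer *buf)
{
    (void)buf;
    return 0;
}

/* Disk: postsuspend starts the flush, preresume waits for it. */
static int disk_postsuspend(remus_buffer *buf)
{
    buf->last_action = "start flushing buffered writes";
    return 0;
}

static int disk_preresume(remus_buffer *buf)
{
    buf->last_action = "wait for writes to reach backup";
    return 0;
}

/* Network: preresume opens a new epoch, commit releases the previous one
 * (libnl calls in the real code; only recorded here). */
static int net_preresume(remus_buffer *buf)
{
    buf->last_action = "start_new_epoch";
    return 0;
}

static int net_commit(remus_buffer *buf)
{
    buf->last_action = "release_prev_epoch";
    return 0;
}

static const remus_buffer_ops disk_ops = {
    .kind = "disk", .setup = remus_noop, .teardown = remus_noop,
    .postsuspend = disk_postsuspend, .preresume = disk_preresume,
    .commit = remus_noop,
};

static const remus_buffer_ops net_ops = {
    .kind = "net", .setup = remus_noop, .teardown = remus_noop,
    .postsuspend = remus_noop, .preresume = net_preresume,
    .commit = net_commit,
};

static void remus_buffer_init(remus_buffer *buf, const remus_buffer_ops *ops)
{
    assert(ops != NULL);
    buf->ops = ops;
    buf->last_action = "none";
}

static int remus_buffer_is(const remus_buffer *buf, const char *kind)
{
    return strcmp(buf->ops->kind, kind) == 0;
}
```

The point of the design is that the per-device differences live entirely in the ops tables; the checkpoint code never needs to know whether it is talking to a disk or a NIC.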

This way, in domain_suspend_done(), the only thing we need to do is:
foreach remus buffer
 buffer.postsuspend()

Similarly, in resume_callback():

foreach remus buffer
 buffer.preresume()
domain_resume()

And in remus_checkpoint_dm_saved():
 foreach remus buffer
  buffer.commit()
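The foreach pseudocode could translate into C roughly as below. This is only a sketch under assumptions: the buffer struct is a hypothetical, trimmed-down vtable holding just the three checkpoint hooks, domain_resume() is a stand-in for libxl__domain_resume() that checks every buffer was preresumed first, and each loop simply stops at the first failing hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trimmed-down buffer: just the three checkpoint hooks.
 * Each hook returns 0 on success, non-zero on error. */
typedef struct remus_buffer remus_buffer;
struct remus_buffer {
    int (*postsuspend)(remus_buffer *buf);
    int (*preresume)(remus_buffer *buf);
    int (*commit)(remus_buffer *buf);
    int ready;   /* set once preresume has completed (illustration only) */
};

static int domain_resumed;

static int hook_noop(remus_buffer *buf) { (void)buf; return 0; }
static int hook_preresume(remus_buffer *buf) { buf->ready = 1; return 0; }

/* Hypothetical stand-in for libxl__domain_resume(): checks the ordering. */
static int domain_resume(const remus_buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(bufs[i].ready);   /* no buffer may lag behind the guest */
    domain_resumed = 1;
    return 0;
}

/* Called from domain_suspend_done(). */
static int remus_buffers_postsuspend(remus_buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].postsuspend(&bufs[i]))
            return -1;
    return 0;
}

/* Called from the resume callback: every preresume hook, then the domain. */
static int remus_resume(remus_buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].preresume(&bufs[i]))
            return -1;
    return domain_resume(bufs, n);
}

/* Called from remus_checkpoint_dm_saved(). */
static int remus_buffers_commit(remus_buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].commit(&bufs[i]))
            return -1;
    return 0;
}
```

Because the domain is resumed only after the preresume loop completes, the disk ordering bug above cannot reappear for any buffer type.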

Lai, I can take a crack at it if you would like.

shriram

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH 6/7] remus: implement remus replicated checkpointing disk
  - From: Lai Jiangshan

References:
- [Xen-devel] [PATCH V8 8/8] libxl: network buffering cmdline switch
  - From: Yang Hongyang
- [Xen-devel] [PATCH 1/7] introduce a new function libxl__remus_netbuf_setup_done()
  - From: Lai Jiangshan
- [Xen-devel] [PATCH 6/7] remus: implement remus replicated checkpointing disk
  - From: Lai Jiangshan
