Re: [Xen-devel] [PATCH 6/7] remus: implement remus replicated checkpointing disk
> @@ -1463,7 +1468,10 @@ static int libxl__remus_domain_resume_callback(void *data)
>      if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>          return 0;
>
> -    /* REMUS TODO: Deal with disk. */
> +    /* Deal with disk. */
> +    if (libxl__remus_disk_preresume(dss->remus_state))
> +        return 0;
> +
>      return 1;
>  }
>
Bug. I think I mentioned this last time. The disk needs to be resumed before
the domain is resumed. Just move the libxl__domain_resume() call below the
libxl__remus_disk_preresume() check added in the snippet above.
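
A minimal sketch of the ordering I mean. The stub_* functions are made-up
stand-ins for libxl__remus_disk_preresume() and libxl__domain_resume(); they
only record the order in which they run, so treat this as an illustration
rather than the actual fix:

```c
#include <assert.h>

#define MAX_CALLS 2

/* Order in which the stubs ran: 1 = disk preresume, 2 = domain resume. */
static int call_order[MAX_CALLS];
static int n_calls;

static void record_call(int id)
{
    assert(n_calls < MAX_CALLS);
    call_order[n_calls++] = id;
}

/* Hypothetical stand-in for libxl__remus_disk_preresume(); 0 means success. */
static int stub_remus_disk_preresume(void)
{
    record_call(1);
    return 0;
}

/* Hypothetical stand-in for libxl__domain_resume(); 0 means success. */
static int stub_domain_resume(void)
{
    record_call(2);
    return 0;
}

/* Returns 1 on success and 0 on failure, like the quoted callback. */
static int resume_callback_sketch(void)
{
    n_calls = 0;

    /* Deal with the disk first ... */
    if (stub_remus_disk_preresume())
        return 0;

    /* ... and only then let the domain run again. */
    if (stub_domain_resume())
        return 0;

    return 1;
}
```

Calling resume_callback_sketch() leaves call_order as {1, 2}, i.e. the disk
hook has returned before the domain is resumed.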
> +typedef struct libxl__remus_disk_type {
> +    /* checkpointing */
> +    int (*postsuspend)(libxl__remus_disk *remus_disk);
> +    int (*preresume)(libxl__remus_disk *remus_disk);
> +    int (*commit)(libxl__remus_disk *remus_disk);
> +
> +    /*
> +     * Return value:
> +     *  1: the disk is not this type or the script is still running
> +     *  0: the disk is this type
> +     * -1: error
> +     */
> +    int (*match)(libxl__domain_suspend_state *dss,
> +                 const libxl_device_disk *disk,
> +                 libxl_async_exec *async_exec,
> +                 void *disk_state);
> +
> +    /*
> +     * This is synchronous callback. Return value:
> +     *  0: setup is done
> +     * -1: error
> +     *
> +     */
> +    int (*setup)(libxl__remus_disk *remus_disk);
> +
> +    /*
> +     * Return value:
> +     *  1: the script is still running
> +     *  0: the script is done
> +     * -1: error
> +     */
> +    int (*teardown)(libxl__remus_disk *remus_disk,
> +                    libxl_async_exec *async_exec);
> +
> +    /* the size of the private data */
> +    int size;
> +} libxl__remus_disk_type;
> +
This vtable approach is neat. I am fine with the current disk
checkpoint approach you have taken.
Something that might be worth thinking about:
The old remus code used this approach for both the disk and network buffering.
Given that this code is going in a similar direction, I suggest hoisting this
structure up to an abstract buffer type, with setup, teardown, postsuspend,
preresume and commit callbacks.
For disks, semantically,
  setup       [..]
  teardown    [..]
  postsuspend [start flushing buffered writes to backup host]
  preresume   [wait until all writes have been flushed to backup host]
  commit      [no-op]
For network devices, semantically,
  setup       [..]
  teardown    [..]
  postsuspend [no-op]
  preresume   [start_new_epoch - libnl call]
  commit      [release_prev_epoch - libnl call]
This way, in domain_suspend_done, the only thing we need to do is
  foreach remus buffer
    buffer.postsuspend()
Similarly, in resume_callback()
  foreach remus buffer
    buffer.preresume()
  domain_resume()
and in remus_checkpoint_dm_saved()
  foreach remus buffer
    buffer.commit()
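
To make the suggestion concrete, here is a rough sketch of what such an
abstract buffer type could look like. Every name in it (remus_buffer,
remus_buffer_ops, the checkpoint_* helpers) is hypothetical and not existing
libxl API, and the hooks only record a one-letter trace instead of touching a
real disk or making libnl calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct remus_buffer remus_buffer;

/* Hypothetical abstract buffer type: one set of hooks per device kind. */
typedef struct remus_buffer_ops {
    int (*setup)(remus_buffer *buf);
    int (*teardown)(remus_buffer *buf);
    int (*postsuspend)(remus_buffer *buf);
    int (*preresume)(remus_buffer *buf);
    int (*commit)(remus_buffer *buf);
} remus_buffer_ops;

struct remus_buffer {
    const remus_buffer_ops *ops;
};

/* Trace of what ran, so the ordering can be inspected. */
static char trace[32];
static size_t trace_len;

static void record(char ev)
{
    assert(trace_len + 1 < sizeof(trace));
    trace[trace_len++] = ev;
    trace[trace_len] = '\0';
}

static int noop(remus_buffer *buf) { (void)buf; return 0; }

/* Disk: 'f' = start flushing writes, 'w' = wait until they are flushed. */
static int disk_postsuspend(remus_buffer *buf) { (void)buf; record('f'); return 0; }
static int disk_preresume(remus_buffer *buf)   { (void)buf; record('w'); return 0; }

/* Network: 'n' = start new epoch, 'c' = release previous epoch. */
static int net_preresume(remus_buffer *buf) { (void)buf; record('n'); return 0; }
static int net_commit(remus_buffer *buf)    { (void)buf; record('c'); return 0; }

static const remus_buffer_ops disk_ops = {
    .setup = noop, .teardown = noop,
    .postsuspend = disk_postsuspend, .preresume = disk_preresume, .commit = noop,
};

static const remus_buffer_ops net_ops = {
    .setup = noop, .teardown = noop,
    .postsuspend = noop, .preresume = net_preresume, .commit = net_commit,
};

/* What domain_suspend_done would do. */
static int checkpoint_postsuspend(remus_buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].ops->postsuspend(&bufs[i]))
            return -1;
    return 0;
}

/* What resume_callback() would do; 'R' stands in for domain_resume(). */
static int checkpoint_resume(remus_buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].ops->preresume(&bufs[i]))
            return -1;
    record('R');
    return 0;
}

/* What remus_checkpoint_dm_saved() would do. */
static int checkpoint_commit(remus_buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].ops->commit(&bufs[i]))
            return -1;
    return 0;
}

/* Runs one checkpoint over a disk and a network buffer; returns the trace. */
static const char *run_one_checkpoint(void)
{
    remus_buffer bufs[] = { { &disk_ops }, { &net_ops } };
    size_t n = sizeof(bufs) / sizeof(bufs[0]);

    trace_len = 0;
    trace[0] = '\0';
    if (checkpoint_postsuspend(bufs, n) || checkpoint_resume(bufs, n) ||
        checkpoint_commit(bufs, n))
        return NULL;
    return trace;
}
```

The callers never need to know whether a buffer is a disk or a network
device; a no-op hook simply does nothing for the phases a device does not
care about.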
Lai, I can take a crack at it if you would like.
shriram
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel