[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC] remus: implement remus replicated checkpointing disk



On 03/10/2014 07:28 PM, Ian Jackson wrote:
> Lai Jiangshan writes ("[PATCH RFC] remus: implement remus replicated 
> checkpointing disk"):
>> This patch implements remus replicated checkpointing disk.
>> It includes two parts:
> ...
>> I request *comments* as many as possible.
> 
> Thanks for posting this so early.  It's very helpful to be able to
> review it before it's been polished.  Sorry it's taken a while to
> reply:
> 
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index a4ffdfd..858f5be 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1251,9 +1251,14 @@ static int libxl__remus_domain_suspend_callback(void 
>> *data)
> 
> These parts seem reasonable.
> 
>> +    rc = libxl__remus_disks_commit(remus_state);
>> +    if (rc) {
>> +        LOG(ERROR, "Failed to commit disks state"
>> +            " Terminating Remus..");
> 
> Why do we log a message hear but not in the other
> libxl__remus_disks_foo failure cases ?
> 
>> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
>> index cdc1c16..92eb36a 100644
>> --- a/tools/libxl/libxl_remus.c
>> +++ b/tools/libxl/libxl_remus.c
>> @@ -23,6 +23,7 @@ void libxl__remus_setup_initiate(libxl__egc *egc,
>>                                   libxl__domain_suspend_state *dss)
>>  {
>>      libxl__ev_time_init(&dss->remus_state->timeout);
>> +    libxl__remus_disks_setup(egc, dss);
> 
> I think this is going to have to be an asynchronous function (ie, use
> a callback style), as it's going to want to run scripts.  Likewise the
> teardown.
> 
>> +/*** drbd implementation ***/
>> +const int DRBD_SEND_CHECKPOINT = 20;
>> +const int DRBD_WAIT_CHECKPOINT_ACK = 30;
> 
> These should be "static" as well as "const".
> 
>> +typedef struct libxl__remus_drbd_disk
>> +{
> 
> Our coding style reserves "{" in the LH column for functions, so your
> struct definitions should have the "{" on the end of the previous
> line.  See libxl__device and libxl__ev_watch_slot for examples.
> 
>> +static int drbd_postsuspend(libxl__remus_disk *d)
>> +{
>> +    struct libxl__remus_drbd_disk *drbd = CONTAINER_OF(d, *drbd, 
>> remus_disk);
>> +
>> +    if (!drbd->ackwait) {
>> +        if (ioctl(drbd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
>> +            drbd->ackwait = 1;
> 
> This seems to make some assumption about return values, or lack of
> errors, or something.  I would expect to see some error handling
> here.
> 
>> +static int drbd_commit(libxl__remus_disk *d)
>> +{
>> +    /* nothing to do, all work are done by DRBD's protocal-D. */
>> +    return 0;
>> +}
> 
> I'm not sure I understand how this can be true.  Can you point me at
> an explanation of the supposed semantics of the remus disk commit ?
> (Eg in a remus design document or even a paper.)  I suspect something
> ought to be done here.


in drbd-remus case, DRBD_SEND_CHECKPOINT(drbd_postsuspend()) will
do the committing works asynchronously. 

tools/python/xen/remus/device.py:
    def commit(self):
        if not self.is_drbd:
            msg = os.read(self.msgfd.fileno(), 4)
            if msg != 'done':
                print 'Unknown message: %s' % msg

> 
>> +static libxl__remus_disk *drbd_setup(libxl__gc *gc, libxl_device_disk *disk)
> ...
>> +    drbd->ctl_fd = open(GCSPRINTF("/dev/drbd/by-res/%s", disk->pdev_path), 
>> O_RDONLY);
> 
> This line could do with wrapping.  And your error handling is a bit
> nugatory, I think - surely something should be logged here ?
> 
>> +static const libxl__remus_disk_type drbd_disk_type = {
>> +  .postsuspend = drbd_postsuspend,
>> +  .preresume = drbd_preresume,
>> +  .commit = drbd_commit,
>> +  .setup = drbd_setup,
>> +  .teardown = drbd_teardown,
>> +};
> 
> I like this vtable approach.
> 
>> +int libxl__remus_disks_postsuspend(libxl__remus_state *state)
>> +{
>> +    int i;
>> +    int rc = 0;
>> +
>> +    for (i = 0; rc == 0 && i < state->nr_disks; i++)
>> +        rc = state->disks[i]->type->postsuspend(state->disks[i]);
>> +
>> +    return rc;
>> +}
> 
> I think the error handling in these functions isn't correct.
> 
> Also, there are several almost-identical functions.  Can you consider
> whether you can write a macro to define them, or perhaps use offsetof
> to write a generic version of the function, or something ?
> 
>> +#if 0
>> +/* TODO: implement disk setup/teardown script */
>> +static void disk_exec_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
>> +                                      const struct timeval *requested_abs)
> 
> This will probably be easier after the refactoring needed to tease out
> the common script invocation code for the network buffering.
> 
>> +int libxl__remus_disks_setup(libxl__egc *egc, libxl__domain_suspend_state 
>> *dss)
>> +{
>> +    libxl__remus_state *remus_state = dss->remus_state;
>> +    int i, j, nr_disks;
>> +    libxl_device_disk *disks;
>> +    libxl__remus_disk *remus_disk;
>> +    const libxl__remus_disk_type *type;
>> +
>> +    STATE_AO_GC(dss->ao);
>> +    disks = libxl_device_disk_list(CTX, dss->domid, &nr_disks);
> 
> disks doesn't come from the gc, so you need to free it.  You should
> initialise it to 0 (NULL), and use the "goto out" error handling
> style.
> 
>> +    remus_state->nr_disks = nr_disks;
>> +    GCNEW_ARRAY(remus_state->disks, nr_disks);
>> +
>> +    for (i = 0; i < nr_disks; i++) {
>> +        remus_disk = NULL;
>> +        for (j = 0; j < ARRAY_SIZE(remus_disk_types); j++) {
>> +            type = remus_disk_types[j];
>> +            remus_disk = type->setup(gc, &disks[i]);
>> +            if (!remus_disk)
>> +                break;
>> +
>> +            remus_state->disks[i] = remus_disk;
>> +            remus_disk->disk = &disks[i];
>> +            remus_disk->type = type;
>> +        }
> 
> I think this code is wrong.  It appears to call all of the setup
> functions, not just one, and overwrite remus_disk with their
> successive results.

If the user use remus disk replication, it is required that
all the disks should support remus disk replication.

So we call setup to all disk. If any disk doesn't support remus
or any disk fail to setup, this libxl__remus_disks_setup() should failed too.

tools/python/xen/remus/device.py:
    def __init__(self, disk):
        if disk.uname.startswith('tap:remus:') or 
disk.uname.startswith('tap:tapdisk:remus:'):
            ...
        elif disk.uname.startswith('drbd:'):
        else:
            raise ReplicatedDiskException('Disk is not replicated: %s' %
                                        str(disk))


> 
>> +        if (!remus_disk) {
>> +            remus_state->nr_disks = i;
> 
> You may find this easier to write with the "goto found" / "found:"
> search loop idiom.  See "childproc_checkall" in libxl_fork.c for an
> example.
> 
> Thanks,
> Ian.
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.