[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v10 3/5] remus: introduce remus device
On 06/06/2014 01:06 AM, Ian Jackson wrote: Yang Hongyang writes ("[PATCH v10 3/5] remus: introduce remus device"):introduce remus device, an abstract layer of remus devices(nic, disk, etc).It provide the following APIs for libxl:Thanks.>libxl__remus_device_setup setup remus devices, like attach qdisc, enable disk buffering, etc >libxl__remus_device_teardown teardown devices >libxl__remus_device_postsuspend >libxl__remus_device_preresume >libxl__remus_device_commit above three are for checkpoint.I started reviewing this patch by reading the commit message and the changes to libxl_internal.h. As far as I can tell what's going on, it mostly looks plausible. But the new parts of libxl_internal.h are missing important information about the new interfaces. I'd like the documentation in libxl_internal.h to be sufficient to read, understand, and check the code on one side of those interfaces, without having to go and read the code on the other side. I'll go into more detail about this below. Because of this, I haven't read the actual implementation code yet.through remus device layer, the remus execution flow will be like this: xl remus -> remus device setup |-> remus checkpoint(postsuspend, commit, preresume) ... |-> remus device teardown,failover or abortThis diagram could usefully be transferred into a comment in the code, probably in libxl_internal.h.the remus device layer provide an interface libxl__remus_device_ops which a remus device must implement.the whole remus structure: |remus| | |remus device| | |nic| |drbd disks| |qemu disks| ... a device(nic, drbd disks, qemu disks, etc) must implement libxl__remus_device_ops to support remus.Again, this diagram too.diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 2b46121..20601b2 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h...+/*----- remus device related state structure -----*/ +typedef enum libxl__remus_device_kind { + LIBXL__REMUS_DEVICE_NIC, + LIBXL__REMUS_DEVICE_DISK, +} libxl__remus_device_kind; + +typedef struct libxl__remus_state libxl__remus_state; +typedef struct libxl__remus_device libxl__remus_device; +typedef struct libxl__remus_device_state libxl__remus_device_state; +typedef struct libxl__remus_device_ops libxl__remus_device_ops;All fine.+struct libxl__remus_device_ops {Who produces and consumes this ? I _think_ from your diagram above that this is produced by a device type and consumed by the main remus code. Is that right ? I think this should be documented.+ /* + * init device ops private data, etc. must implenment + */What does "must implement" mean ? Do you mean that some of these function pointers can be 0 ? If so, you should say this explicitly somewhere and say what 0 means. (No-op?) Only the APIs for checkpoints may be 0, mean the op may not be implemented. Will document those. + int (*init)(libxl__remus_device_ops *self, + libxl__remus_state *rs); + /* + * free device ops private data, etc. must implenment + */ + void (*destroy)(libxl__remus_device_ops *self); + /* device ops's private data */ + void *data;In libxl we often embed structs inside other structs, rather than chaining them with void*s. Is tehre some reason why that's not a good idea here ? This is a device ops's private data, for different device types, the data structs are different, so we can not specify a specific data structure here. Also, I assume this should be "const void*", since I think this can only refer to static data ? Yes, the data was init by the init() api above. + /* + * checkpoint callbacks, async ops. may not implemented + */If these are async ops, what happens when they complete ? I see a libxl__remus_device_callback typedef below, and a callback field in libxl__remus_device. Is that it ? yes If so this should be written down.+ /* + * check whether device ops match the device, async op. must implement + */ + void (*match)(libxl__remus_device_ops *self, + libxl__remus_device *dev);I don't understand the purpose of this, but perhaps this is because I don't understand the lifecycle of a libxl__remus_device and what fields in it are for which bits of code. This API determine whether the ops match the specific device. In the implementation, we first init all device ops, for example, NIC ops, DRBD ops ... Then we will find out the libxl devices, and match the device with the ops, if the device is a drbd disk, then it will be matched with DRBD ops, and the further ops(such as checkpoint ops etc.) of this device will using DRBD ops. This API is mainly for disks, because we must use an external script to determine whether a libxl_disk is a DRBD disk. + /* + * setup the remus device, async op. must implement + */ + void (*setup)(libxl__remus_device *dev); + + /* + * teardown the remus device, async op. must implement + */ + void (*teardown)(libxl__remus_device *dev);When we say "setup" and "teardown" we refer to the actual device, not merely the libxl data structures, which are setup and torn down with "init" and "destroy" ? This could be clearer.+struct libxl__remus_device_state { + libxl__ao *ao; + libxl__egc *egc; + + /* devices that have been setuped */ + libxl__remus_device **dev; + + int num_nics; + int num_disks; + + /* for counting devices that have been handled */ + int num_devices; + /* for counting devices that matched and setuped */ + int num_setuped; +};We need to know who owns which fields in this structure. Or who is supposed to set them. Also, I'm not sure what this structure is for. It appears only to be in libxl__remus_state. What is the reason for it being separated out ?+struct libxl__remus_device {Again, we need to know who owns/sets which fields in this structure. (If the struct is shared between different layers of code, it is normally easiest to do this by reordering the fields into groups according to their ownership.) If the whole struct is owned by the same set of code, then we need to know which set of code that is. Perhaps by specifying a pattern on the function name (libxl__remus_device_*?)+ int devid; + /* libxl__device_* which this remus device related to */ + const void *backend_dev; + libxl__remus_device_kind kind; + int ops_index;What is ops_index ? This is for matching, we must go through all device ops until we find a matched op for the device. The ops_index record which ops we are matching. + libxl__remus_device_ops *ops;I think these ops structs are vtables so should be const.+ /* for calling scripts */ + libxl__async_exec_state aes;Conversely, I'm not sure that particular comments add anything. libxl__async_exec_state is always for executing scripts.+ /* for async func calls */ + libxl__ev_child child;Is this an ownership comment ? If so are you sure that the ownership isn't "this is owned by device-specific ops methods" ? (It is obvious that only an /asynchronous/ device-specific ops method would be able to make use of it.)+typedef void libxl__remus_callback(libxl__egc *, + libxl__remus_state *, int rc); + +struct libxl__remus_state { + libxl__ao *ao; + uint32_t domid; + libxl__remus_callback *callback;You should say that these must be set by the caller. (Looking at the API I assume that's the case.)+ /* private */ + int saved_rc; + /* context containing device related stuff */ + libxl__remus_device_state dev_state; + + libxl__ev_time timeout; /* used for checkpoint */ +}; + +_hidden void libxl__remus_device_setup(libxl__egc *egc, + libxl__remus_state *rs); +_hidden void libxl__remus_device_teardown(libxl__egc *egc, + libxl__remus_state *rs); +_hidden void libxl__remus_device_postsuspend(libxl__egc *egc, + libxl__remus_state *rs); +_hidden void libxl__remus_device_preresume(libxl__egc *egc, + libxl__remus_state *rs); +_hidden void libxl__remus_device_commit(libxl__egc *egc, + libxl__remus_state *rs); _hidden int libxl__netbuffer_enabled(libxl__gc *gc);These functions all call rs->callback() when done ? If so there should be a comment to say so.diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 52f1aa9..4278a6b 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [ (-12, "OSEVENT_REG_FAIL"), (-13, "BUFFERFULL"), (-14, "UNKNOWN_CHILD"), + (-15, "NOT_MATCH"), ], value_namespace = "")It is good that you introduce a new error code for your new error case. But I think it needs to have a better name. What does it mean ? It's for the match API, if the device ops not matched with the device, then match() will return the error code. If the match op encounted an error, then we will return ERROR_FAIL. In fact, I grepped the whole of this patch for NOT_MATCH and this new error status is checked for somewhere but never generated! We used the error code in device_match_cb(). If it forms a part of the new internal API then that should be documented.diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl index 745e2ac..36bae04 100755 --- a/tools/libxl/libxl_save_msgs_gen.pl +++ b/tools/libxl/libxl_save_msgs_gen.pl @@ -24,7 +24,7 @@ our @msgs = ( 'unsigned long', 'done', 'unsigned long', 'total'] ], [ 3, 'scxA', "suspend", [] ], - [ 4, 'scxW', "postcopy", [] ], + [ 4, 'scxA', "postcopy", [] ], [ 5, 'scxA', "checkpoint", [] ], [ 6, 'scxA', "switch_qemu_logdirty", [qw(int domid unsigned enable)] ],I think this change (and its consequential changes to the handwritten parts) should be split out into a "no functional change" pre-patch. Thanks, Ian. . -- Thanks, Yang. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |