[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]



Hi Ian:

On 05/03/2014 12:08 AM, Ian Jackson wrote:
Shriram Rajagopalan writes ("Re: [PATCH 07/10 V7] libxl: use the API to setup/teardown network buffering [and 1 more messages]"):
On Wed, Apr 23, 2014 at 11:02 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
wrote:
    I think you have basically two options:

    1. Make the remus part of this be a fully self-contained standard
       asynchronous callback-based suboperation, like libxl__xswait,
       libxl__bootloader, et al.

       In this case you should rigorously follow the existing patterns,
       defining a clear interface between the two parts, providing a
       callback function set by the caller, etc.

    2. Integrate the remus part into the suspend/resume code in an
       ad hoc fashion, with extremely clear comments everywhere about the
       expected interface, and no extraneous moving parts.

    At the moment you seem to have mixed these two approaches.

Sorry, I missed the previous comment of yours. The two options you
note are bit more clearer than the previous comment. And I also
agree that the current approach is mixing options 1 & 2.
Right.

The entire Remus code (executed from start to end) is one giant
async op.  Internally, per checkpoint, the code executes for no more
than tens of milliseconds at max, with the exception of sleeping
until the next checkpoint needs to be taken.
Yes.

Doing checkpoint related work (i.e., syscalls to control
disk/network buffers) in an async op is an overkill. So, they are
integrated into the suspend/resume infrastructure (option 2)

The async op is useful (please correct me if I am wrong) if the op
runs for a long time, such that you don't want users of libxl to
block. Which is exactly why the setup/teardown and the
sleep_until_next_checkpoint operations are ao suboperations. (option
1).
This isn't quite the right distinction.  You should use an
asynchronous style for anything which might block.  So it is OK to
synchronously make fast syscalls.  (I assume that the disk/network
control syscalls are fast.)

And in this context, blocking includes child processes, and calls to
(u)sleep.

All said, perhaps, it may be more clear to add a level of indirection:
 make domain_suspend_done a callback field in libxl__domain_state.

Change all direct callers of domain_suspend_done to invoke this callback.

In this way,
* domain_suspend -> domain_suspend_done (for normal suspend/save operations)
* domain_remus_start->domain_remus_terminated

What do you think?
I'm not sure I quite follow but I think you are still mixing the two
approaches I discuss above.  I would like you to pick one.

If you're not sure, you should probably pick option 1.

This patchset is based on the current remus implementation (without netbuffer)
witch is integrated into suspend/resume code, if as you suggested we pick option 1,
the whole remus structure needs refactoring.
we're working on it, may take quiet a while.


Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.