[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]



On Tue, Nov 12, 2013 at 9:38 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages] [and 1 more messages]"):
> On Mon, Nov 4, 2013 at 10:40 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
...
> The patch seems harmless enough. How do you want to go about this?

If you think this is the right approach, you should start on it right
away based on my patch.  The reason I posted it quickly was because I
wanted to unblock you.


The nested-ao patch makes sense for Remus, even without fixing this timeout issue.
I can modify my stuff accordingly. Probably create a nested-ao per iteration and drop
it at the start of the next iteration.

However, the timeout part is not convincing enough. For example, 
libxl__domain_suspend_common_callback [the version before your patch]
has two 6 second wait loops in the worst case..


  LOG(DEBUG, "issuing %s suspend request via XenBus control node",
          dss->hvm ? "PVHVM" : "PV");

  libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend");

 LOG(DEBUG, "wait for the guest to acknowledge suspend request");
        watchdog = 60;
        while (!strcmp(state, "suspend") && watchdog > 0) {
            usleep(100000);

            state = libxl__domain_pvcontrol_read(gc, XBT_NULL, domid);
            if (!state) state = "";

            watchdog--;
        }

and then once again 

 LOG(DEBUG, "wait for the guest to suspend");
    watchdog = 60;
    while (watchdog > 0) {
        xc_domaininfo_t info;

        usleep(100000);
        ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info);


Now I know where the 200ms overhead per checkpoint comes from.

Shouldn't this also be made into an event loop?  Irrespective of whether it is invoked in 
Remus' context or normal suspend/resume/save/restore/migrate context.

And if this remains, the Remus checkpoint interval is much lower compared to this.
Typically between 25-100ms.

 
> Do you want to post/commit this patch ? Because I have to modify my
> patches accordingly. Or should I post this along with my patch
> series, avoiding the need to wait on you before I post mine ?

You can certainly take it into the start of your patch series, yes.
Like what Roger did with the nested ao patch.

Currently there isn't any other reason to make the change in this
patch, so I don't think it should be committed right away.  But if for
some reason it does get committed to staging, you or we can just drop
it from the start of your series.


The only reason it might get committed to staging without other remus patches
would be to fix the issue I cited above.

cheers
shriram
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.