
Re: [Xen-devel] [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]

Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network
buffering in remus callbacks [and 1 more messages]"):
> The nested-ao patch makes sense for Remus, even without fixing this
> timeout issue.  I can modify my stuff accordingly. Probably create a
> nested-ao per iteration and drop it at the start of the next
> iteration.

Right.  Great.

> However, the timeout part is not convincing enough. For example,
> libxl__domain_suspend_common_callback [the version before your patch]
> has two 6-second wait loops in the worst case:
>  LOG(DEBUG, "wait for the guest to acknowledge suspend request");
>         watchdog = 60;
>         while (!strcmp(state, "suspend") && watchdog > 0) {
>             usleep(100000);
> and then once again 
>         usleep(100000);

Oh dear.  That is very poor.

> Now I know where the 200ms overhead per checkpoint comes from.
> Shouldn't this also be made into an event loop?  Irrespective of
> whether it is invoked in Remus' context or normal
> suspend/resume/save/restore/migrate context.

Yes, you are entirely correct.

Both of these loops should be replaced with timeout/event/callback logic.

Do you want to attempt this or would you like me to do it ?

>     Currently there isn't any other reason to make the change in this
>     patch, so I don't think it should be committed right away.  But if for
>     some reason it does get committed to staging, you or we can just drop
>     it from the start of your series.
> The only reason it might get committed to staging without other remus patches
> would be to fix the issue I cited above.



Xen-devel mailing list
