
Re: [Xen-devel] [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]

Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):
> On Mon, Nov 4, 2013 at 10:06 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> 
> wrote:
>     Which of the xc_domain_save (and _restore) callbacks are called each
>     Remus iteration?
> Almost all of them on the xc_domain_save side (suspend, resume,
> save QEMU state, checkpoint).

>  xc_domain_restore doesn't have any
> callbacks AFAIK, and Remus as of now does not have a component on
> the restore side; it piggybacks on live migration's restore
> framework.

But the libxl memory management in the restore code is currently
written to assume a finite lifetime for the ao.  So I think this needs
to be improved.

Perhaps all the suspend/restore callbacks should each get one of the
nested ao type things that Roger needs for his driver domain daemon.

> FWIW, the remus related code that executes per iteration does not
> allocate anything.  All allocations happen only during setup and I
> was under the impression that no other allocations are taking place
> every time xc_domain_save calls back into libxl.

If this is true, then good, because we don't need to do anything, but
there is a lot of code there and I would want to check.

> However, it may be possible that other parts of the AO machinery
> (and there are a lot of them) are allocating stuff per
> iteration. And if that is the case, it could easily lead to OOMs
> since Remus technically runs as long as the domain lives.

The ao and event machinery doesn't do much allocation itself.

>     Having said that, libxl is not performance-optimised.  Indeed the
>     callback mechanism involves context switching, and IPC, between the
>     save/restore helper and libxl proper.  Probably not too much to be
>     doing every 20ms for a single domain, but if you have a lot of these
>     it's going to end up taking a lot of dom0 cpu etc.
> Yes and that is a problem. Xend+Remus avoided this by linking
> the libcheckpoint library, which interfaced with both the Python and libxc code.

Have you observed whether the performance is acceptable with your V3
patches?

>     I assume you're not doing this for HVM domains, which involve saving
>     the qemu state each time too.
> It includes HVM domains too, although in that case xenstore-based suspend
> takes about 5ms, so the checkpoint interval is typically 50ms or so.

> If there is a latency-sensitive task running inside
> the VM, a lower checkpoint interval leads to better performance.





