
Re: [Xen-devel] [PATCH v6 00/21] libxl: domain save/restore: run in a separate process



On Thu, Jun 28, 2012 at 10:24 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
> Ian Campbell writes ("Re: [PATCH v6 00/21] libxl: domain save/restore: run in a separate process"):
> > Does this mean this series is now ready to go in?
>
> I think so.  I'm just giving Shriram a chance to object.


I have no objections. I just finished testing the series with
xm (to ensure xend/remus was not broken). xl remus also fails over properly.
Things are good on that front.

However, in the test case where I kill the backup (even with remote host replication), xl still
crashes the primary, whereas xend handles this case properly. The xl error output is at the end of this mail.

Either way, I have no objections to this series.
Tested-by: Shriram Rajagopalan <rshriram@xxxxxxxxx>


There is a more pressing matter that I just noticed.
The performance is abysmal, especially with xl.
Here is a comparison between xend/remus and xl/remus
for a PV domU with and without a suspend event channel.

What is being measured? Time to suspend + time to resume.
With the event channel, this should be on the order of 1ms. With xenstore,
I would expect it to be at most 5-7ms.
NB: This does not include the memcpy phase in xc_domain_save.
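
To make the measured quantity concrete, here is a minimal sketch of the timing, not the actual instrumentation used for the numbers in this mail. The callables suspend, copy and resume are hypothetical stand-ins for the real suspend request, the memcpy phase and the resume; only the first and last are timed.

```python
import time


def time_suspend_resume(suspend, resume, copy=None):
    """Time one checkpoint cycle; return (suspend_ms, resume_ms, total_ms).

    suspend, resume and copy are hypothetical callables supplied by the
    caller. The copy phase runs between the two timed sections and is
    deliberately excluded from the result.
    """
    t0 = time.perf_counter()
    suspend()
    t1 = time.perf_counter()

    if copy is not None:
        copy()  # memcpy phase: executed, but not timed

    t2 = time.perf_counter()
    resume()
    t3 = time.perf_counter()

    suspend_ms = (t1 - t0) * 1000.0
    resume_ms = (t3 - t2) * 1000.0
    return suspend_ms, resume_ms, suspend_ms + resume_ms
```

Because the copy phase sits outside both timed sections, a slow memcpy does not inflate the reported suspend/resume overhead.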

Results: 32-bit PV domU w/ suspend event channel (2.6.32.2 xenolinux kernel)
xl-remus: ~1ms
xend-remus: ~1ms

So, for guests with suspend event channel support, 
remus with xl/xend has the same suspend/resume overhead.

64-bit PV domU w/o suspend event channel (3.3.0 upstream kernel)
xl-remus: ~202ms !!!
xend-remus: ~2.2ms

This 202ms figure is the same in both IanJ's tree and the baseline xen-unstable.

Looking back at the logs, this has been the same since January.
Is there some fixed timeout lurking in the code somewhere?
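
To illustrate why a fixed timeout would be a plausible suspect, here is a toy model, purely hypothetical and not taken from the libxl source. It assumes a waiter that sleeps for a fixed interval before each check instead of being woken by an event; the function name and the 200ms interval are assumptions chosen only to match the order of magnitude seen above.

```python
import math


def observed_latency_ms(actual_ms, poll_interval_ms):
    """Latency seen by a waiter that sleeps, then checks, in a loop.

    Toy model only: the waiter cannot notice completion before its next
    wake-up, so the observed latency is rounded up to a whole number of
    poll intervals (and is at least one interval).
    """
    if poll_interval_ms <= 0:
        raise ValueError("poll_interval_ms must be positive")
    polls = max(1, math.ceil(actual_ms / poll_interval_ms))
    return polls * poll_interval_ms


# A suspend that really takes ~2.2ms, observed through a hypothetical
# 200ms polling loop, would be reported as a full interval.
print(observed_latency_ms(2.2, 200))
```

Under this model the reported time is set by the polling interval rather than by the guest, which would be consistent with a figure that stays constant across trees, but it is only one possible explanation.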


====
xl error output when killing the backup VM (it crashes the primary VM instead of resuming
it properly):
libxl: error: libxl_create.c:760:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:844:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1220:libxl_domain_destroy: non-existant domain 24
libxl: error: libxl_create.c:995:domcreate_complete: unable to destroy domain 24 following failed creation
migration target: Domain creation failed (code -3).
pagetables=2,cache_misses=0,emptypages=41
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 3 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 3 save/restore helper [5620] died due to fatal signal Broken pipe
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.


thanks
shriram
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

