[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-users] Understanding Xen Log Output: Downtime and Total migration time.



On Mon, 2012-03-19 at 12:21 +0000, Phillip Nkubito wrote:
> Hi,
> 
> I am looking at the output of my xen logs (using xm log), particularly the
> outputs pertaining to the migration of VMS.
> 
> Here is an excerpt from the source host's logs(dom0); I am showing the
> first 3 iterations of the VM live migration.
> 
> ===========================================================
> 1:  [2012-03-09 13:37:12 xend 3113] INFO (XendCheckpoint:498) Saving
> memory pages: iter 1
> 2:   1: sent 28847, skipped 3921, delta 1702ms, dom0 62%, target 32%, sent
> 555Mb/s, dirtied 116Mb/s 6036 pages
> 
> 3:  [2012-03-09 13:37:13 xend 3113] INFO (XendCheckpoint:498) Saving
> memory pages: iter 2
> 4:   2: sent 3462, skipped 2574, delta 201ms, dom0 60%, target 44%, sent
> 564Mb/s, dirtied 613Mb/s 3766 pages
> 
> 5:   [2012-03-09 13:37:13 xend 3113] INFO (XendCheckpoint:498) Saving
> memory pages: iter 3
> 6:    3: sent 2701, skipped 1065, delta 178ms, dom0 48%, target 46%, sent
> 497Mb/s, dirtied 634Mb/s 3447 pages
> ===========================================================
> 
> It continues iterating as expected from the pre-copy algorithm until
> eventually the VM is migrated to the destination host.
> 
> My question is what is "delta" as it appears in "line 2,4 and 6" above
> supposed to be in terms of the migration time?

I'm inferring from the code (tools/libxc/xc_domain_save.c, print_stats
in particular) that it is the total time taken for the iteration. This
is consistent with the first being longer since it needs to send the
whole image. Subsequent rounds only send what has changed.

> 
> 
> For my next question I am showing the last iterations and some logs xen
> spews out:
> ===========================================================
> 1:  [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving
> memory pages: iter 12
> 2:   12: sent 341, skipped 0, delta 12ms, dom0 100%, target 0%, sent
> 931Mb/s, dirtied 114Mb/s 42 pages
> 3:  [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving
> memory pages: iter 13
> 4:   13: sent 42, skipped 0, Start last iteration
> 5:  [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:449) suspend
> 6:  [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:178) In
> saveInputHandler suspend
> 7:  [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:180) Suspending
> 4 ...
> 8:  [2012-03-09 13:37:14 xend.XendDomainInfo 3113] DEBUG
> (XendDomainInfo:1335) XendDomainInfo.handleShutdownWatch
> 9:  [2012-03-09 13:37:14 xend.XendDomainInfo 3113] DEBUG
> (XendDomainInfo:1335) XendDomainInfo.handleShutdownWatch
> 10: [2012-03-09 13:37:14 xend.XendDomainInfo 3113] INFO
> (XendDomainInfo:1292) Domain has shutdown: name=migrating-vm2 id=4
> reason=suspend.
> 11: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:185) Domain 4
> suspended.
> 12: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) SUSPEND
> shinfo 00000bdc
> 13: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) delta 11ms,
> dom0 81%, target 0%, sent 125Mb/s, dirtied 199Mb/s 67 pages
> 14: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving
> memory pages: iter 14
> 15:  14: sent 67, skipped 0, delta 5ms, dom0 100%, target 0%, sent
> 439Mb/s, dirtied 439Mb/s 67 pages
> 16:  [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Total pages
> sent= 52369 (0.77x)
> ===========================================================
> 
> I understand from the pre-copy algorithm that the last stage involves
> stopping the VM and copying the remaining memory pages across to the
> destination host.
> 
> From the above excerpt on "line 15" is delta a reflection of the downtime
> during this last stop-copy phase?

It (again from code inspection only) counts the final iteration
including the transfer of the pages but does not count the "tail"
portion of the migrate (e.g. the domain state, which is small compared
with the RAM data), nor does it include waiting for the other end to
actually start the domain running at the far end.

I don't know of any stat which accurately tracks the precise downtime of
the stop and copy. One approach you could use would be to monitor the VM
other the network and measure the apparent network outage, since I guess
that is likely to be the thing which is most interesting in terms of
downtime. Otherwise I think you'd be looking at modifying the code to
track this statistic.

Ian.



_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.