
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops



On 02/06/2010 16:46, "Andreas Olsowski" <andreas.olsowski@xxxxxxxxxxxxxxx>
wrote:

> One can see the time gap between the first and the following memory batch
> reads.
> After that, restoration works as expected.
> You might notice that you get "0%" and then "100%" with no steps in between,
> whereas xc_save does show intermediate steps. Is that intentional, or maybe
> another symptom of the same problem?

Does the log look similar for a restore on a fast system (except the
timestamps of course)?

> as for the read_exact stuff:
> tarballerina:/usr/src/xen-4.0.0# find . -type f -iname \*.c -exec grep -H
> RDEXACT {} \;
> tarballerina:/usr/src/xen-4.0.0# find . -type f -iname \*.c -exec grep -H
> rdexact {} \;
> 
> There are no RDEXACT/rdexact matches in my xen source code.

Ah, because you're using 4.0. Well, I wouldn't worry about it just now
anyway. It may be more fruitful to continue looking for a concrete
behavioural difference between a fast and a slow restore, apart from mere
timing, by inspecting logs.

 -- Keir

> In a few hours I will shut down all virtual machines on one of the hosts
> experiencing slow xc_restores, maybe reboot it, and check whether xc_restore
> is any faster without load or utilization on the machine.
> 
> I'll check in with results later.
> 
> 
> On Wed, 2 Jun 2010 08:11:31 +0100
> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> 
>> Hi Andreas,
>> 
>> This is an interesting bug, to be sure. I think you need to modify the
>> restore code to get a better idea of what's going on. The file in the Xen
>> tree is tools/libxc/xc_domain_restore.c. You will see it contains many
>> DBGPRINTF and DPRINTF calls, some of which are commented out, and some of
>> which may 'log' at too low a priority level to make it to the log file. For
>> your purposes you might change them to ERROR calls as they will definitely
>> get properly logged. One area of possible concern is that our read function
>> (RDEXACT, which is a macro mapping to rdexact) was modified for Remus to
>> have a select() call with a timeout of 1000ms. Do I entirely trust it? Not
>> when we have the inexplicable behaviour that you're seeing. So you might try
>> mapping RDEXACT() to read_exact() instead (which is what we already do when
>> building for __MINIOS__).
>> 
>> This all assumes you know your way around C code at least a little bit.
>> 
>>  -- Keir
> 
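
For readers following the RDEXACT discussion quoted above, here is a minimal
sketch of the two read strategies being compared: a plain blocking
"read exactly N bytes" loop, and a variant that waits in select() with a
timeout before each read. This is not the Xen source. The function names
sketch_read_exact and sketch_read_exact_timed, the timeout_ms parameter and
the use of ETIMEDOUT are assumptions made purely for illustration; only the
1000ms select() timeout comes from the message above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Xen code): block until exactly `size` bytes
 * have been read from `fd`. Returns 0 on success, -1 on error or EOF. */
static int sketch_read_exact(int fd, void *buf, size_t size)
{
    size_t off = 0;

    assert(buf != NULL || size == 0);

    while (off < size) {
        ssize_t len = read(fd, (char *)buf + off, size - off);

        if (len == -1 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (len <= 0)
            return -1;          /* error, or EOF before `size` bytes */
        off += (size_t)len;
    }
    return 0;
}

/* Hypothetical helper (not Xen code): same loop, but wait in select()
 * for at most `timeout_ms` before each read. Returns -1 with errno set
 * to ETIMEDOUT if no data arrives within the timeout. */
static int sketch_read_exact_timed(int fd, void *buf, size_t size,
                                   long timeout_ms)
{
    size_t off = 0;

    assert(buf != NULL || size == 0);

    while (off < size) {
        fd_set rfds;
        struct timeval tv;
        int ready;
        ssize_t len;

        /* select() may modify both arguments, so rebuild them each time. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        ready = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ready == -1 && errno == EINTR)
            continue;           /* interrupted: retry the wait */
        if (ready == 0) {
            errno = ETIMEDOUT;  /* assumption: report timeout via errno */
            return -1;
        }
        if (ready < 0)
            return -1;

        len = read(fd, (char *)buf + off, size - off);
        if (len == -1 && errno == EINTR)
            continue;
        if (len <= 0)
            return -1;
        off += (size_t)len;
    }
    return 0;
}
```

Calling sketch_read_exact_timed(fd, buf, n, 1000) mirrors the 1000ms timeout
mentioned above, while sketch_read_exact(fd, buf, n) corresponds to the plain
blocking behaviour suggested as the thing to try instead. If a stall like the
one reported disappears with the plain loop, that would point at the
select()-based path.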



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

