[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-users] hellp!! live migration fails after several migrations - stress test



Hi folks,

I wanted to do a stress test on a domU (Debian 3.1, 2.6.11.12-xenU).
running on dom0 (Debian 3.1, 2.6.11.12-xen0), Xen-2.0.7, IA32.

Similar problems occured to me running Xen-devel from last week. Since it is 
under heavy development, I thought testing it with Xen-2.0.7 stable should 
work for sure.

Scenario:
2 of theses hosts (IPs: 192.168.112.17 and 192.168.112.18) are running a 
script. 
The script checks if the domain is on the host, if yes, it will be migrated to 
the other one, if no, the script will sleep for a few minutes.
It's kind of a domU-pingpong.

domU has a file-backed VBD lying on a 3rd host (NFS Server).

It worked for several migrations, but suddenly, the domU died.
Hopefully, someone could help me find out what the problem is. 

script pingpong:
----------------------------------------------------------------------------------
SCRIPTNAME=`basename $0`
HOSTNAME=`hostname`
LOGFILE=$SCRIPTNAME.$HOSTNAME.log
XENHOST1="192.168.112.17"
XENHOST2="192.168.112.18"
XENGUEST1="debian1"


# determine the xen target host
MYIP=`ifconfig eth0 | grep "inet" | awk '{print $2}' | cut -d: -f 2`
if [ $MYIP = $XENHOST1 ]
then
   TARGETHOST=$XENHOST2
else
   TARGETHOST=$XENHOST1
fi


log() {
   echo `date '+[%y/%m/%d %H:%M:%S]'` - $1 >> $LOGFILE
}


while [ 1 -eq 1 ]
do 
   RUNNINGGUESTS=`xm list | grep -v Name | grep -v Domain-0 | awk '{print 
$1}'`
   if `echo $RUNNINGGUESTS | grep $XENGUEST1 > /dev/null`
   then
      XENGUESTID=`xm list | grep $XENGUEST1 | awk '{print $2}'`
      log "domain $XENGUEST1 (ID=$XENGUESTID) is running here"
      if `xm list | grep $XENGUEST1 | awk '{print $5}' | grep p > /dev/null`
      then
         log "domain $XENGUEST1 is currently paused and hence cannot be 
migrated!"
      else
         log "START migration of $XENGUEST1 to $TARGETHOST..."
         if `xm migrate -l $XENGUEST1 $TARGETHOST`
         then
            log "END migration of domain $XENGUEST1 successfull!"
         else
            log "ERROR while migrating $XENGUEST1!"
         fi
      fi
   else
      log "domain $XENGUEST1 is not here. I'll go back to sleep."
   fi
   sleep `expr  $RANDOM % 1200 + 120`
done
-----------------------------------------------------------------------------------------

logfile snippets of the last migration (from 192.168.112.18 to 192.168.112.17:

192.168.112.17:
--------------------
xend.log:
------------
[2005-11-28 01:14:30 xend] DEBUG (blkif:203) Connecting blkif to event channel 
<BlkifBackendInterface 24 0> ports=16:4
[2005-11-28 01:14:30 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died 
name=debian1 id=24
[2005-11-28 01:14:30 xend] INFO (XendDomain:568) Destroying domain: 
name=debian1
[2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:665) Destroying vifs for 
domain 24
[2005-11-28 01:14:30 xend] DEBUG (netif:305) Destroying vif domain=24 vif=0
[2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:674) Destroying vbds for 
domain 24
[2005-11-28 01:14:30 xend] DEBUG (blkif:552) Destroying blkif domain=24
[2005-11-28 01:14:30 xend] DEBUG (blkif:408) Destroying vbd domain=24 idx=0
[2005-11-28 01:14:30 xend] DEBUG (blkif:408) Destroying vbd domain=24 idx=1
[2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:634) Closing console, domain 
24
[2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:622) Closing channel to 
domain 24
[2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.virq 4
[2005-11-28 01:14:30 xend] DEBUG (blkif:363) Unbinding vbd (type file) 
from /dev/loop0
[2005-11-28 01:14:30 xend] DEBUG (blkif:363) Unbinding vbd (type file) 
from /dev/loop1
[2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.domain.exit 
['debian1', '24', 'crash']
[2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy 
['debian1', '24']
[2005-11-28 01:14:31 xend] INFO (XendRoot:113) EVENT> xend.domain.died 
['debian1', '24']
---------------------
xfrd.log:
------------
3488 [INF] XFRD> Forked child pid=5047
3488 [INF] XFRD> Accepted connection from 192.168.112.18:1777 on 2
5069 [INF] XFRD> Xfr service for 192.168.112.18:1777
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.hello 1 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.xfr 16)[DEBUG] Conn_sxpr< err=0
[1133136756.095291] xc_linux_restore start

xc_linux_restore start
[1133136756.108828] Created domain 24

Created domain 24
(Domain-0 Domain-24)'domain id=24 name=debian1 memory=256 console=9624 
image=/boot/vmlinuz-2.6.11.12-xenU'[1133136756.570952] Reloading memory 
pages:   0%
Reloading memory pages:   6%                                                    
                                                                      
337%[1133136870.173691] Received all pages

Received all pages                                                              
                                                                      
100%
                                                                                
                                                                      
100%
[1133136870.183801] Memory reloaded.

Memory reloaded.
Decreased reservation by 7 pages
[1133136870.184758] Domain ready to be built.

Domain ready to be built.
[1133136870.184844] Domain ready to be unpaused

Domain ready to be unpaused
[1133136870.185234] DOM=24

DOM=24
[DEBUG] Conn_sxpr>
(xfr.err 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.err 0)[DEBUG] Conn_sxpr< err=0
5069 [INF] XFRD> Transfer complete in 114 seconds
5069 [INF] XFRD> Xfr service err=0

192.168.112.18:
----------------------------------------------
xend.log:
------------
[2005-11-28 01:14:16 xend] DEBUG (XendDomain:487) domain_restart_schedule> 16 
suspend 1
[2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.domain.shutdown 
['debian1', '16', 'suspend']
[2005-11-28 01:14:16 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died 
name=debian1 id=16
[2005-11-28 01:14:16 xend] DEBUG (XendDomain:247) XendDomain>reap> shutdown 
id=16 reason=suspend
[2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.virq 4
[2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.domain.suspended 
['debian1', '16']
[2005-11-28 01:14:25 xend] INFO (XendDomain:568) Destroying domain: 
name=debian1
[2005-11-28 01:14:25 xend] DEBUG (XendDomainInfo:665) Destroying vifs for 
domain 16
[2005-11-28 01:14:25 xend] DEBUG (netif:305) Destroying vif domain=16 vif=0
[2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:674) Destroying vbds for 
domain 16
[2005-11-28 01:14:26 xend] DEBUG (blkif:552) Destroying blkif domain=16
[2005-11-28 01:14:26 xend] DEBUG (blkif:408) Destroying vbd domain=16 idx=0
[2005-11-28 01:14:26 xend] DEBUG (blkif:408) Destroying vbd domain=16 idx=1
[2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:634) Closing console, domain 
16
[2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:622) Closing channel to 
domain 16
[2005-11-28 01:14:26 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy 
['debian1', '16']
[2005-11-28 01:14:26 xend] DEBUG (blkif:363) Unbinding vbd (type file) 
from /dev/loop0
[2005-11-28 01:14:26 xend] DEBUG (blkif:363) Unbinding vbd (type file) 
from /dev/loop1
[2005-11-28 01:14:27 xend] INFO (XendDomain:568) Destroying domain: 
name=debian1
[2005-11-28 01:14:27 xend] DEBUG (XendDomainInfo:634) Closing console, domain 
16
[2005-11-28 01:14:27 xend] ERROR (XendDomainInfo:627) Domain destroy failed: 
debian1
Traceback (most recent call last):
  File 
"/usr/src/xen-2.0/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", 
line 625, in destroy_domain
    return xc.domain_destroy(dom=self.dom)
error: (3, 'No such process')
[2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy 
['debian1', '16']
[2005-11-28 01:14:27 xend] INFO (XendMigrate:345) Migrate OK: ['migrate', 
['id', '16'], ['state', 'ok'], ['live', 1], ['resource', 0], ['src', ['host', 
'testpc-018'], ['domain', '16']], ['dst', ['host', '192.168.112.17'], 
['domain', 24]]]
[2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.died 
['debian1', '16']
[2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.migrate 
['debian1', '16', 'ok', ['migrate', ['id', '16'], ['state', 'ok'], ['live', 
1], ['resource', 0], ['src', ['host', 'testpc-018'], ['domain', '16']], 
['dst', ['host', '192.168.112.17'], ['domain', 24]]]]

xfrd.log:
------------
 1: sent 37192, skipped 28339, delta 27582ms, dom0 44%, target 4%, sent 
44Mb/s, dirtied 59Mb/s 49812 pages
[1133136779.371602] Saving memory pages: iter 2   0%
 2: sent 38822, skipped 10990, %                                                
                                                     
96%
 2: sent 38822, skipped 10990, delta 18590ms, dom0 33%, target 5%, sent 
68Mb/s, dirtied 50Mb/s 28781 pages
[1133136797.962478] Saving memory pages: iter 3   0%
 3: sent 26495, skipped 2285, 7%                                                
                                                     
92%
 3: sent 26495, skipped 2285, delta 9687ms, dom0 36%, target 8%, sent 89Mb/s, 
dirtied 62Mb/s 18477 pages
[1133136807.650291] Saving memory pages: iter 4   0%
 4: sent 16933, skipped 1544, 5%                                                
                                                     
96%
 4: sent 16933, skipped 1544, delta 6007ms, dom0 38%, target 9%, sent 92Mb/s, 
dirtied 63Mb/s 11599 pages
[1133136813.658026] Saving memory pages: iter 5   0%
 5: sent 10261, skipped 1338, 8%                                                
                                                     
99%
 5: sent 10261, skipped 1338, delta 3642ms, dom0 39%, target 9%, sent 92Mb/s, 
dirtied 72Mb/s 8112 pages
[1133136817.300454] Saving memory pages: iter 6   0%
 6: sent 6907, skipped 1205, 11%                                                
                                                     
87%
 6: sent 6907, skipped 1205, delta 2451ms, dom0 36%, target 17%, sent 92Mb/s, 
dirtied 82Mb/s 6143 pages
[1133136819.752019] Saving memory pages: iter 7   0%
 7: sent 5127, skipped 1011, 16%                                                
                                                     
99%
 7: sent 5127, skipped 1011, delta 1849ms, dom0 47%, target 13%, sent 90Mb/s, 
dirtied 173Mb/s 9795 pages
[1133136821.601429] Saving memory pages: iter 8   0%
 8: sent 8308, skipped 1487, 10%                                                
                                                     
98%
 8: sent 8308, skipped 1487, delta 4210ms, dom0 35%, target 7%, sent 64Mb/s, 
dirtied 86Mb/s 11162 pages
[1133136825.811665] Saving memory pages: iter 9   0%
 9: sent 9356, skipped 1805,  8%                                                
                                                     
98%
 9: sent 9356, skipped 1805, delta 4216ms, dom0 38%, target 18%, sent 72Mb/s, 
dirtied 98Mb/s 12663 pages
[1133136830.028020] Saving memory pages: iter 10   0%
 10: sent 11249, skipped 1410, 7%                                               
                                                     
90%
 10: sent 11249, skipped 1410, delta 8518ms, dom0 26%, target 5%, sent 43Mb/s, 
dirtied 74Mb/s 19253 pages
[1133136838.546941] Saving memory pages: iter 11   0%
 11: sent 17449, skipped 1803, 5%                                               
                                                     
99%
 11: sent 17449, skipped 1803, delta 10874ms, dom0 30%, target 4%, sent 
52Mb/s, dirtied 50Mb/s 16807 pages
[1133136849.421847] Saving memory pages: iter 12   0%
 12: sent 14596, skipped 2211, 6%                                               
                                                     
98%
 12: sent 14596, skipped 2211, [DEBUG] Conn_sxpr>
(debian1 16)[DEBUG] Conn_sxpr< err=0
[1133136856.172376] SUSPEND flags 00020004 shinfo 00000be8 eip c01068fe esi 
00035a84

SUSPEND flags 00020004 shinfo 00000be8 eip c01068fe esi 00035a84
delta 6750ms, dom0 39%, target 8%, sent 70Mb/s, dirtied 102Mb/s 21143 pages
[1133136856.172581] Saving memory pages: iter 13   0%
 13: sent 21143, skipped 0,    9%                                               
                                                     
97%
 13: sent 21143, skipped 0, delta 9709ms, dom0 22%, target 0%, sent 71Mb/s, 
dirtied 71Mb/s 21143 pages
[1133136865.882150] Total pages sent= 223838 (3.42x)

Total pages sent= 223838 (3.42x)
[1133136865.882212] (of which 0 were fixups)

(of which 0 were fixups)
[DEBUG] Conn_sxpr>
(xfr.err 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.xfr.ok 24)[DEBUG] Conn_sxpr< err=0
3555 [INF] XFRD> Transfer complete in 115 seconds
3555 [WRN] XFRD> Transfer OK
3555 [INF] XFRD> Xfr service err=0



Michael

-- 
----------------------------------------------------------------------------------------
Michael Mey                                  
Thinking Objects Software GmbH    |   mailto: michael.mey@xxxxxx 
Lilienthalstrasse 2/1                         |   phone: +49 711 88770-147
70825 Stuttgart-Korntal, Germany  |   fax: +49 711 88770-449
----------------------------------------------------------------------------------------

Attachment: pgpEBH0gkO0To.pgp
Description: PGP signature

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.