[Xen-users] "xm save" only works once...
Hi,
I am using Xen 2.0.7 on a dual Intel Xeon 2.8GHz system with 4GB of RAM. I am
using 2.6.11 as the kernel for domain 0. Domain 0 runs Debian Sarge with a
backported Xen 2.0.7 package (only minor changes to the Debian 2.0.6 package,
nothing important enough to mention). All kernels were built from vanilla
kernel sources with the Xen patch applied. The domUs run 2.6.11 or 2.4.30
(Debian, SUSE).
Everything runs very smoothly within the domains except one thing (which also
did not work correctly for me in Xen 2.0.6):
I can save a domain with "xm save <domainname> <suspendfile>" once and restore
it again, but a second "xm save ..." simply seems to hang. Nothing happens, and
the last entries in the logs are these lines:
==> /var/log/xend.log <==
[2005-08-15 20:12:27 xend] INFO (XendMigrate:380) Save BEGIN: ['save', ['id', 
'1'], ['state', 'begin'], ['domain', '5'], ['file', '/suspend/vm-ralph']]
[2005-08-15 20:12:27 xend] INFO (XendRoot:113) EVENT> xend.domain.save 
['vm-ralph', '5', 'begin', ['save', ['id', '1'], ['state', 'begin'], 
['domain', '5'], ['file', '/suspend/vm-ralph']]]
==> /var/log/xfrd.log <==
3808 [INF] XFRD> Accepted connection from 127.0.0.1:3905 on 2
4165 [INF] XFRD> Xfr service for 127.0.0.1:3905
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.hello 1 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.save 5 "(domain (id 5) (name vm-ralph) (memory 127) (maxmem 128) (state 
-b---) (cpu 3) (cpu_time 1.583158713) (up_time 1401.25794005) (start_time 
1124128146.12) (console (status listening) (id 12) (domain 5) (local_port 12) 
(remote_port 1) (console_port 9605)) (devices (vif (idx 0) (vif 0) (mac 
aa:00:00:00:00:22) (vifname vif5.0) (ip 212.79.XXX.XXX/32) (evtchn 17 4) 
(index 0)) (vbd (idx 0) (vdev 2049) (device 65030) (mode w) (dev sda1) (uname 
phy:xen-volumes/vm-ralph) (node xen-volumes/vm-ralph) (index 0)) (vbd (idx 1) 
(vdev 2050) (device 65031) (mode w) (dev sda2) (uname 
phy:xen-volumes/swap-ralph) (node xen-volumes/swap-ralph) (index 1))) (config 
(vm (name vm-ralph) (memory 128) (cpu 3) (image (linux 
(kernel /boot/xen-linux-2.6.11-domu-tops1) 
(ramdisk /boot/xen-linux-2.6.11-domu-tops1-modules) (root '/dev/sda1 ro'))) 
(device (vbd (uname phy:xen-volumes/vm-ralph) (dev sda1) (mode w))) (device 
(vbd (uname phy:xen-volumes/swap-ralph) (dev sda2) (mode w))) (device (vif 
(mac aa:00:00:00:00:22) (ip 212.79.XXX.XXX/32))))))" /suspend/vm-ralph)
[DEBUG] Conn_sxpr< err=0
[1124129547.387983] xc_linux_save start 5
xc_linux_save start 5
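The xfr.save request above carries the whole domain description as an
S-expression. When comparing the working first save with the hanging second
one, it can help to pull single fields (state, devices, event channels) out of
it. Below is a minimal Python sketch, assuming the simple quoting seen in the
log (double- or single-quoted strings without escapes); parse_sxpr and
sxpr_field are hypothetical helpers, not part of xend or xfrd.

```python
import re
from typing import List, Optional, Union

# An S-expression is either an atom (str) or a list of S-expressions.
SExpr = Union[str, List["SExpr"]]

# Sentinels mark parentheses so they cannot be confused with atoms.
_LPAREN, _RPAREN = object(), object()

# Assumption: quoted strings contain no escaped quote characters.
_TOKEN = re.compile(r"""\s*(?:(\()|(\))|"([^"]*)"|'([^']*)'|([^\s()"']+))""")


def _tokenize(text: str) -> list:
    """Split text into parenthesis sentinels and string atoms."""
    tokens = []
    for m in _TOKEN.finditer(text):
        lparen, rparen, dquoted, squoted, atom = m.groups()
        if lparen:
            tokens.append(_LPAREN)
        elif rparen:
            tokens.append(_RPAREN)
        elif dquoted is not None:
            tokens.append(dquoted)  # quotes stripped, content kept verbatim
        elif squoted is not None:
            tokens.append(squoted)
        else:
            tokens.append(atom)
    return tokens


def parse_sxpr(text: str) -> SExpr:
    """Parse one S-expression such as '(domain (id 5) (state -b---))'."""
    tokens = _tokenize(text)
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok is _RPAREN:
            raise ValueError("unexpected ')'")
        if tok is not _LPAREN:
            return tok
        items: List[SExpr] = []
        while pos < len(tokens) and tokens[pos] is not _RPAREN:
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing ')'
        return items

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return expr


def sxpr_field(expr: SExpr, name: str) -> Optional[SExpr]:
    """Return the value of the first (name value...) child of expr, or None."""
    if not isinstance(expr, list):
        return None
    for child in expr[1:]:
        if isinstance(child, list) and len(child) >= 2 and child[0] == name:
            return child[1] if len(child) == 2 else child[1:]
    return None
```

For example, `sxpr_field(parse_sxpr("(domain (id 5) (state -b---))"), "state")`
yields `-b---`. The quoted domain description inside an xfr.save request comes
back as a plain string and can be passed to parse_sxpr again.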
                     
Stracing the "xm save" process shows little activity; it is blocked in recv():
xen:/var/log# ps fax |grep xm
 4164 pts/0    S+     0:00  |               \_ python /usr/sbin/xm save 
vm-ralph /suspend/vm-ralph
xen:/var/log# strace -p 4164
Process 4164 attached - interrupt to quit
recv(3, 
An xfrd child process is spawned as well, but it shows more or less the same
picture as the xm save process, blocked in read():
xen:/var/log# ps fax |grep xfrd
 3808 ?        S      0:00 xfrd
 4165 ?        SL     0:00  \_ xfrd
xen:/var/log# strace -p 4165
Process 4165 attached - interrupt to quit
read(3,                                      
After waiting over 3 minutes I pressed Ctrl-C, and "xm save" aborted with the
following error:
Traceback (most recent call last):
  File "/usr/sbin/xm", line 9, in ?
    main.main(sys.argv)
  File "/usr/lib/python2.3/site-packages/xen/xm/main.py", line 808, in main
    xm.main(args)
  File "/usr/lib/python2.3/site-packages/xen/xm/main.py", line 106, in main
    self.main_call(args)
  File "/usr/lib/python2.3/site-packages/xen/xm/main.py", line 124, in main_call
    p.main(args[1:])
  File "/usr/lib/python2.3/site-packages/xen/xm/main.py", line 276, in main
    server.xend_domain_save(dom, savefile)
  File "/usr/lib/python2.3/site-packages/xen/xend/XendClient.py", line 244, in xend_domain_save
    {'op'      : 'save',
  File "/usr/lib/python2.3/site-packages/xen/xend/XendClient.py", line 148, in xendPost
    return self.client.xendPost(url, data)
  File "/usr/lib/python2.3/site-packages/xen/xend/XendProtocol.py", line 79, in xendPost
    return self.xendRequest(url, "POST", args)
  File "/usr/lib/python2.3/site-packages/xen/xend/XendProtocol.py", line 143, in xendRequest
    resp = conn.getresponse()
  File "/usr/lib/python2.3/httplib.py", line 781, in getresponse
    response.begin()
  File "/usr/lib/python2.3/httplib.py", line 273, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.3/httplib.py", line 231, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.3/socket.py", line 323, in readline
    data = recv(1)
KeyboardInterrupt
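When reproducing the hang repeatedly, it may be handier to bound the wait than
to sit through several minutes and press Ctrl-C. Below is a minimal Python
sketch using only the standard library; run_with_timeout is a hypothetical
helper. Note that it only bounds the wait on the client side: as the strace
output above suggests, the xfrd child is blocked independently, so killing the
xm client presumably does not cancel the stuck save.

```python
import subprocess
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run cmd and return its exit status, or None if it was killed after timeout_s seconds.

    subprocess.run() kills the child and waits for it before re-raising
    TimeoutExpired, so no zombie process is left behind.
    """
    try:
        return subprocess.run(list(cmd), timeout=timeout_s, check=False).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_timeout(["xm", "save", "vm-ralph", "/suspend/vm-ralph"], 180)`
returns None if the save has not finished after 180 seconds (an arbitrary limit
matching the wait described above).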
Shutting down and recreating the domain before the second save attempt makes
no difference: the hang occurs every time after the first successful save and
restore, and sometimes even on the very first "xm save" attempt.
It even seems that Xen leaves the "half-saved" domain in a broken state,
because I cannot shut the domain down correctly after the second "xm save"
attempt. I can ssh into it and type "halt", and it shuts down, but Xen (xm
list) still thinks the domain is running. Even "xm destroy <domainname>"
doesn't help. I have to reboot the physical machine to get the domain working
correctly again.
Because this is supposed to become a production system very soon, I would
very much appreciate any help. More information (such as xm dmesg output) is
available on request... ;-PP
--Ralph
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users