
[Xen-devel] A fix for the xend restart problems (2.0.x)


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Jed Davis <jdev@xxxxxxxxx>
  • Date: Fri, 19 Aug 2005 22:06:46 -0400
  • Cancel-lock: sha1:ejQUA5S7K5mA+SCJobd5lyyuMkQ=
  • Delivery-date: Sat, 20 Aug 2005 02:19:32 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

The basic problem, which judging from the list archives I'm not the
only one running into: the first time xend is restarted while any
guests are running, it immediately dies on an exception along the
lines of "Invalid backend domain" after destroying one of the domUs.
Further attempts to restart it fail with "Failed to map domain control
interface" -- unless the dom0 kernel is NetBSD built with DIAGNOSTIC,
in which case the kernel panics.

After far too much time assuming this was a NetBSD-specific problem, I
eventually tracked it down in xend, and have this patch, which
probably isn't the Right solution, but nonetheless works:

--- tools/python/xen/xend/XendDomain.py.orig    2005-08-13 01:54:56.000000000 -0400
+++ tools/python/xen/xend/XendDomain.py 2005-08-13 01:55:17.000000000 -0400
@@ -147,7 +147,10 @@
             domid = str(d['dom'])
             doms[domid] = d
         dlist = []
-        for config in self.domain_db.values():
+        domkeys = map(int, self.domain_db.keys())
+        domkeys.sort()
+        for domkey in domkeys:
+            config = self.domain_db.get(str(domkey))
             domid = str(sxp.child_value(config, 'id'))
             if domid in doms:
                 d_dom = self._new_domain(config, doms[domid])

This change in traversal order avoids the exception shown below.  That
exception is raised while a domU's info is being reconstructed: the
backend domain of its devices (here, dom0) is looked up, but doesn't
appear to exist, because it hasn't yet been restored from the state
files (or by querying the hypervisor, for that matter).  I assume xend
then tries to destroy the domU because this code path is shared with a
domain's actual creation.  The idea of the above patch, then, is to
restore the domains' state in the same order as they were created.
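
To make the ordering argument concrete, here is a small toy model
rather than xend code.  Every name in it (restore_all, InvalidDomain,
the layout of the saved dict) is made up for illustration, and it
assumes, as the patch does, that domain ids increase in creation order
so that dom0 sorts first.  It is written for a current Python, where
sorted() with key=int stands in for the patch's map/sort pair.

```python
# Toy model of the restore loop; none of these names exist in xend.
class InvalidDomain(Exception):
    pass


def restore_all(domain_db, keys):
    """Restore saved domains in the given key order.

    domain_db maps a string domain id to the id of the domain hosting
    its device backends (None for dom0, which needs no backend).
    """
    restored = []
    for key in keys:
        backend = domain_db[key]
        # A backend can only be looked up once it has been restored.
        if backend is not None and backend not in restored:
            raise InvalidDomain("invalid domain:" + backend)
        restored.append(key)
    return restored


# dom0 hosts the backends for domains 2 and 10.
saved = {"10": "0", "2": "0", "0": None}

# Numeric order, as in the patch: dom0 is restored before any domU.
print(restore_all(saved, sorted(saved, key=int)))

# Sorting the string keys directly would place "10" before "2",
# which is why the patch converts the keys to int first.
print(sorted(saved))

# An order that reaches a domU before dom0 fails on the lookup.
try:
    restore_all(saved, ["10", "0", "2"])
except InvalidDomain as exc:
    print(exc)
```

In this model any order with dom0 first would do; sorting numerically
is just the simplest way to get it without special-casing dom0.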

This is the trace of the exception in question -- normally it gets
caught partway up and the "invalid backend domain" exception is thrown
from there, but I commented out the try/except so I could see that
first exception:

Traceback (most recent call last):
  File "/usr/local/sbin/xend", line 121, in ?
    sys.exit(main())
  File "/usr/local/sbin/xend", line 107, in main
    return daemon.start()
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDaemon.py", line 525, in start
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDaemon.py", line 615, in run
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvServer.py", line 47, in create
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvRoot.py", line 29, in __init__
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDir.py", line 69, in get
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDir.py", line 39, in getobj
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDomainDir.py", line 25, in __init__
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 800, in instance
    inst = XendDomain()
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 65, in __init__
    self.initial_refresh()
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 154, in initial_refresh
    d_dom = self._new_domain(config, doms[domid])
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 189, in _new_domain
    deferred = XendDomainInfo.vm_recreate(savedinfo, info)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 218, in vm_recreate
    d = vm.construct(config)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 456, in construct
    deferred = self.configure()
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 975, in configure
    d = self.create_devices()
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 803, in create_devices
    v = dev_handler(self, dev, dev_index)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1110, in vm_dev_vif
    defer = ctrl.attachDevice(vif, val, recreate=recreate)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/server/netif.py", line 423, in attachDevice
    dev = self.addDevice(vif, config)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/server/netif.py", line 400, in addDevice
    dev = NetDev(vif, self, config)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/server/netif.py", line 105, in __init__
    self.configure(config)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/server/netif.py", line 150, in configure
    self.backendDomain = int(xd.domain_lookup(sxp.child_value(config, 'backend', '0')).id)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 430, in domain_lookup
    raise XendError('invalid domain:' + name)
xen.xend.XendError.XendError: invalid domain:0
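
The trace above was obtained by commenting out a try/except.  A less
invasive way to see the first exception, sketched here under
assumptions, is to record its traceback before raising the summary
error.  The XendError class, domain_lookup, and configure_backend below
are stand-ins written for this example, not the real xend definitions;
only traceback.format_exc() is a real standard-library call.

```python
import traceback


class XendError(Exception):
    """Stand-in for xend's exception class (hypothetical here)."""


def domain_lookup(registry, name):
    # Mimics the failing lookup: unknown domains raise an error.
    if name not in registry:
        raise XendError("invalid domain:" + name)
    return registry[name]


def configure_backend(registry, name, log):
    try:
        return domain_lookup(registry, name)
    except XendError:
        # Keep the original traceback before replacing the exception.
        log.append(traceback.format_exc())
        raise XendError("Invalid backend domain")


log = []
try:
    configure_backend({}, "0", log)
except XendError as exc:
    print(exc)      # the summary error the caller sees
    print(log[0])   # the underlying lookup failure, with its trace
```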



-- 
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l))))))  (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k)))))))    '((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)))))


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

