
Re: [Xen-devel] [PATCH] [RFC] Add lock on domain start



On Wed, Aug 05, 2009 at 04:39:23PM +0800, Zhigang Wang wrote:
> Pasi Kärkkäinen wrote:
> > On Mon, Aug 11, 2008 at 10:45:23AM -0600, Jim Fehlig wrote:
> >> Ian Jackson wrote:
> >>> Jim Fehlig writes ("[Xen-devel] [PATCH] [RFC] Add lock on domain start"):
> >>>   
> >>>> This patch adds a simple lock mechanism when starting domains by placing
> >>>> a lock file in xend-domains-path/<dom_uuid>.  The lock file is removed
> >>>> when the domain is stopped.  The motivation for such a mechanism is to
> >>>> prevent starting the same domain from multiple hosts.
> >>>>     
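
For reference, here is a minimal sketch of the lock-file scheme described
above. The helper names are hypothetical, and it assumes the default
xend-domains-path of /var/lib/xend/domains with a 'lock' file inside the
per-domain directory; it is not Jim's actual patch:

import errno
import os

XEND_DOMAINS_PATH = '/var/lib/xend/domains'   # xend-domains-path default

def acquire_domain_lock(dom_uuid):
    # O_EXCL makes creation fail if the lock file already exists, i.e.
    # if some host (possibly this one) already started the domain.
    lock_file = os.path.join(XEND_DOMAINS_PATH, dom_uuid, 'lock')
    try:
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
    except OSError, e:
        if e.errno == errno.EEXIST:
            raise RuntimeError('domain %s seems to be running already'
                               % dom_uuid)
        raise

def release_domain_lock(dom_uuid):
    # Called on domain stop; a crashed host never gets here, which is
    # exactly how such a lock goes stale.
    os.remove(os.path.join(XEND_DOMAINS_PATH, dom_uuid, 'lock'))
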
> >>> I think this should be dealt with in your next-layer-up management
> >>> tools.
> >>>   
> >> Perhaps.  I wanted to see if there was any interest in having such a
> >> feature at the xend layer.  If not, I will no longer pursue this option.
> >>
> > 
> > Replying a bit late to this.. I think there is demand for this feature! 
> > 
> > Many people (mostly in smaller environments) don't want to use
> > 'next-layer-up' management tools..
> > 
> >>> Lockfiles are bad because they can become stale.
> >>>   
> >> Yep.  Originally I considered a 'lockless-lock' approach where a bit is
> >> set and a counter is spun on a 'reserved' sector of the vbd, e.g. the
> >> first sector.  Attempting to attach the vbd to another domain would fail
> >> if the lock bit is set and the counter is incrementing.  If the counter
> >> is not incrementing, assume the lock is stale and proceed.  This approach
> >> is certainly more complex.  We support various image formats (raw, qcow,
> >> vmdk, ...), and such an approach may mean changing the format (e.g.
> >> qcow3).  It wouldn't work for existing images.  And who is responsible
> >> for spinning the counter?  Anyhow, it seemed like a lot of complexity
> >> compared to the suggested simple approach with an override for stale locks.
> >>
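
A rough sketch of the stale-lock check in that 'lockless-lock' scheme. The
sector layout (a 1-byte lock flag plus a 64-bit counter in the first
sector) and the wait time are assumptions, and it leaves open exactly the
question above of who spins the counter:

import struct
import time

HDR_FMT = '<BQ'   # assumed layout: 1-byte lock flag + 64-bit counter

def read_reserved_sector(image_path):
    f = open(image_path, 'rb')
    try:
        flag, counter = struct.unpack_from(HDR_FMT, f.read(512))
    finally:
        f.close()
    return flag, counter

def may_attach(image_path, wait=2.0):
    flag, before = read_reserved_sector(image_path)
    if not flag:
        return True             # lock bit clear: nobody holds the vbd
    time.sleep(wait)            # give the holder time to bump the counter
    _, after = read_reserved_sector(image_path)
    return before == after      # counter stalled: assume the lock is stale
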
> > 
> > I assume you guys have this patch included in the openSUSE/SLES Xen rpms.
> > 
> > Is the latest version available from somewhere? 
> > 
> > -- Pasi
> I once saw a patch in the SUSE xen rpm; maybe Jim can tell you the latest status.
> 

http://serverfault.com/questions/21699/how-to-manage-xen-virtual-machines-on-shared-san-storage

In that discussion someone says the xend-lock stuff can be found in SLES11 Xen.

> In Oracle VM, we add hooks in xend and use an external locking utility.
> 
> Currently, we use DLM (distributed lock manager) to manage the domain
> running lock, to prevent the same VM from starting on two servers
> simultaneously.
> 
> We have added hooks to VM start/shutdown/migration to acquire/release the lock.
> 
> Note that during migration, we release the lock before starting the
> migration process, and the lock is then acquired on the destination side.
> There is still a chance for a server other than the destination to acquire
> the lock in between, which causes the migration to fail.
> 

Hmm.. I guess that also leaves a small time window for disk corruption? If
the domU was started on some other host at the _exact_ right (or rather,
wrong) time, when the lock is no longer held by the migration source host..
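
To make the ordering concrete, here is a sketch assuming a hypothetical
'dlm-lock' wrapper with the --lock/--unlock interface from the patch
below; the unsafe window is everything between the two calls:

import os

def source_release_before_migration(name, uuid):
    # The source host drops the running lock before the transfer begins.
    if os.system('dlm-lock --unlock --name %s --uuid %s' % (name, uuid)) != 0:
        raise RuntimeError('unlock failed')
    # ... the memory/device transfer to the destination runs now ...

def destination_acquire_after_migration(name, uuid):
    # The destination re-acquires only after the transfer; any other host
    # could have taken the lock (and started the domU) in between.
    if os.system('dlm-lock --lock --name %s --uuid %s' % (name, uuid)) != 0:
        raise RuntimeError('another host grabbed the running lock')
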

> Hope someone can give some advice.
> 
> here is the patch for your reference.
> 

Thanks. Looks like a possible method as well.

-- Pasi

> thanks,
> 
> zhigang

> diff -Nurp --exclude '*.orig' xen-3.4.0.bak/tools/examples/xend-config.sxp xen-3.4.0/tools/examples/xend-config.sxp
> --- xen-3.4.0.bak/tools/examples/xend-config.sxp      2009-08-05 16:17:42.000000000 +0800
> +++ xen-3.4.0/tools/examples/xend-config.sxp  2009-08-04 10:23:17.000000000 +0800
> @@ -69,6 +69,12 @@
>  
>  (xend-unix-path /var/lib/xend/xend-socket)
>  
> +# External locking utility to acquire/release the domain running lock.
> +# By default no utility is specified, so no lock is taken while a VM
> +# is running.  The locking utility should accept:
> +# <--lock | --unlock> --name <name> --uuid <uuid>
> +# command line options, and return zero on success, non-zero on error.
> +#(xend-domains-lock-path '')
>  
>  # Address and port xend should use for the legacy TCP XMLRPC interface, 
>  # if xend-tcp-xmlrpc-server is set.
> diff -Nurp --exclude '*.orig' xen-3.4.0.bak/tools/python/xen/xend/XendDomainInfo.py xen-3.4.0/tools/python/xen/xend/XendDomainInfo.py
> --- xen-3.4.0.bak/tools/python/xen/xend/XendDomainInfo.py     2009-08-05 16:17:42.000000000 +0800
> +++ xen-3.4.0/tools/python/xen/xend/XendDomainInfo.py 2009-08-05 16:35:35.000000000 +0800
> @@ -359,6 +359,8 @@ class XendDomainInfo:
>      @type state_updated: threading.Condition
>      @ivar refresh_shutdown_lock: lock for polling shutdown state
>      @type refresh_shutdown_lock: threading.Condition
> +    @ivar running_lock: True if the domain running lock is held
> +    @type running_lock: bool or None
>      @ivar _deviceControllers: device controller cache for this domain
>      @type _deviceControllers: dict 'string' to DevControllers
>      """
> @@ -427,6 +429,8 @@ class XendDomainInfo:
>          self.refresh_shutdown_lock = threading.Condition()
>          self._stateSet(DOM_STATE_HALTED)
>  
> +        self.running_lock = None
> +
>          self._deviceControllers = {}
>  
>          for state in DOM_STATES_OLD:
> @@ -453,6 +457,7 @@ class XendDomainInfo:
>  
>          if self._stateGet() in (XEN_API_VM_POWER_STATE_HALTED, XEN_API_VM_POWER_STATE_SUSPENDED, XEN_API_VM_POWER_STATE_CRASHED):
>              try:
> +                self.acquire_running_lock()
>                  XendTask.log_progress(0, 30, self._constructDomain)
>                  XendTask.log_progress(31, 60, self._initDomain)
>                  
> @@ -485,6 +490,7 @@ class XendDomainInfo:
>          state = self._stateGet()
>          if state in (DOM_STATE_SUSPENDED, DOM_STATE_HALTED):
>              try:
> +                self.acquire_running_lock()
>                  self._constructDomain()
>  
>                  try:
> @@ -2617,6 +2623,11 @@ class XendDomainInfo:
>  
>              self._stateSet(DOM_STATE_HALTED)
>              self.domid = None  # Do not push into _stateSet()!
> +
> +            try:
> +                self.release_running_lock()
> +            except:
> +                log.exception("Release running lock failed.")
>          finally:
>              self.refresh_shutdown_lock.release()
>  
> @@ -4073,6 +4084,28 @@ class XendDomainInfo:
>                                     params.get('burst', '50K'))
>          return 1
>  
> +    def acquire_running_lock(self):
> +        if not self.running_lock:
> +            lock_path = xoptions.get_xend_domains_lock_path()
> +            if lock_path:
> +                status = os.system('%s --lock --name %s --uuid %s' %
> +                                   (lock_path, self.info['name_label'], self.info['uuid']))
> +                if status == 0:
> +                    self.running_lock = True
> +                else:
> +                    raise XendError('Acquire running lock failed: %s' % status)
> +
> +    def release_running_lock(self):
> +        if self.running_lock:
> +            lock_path = xoptions.get_xend_domains_lock_path()
> +            if lock_path:
> +                status = os.system('%s --unlock --name %s --uuid %s' %
> +                                   (lock_path, self.info['name_label'], self.info['uuid']))
> +                if status == 0:
> +                    self.running_lock = False
> +                else:
> +                    raise XendError('Release running lock failed: %s' % status)
> +
>      def __str__(self):
>          return '<domain id=%s name=%s memory=%s state=%s>' % \
>                 (str(self.domid), self.info['name_label'],
> diff -Nurp --exclude '*.orig' xen-3.4.0.bak/tools/python/xen/xend/XendDomain.py xen-3.4.0/tools/python/xen/xend/XendDomain.py
> --- xen-3.4.0.bak/tools/python/xen/xend/XendDomain.py 2009-08-05 16:17:09.000000000 +0800
> +++ xen-3.4.0/tools/python/xen/xend/XendDomain.py     2009-08-04 10:23:17.000000000 +0800
> @@ -1317,6 +1317,7 @@ class XendDomain:
>                               POWER_STATE_NAMES[dominfo._stateGet()])
>  
>          """ The following call may raise a XendError exception """
> +        dominfo.release_running_lock()
>          dominfo.testMigrateDevices(True, dst)
>  
>          if live:
> diff -Nurp --exclude '*.orig' xen-3.4.0.bak/tools/python/xen/xend/XendOptions.py xen-3.4.0/tools/python/xen/xend/XendOptions.py
> --- xen-3.4.0.bak/tools/python/xen/xend/XendOptions.py        2009-08-05 16:17:42.000000000 +0800
> +++ xen-3.4.0/tools/python/xen/xend/XendOptions.py    2009-08-04 10:23:17.000000000 +0800
> @@ -281,6 +281,11 @@ class XendOptions:
>          """
>          return self.get_config_string("xend-domains-path", self.xend_domains_path_default)
>  
> +    def get_xend_domains_lock_path(self):
> +        """ Get the path of the lock utility for running domains.
> +        """
> +        return self.get_config_string("xend-domains-lock-path")
> +
>      def get_xend_state_path(self):
>          """ Get the path for persistent domain configuration storage
>          """

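For what it's worth, the utility interface described in the xend-config.sxp
comment above is easy to prototype. Here is a trivial stand-in; LOCK_DIR is
an assumption and would have to live on a cluster filesystem for cross-host
exclusion (Oracle's real utility uses DLM instead):

#!/usr/bin/env python
import os
import sys
from optparse import OptionParser

LOCK_DIR = '/var/lib/xen/running-locks'

def main():
    parser = OptionParser()
    parser.add_option('--lock', action='store_true', default=False)
    parser.add_option('--unlock', action='store_true', default=False)
    parser.add_option('--name')
    parser.add_option('--uuid')
    opts, _ = parser.parse_args()
    if not opts.uuid or opts.lock == opts.unlock:
        return 1                # need a uuid and exactly one action
    path = os.path.join(LOCK_DIR, opts.uuid)
    try:
        if opts.lock:
            # O_EXCL: creation fails if the lock is already held.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, '%s\n' % (opts.name or ''))
            os.close(fd)
        else:
            os.unlink(path)
    except OSError:
        return 1
    return 0

if __name__ == '__main__':
    sys.exit(main())
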

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel