On Wed, Aug 05, 2009 at 04:39:23PM +0800, Zhigang Wang wrote:
> Pasi Kärkkäinen wrote:
> > On Mon, Aug 11, 2008 at 10:45:23AM -0600, Jim Fehlig wrote:
> >> Ian Jackson wrote:
> >>> Jim Fehlig writes ("[Xen-devel] [PATCH] [RFC] Add lock on domain start"):
> >>>
> >>>> This patch adds a simple lock mechanism when starting domains by placing
> >>>> a lock file in xend-domains-path/<dom_uuid>. The lock file is removed
> >>>> when domain is stopped. The motivation for such a mechanism is to
> >>>> prevent starting the same domain from multiple hosts.
> >>>>
> >>> I think this should be dealt with in your next-layer-up management
> >>> tools.
> >>>
> >> Perhaps. I wanted to see if there was any interest in having such a
> >> feature at the xend layer. If not, I will no longer pursue this option.
> >>
> >
> > Replying a bit late to this.. I think there is demand for this feature!
> >
> > Many people (mostly in smaller environments) don't want to use
> > 'next-layer-up' management tools..
> >
> >>> Lockfiles are bad because they can become stale.
> >>>
> >> Yep. Originally I considered a 'lockless-lock' approach where a bit is
> >> set and a counter is spun on a 'reserved' sector of the vbd, e.g. the
> >> first sector. Attempting to attach the vbd to another domain would fail
> >> if the lock bit is set and the counter is incrementing. If the counter
> >> is not incrementing, assume the lock is stale and proceed. This approach
> >> is certainly more complex. We support various image formats (raw, qcow,
> >> vmdk, ...) and such an approach may mean changing the format (e.g.
> >> qcow3), so it wouldn't work for existing images. Who is responsible for
> >> spinning the counter? Anyhow, it seemed like a lot of complexity
> >> compared to the suggested simple approach with an override for a stale lock.
> >>
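(For illustration: a minimal sketch of the heartbeat scheme Jim describes
above. The field layout, timings, and helper names are invented for this
example, and the read-modify-write races a real implementation would have
to close are ignored here.)

    import os, struct, time

    SECTOR = 512
    FMT = '<BQ'   # 1-byte lock flag, 8-byte heartbeat counter

    def read_header(fd):
        os.lseek(fd, 0, os.SEEK_SET)
        return struct.unpack_from(FMT, os.read(fd, SECTOR))

    def write_header(fd, locked, counter):
        buf = bytearray(SECTOR)
        struct.pack_into(FMT, buf, 0, locked, counter)
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, bytes(buf))

    def heartbeat(fd):
        # This answers "who spins the counter?": the lock holder must
        # call this periodically for as long as the vbd is attached.
        locked, counter = read_header(fd)
        write_header(fd, locked, counter + 1)

    def try_acquire(fd, stale_after=5.0, poll=1.0):
        locked, counter = read_header(fd)
        if locked:
            deadline = time.time() + stale_after
            while time.time() < deadline:
                time.sleep(poll)
                _, now = read_header(fd)
                if now != counter:
                    return False   # holder is alive: attach must fail
            # Counter never moved: assume the lock is stale and take over.
        write_header(fd, 1, counter + 1)
        return True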
> >
> > I assume you guys have this patch included in the openSUSE/SLES Xen rpms.
> >
> > Is the latest version available from somewhere?
> >
> > -- Pasi
> I once saw a patch in the SUSE Xen rpm; maybe Jim can tell you the latest status.
>
http://serverfault.com/questions/21699/how-to-manage-xen-virtual-machines-on-shared-san-storage
In that discussion someone says the xend locking functionality can be found in SLES11 Xen.
> In Oracle VM, we add hooks in xend and use an external locking utility.
>
> Currently, we use a DLM (distributed lock manager) to manage the domain
> running lock, to prevent the same VM from starting on two servers
> simultaneously.
>
> We have added hooks to VM start/shutdown/migration to acquire/release the lock.
>
> Note that during migration, we release the lock before starting the
> migration process, and the lock is then acquired on the destination side.
> There is still a chance for a server other than the destination to
> acquire the lock in between, causing the migration to fail.
>
Hmm.. I guess that also leaves a small time window for disk corruption? If
the domU was started on some other host at the _exact_ wrong time, while
the lock is no longer held by the migration source host..
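(To make that window concrete, here is a simplified model of the ordering
the hooks produce. The method names follow the patch below, but this is
pseudocode, not the actual xend migration path:)

    def domain_migrate_model(dominfo, dst):
        dominfo.release_running_lock()   # source frees the lock
        transfer_memory(dominfo, dst)    # <-- window: a third host could
                                         #     take the lock here and start
                                         #     the domU on the same disk
        dst.acquire_running_lock()       # destination normally wins

One way to close the window would be a handover mode in the locking
utility, letting the destination take the lock before the source releases
it; the utility interface in the patch below has no such option.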
> Hope someone can give some advice.
>
> here is the patch for your reference.
>
Thanks. Looks like a possible method as well.
-- Pasi
> thanks,
>
> zhigang
> diff -Nurp --exclude '*.orig' xen-3.4.0.bak/tools/examples/xend-config.sxp xen-3.4.0/tools/examples/xend-config.sxp
> --- xen-3.4.0.bak/tools/examples/xend-config.sxp  2009-08-05 16:17:42.000000000 +0800
> +++ xen-3.4.0/tools/examples/xend-config.sxp      2009-08-04 10:23:17.000000000 +0800
> @@ -69,6 +69,12 @@
>
>  (xend-unix-path /var/lib/xend/xend-socket)
>
> +# External locking utility for acquiring/releasing the domain running
> +# lock. By default no utility is specified, so no lock is taken while a
> +# VM is running. The locking utility should accept:
> +#     <--lock | --unlock> --name <name> --uuid <uuid>
> +# command line options, and return zero on success, non-zero on error.
> +#(xend-domains-lock-path '')
>
>  # Address and port xend should use for the legacy TCP XMLRPC interface,
>  # if xend-tcp-xmlrpc-server is set.
> diff -Nurp --exclude '*.orig' xen-3.4.0.bak/tools/python/xen/xend/XendDomainInfo.py xen-3.4.0/tools/python/xen/xend/XendDomainInfo.py
> --- xen-3.4.0.bak/tools/python/xen/xend/XendDomainInfo.py  2009-08-05 16:17:42.000000000 +0800
> +++ xen-3.4.0/tools/python/xen/xend/XendDomainInfo.py      2009-08-05 16:35:35.000000000 +0800
> @@ -359,6 +359,8 @@ class XendDomainInfo:
>      @type state_updated: threading.Condition
>      @ivar refresh_shutdown_lock: lock for polling shutdown state
>      @type refresh_shutdown_lock: threading.Condition
> +    @ivar running_lock: lock for running VM
> +    @type running_lock: bool or None
>      @ivar _deviceControllers: device controller cache for this domain
>      @type _deviceControllers: dict 'string' to DevControllers
>      """
> @@ -427,6 +429,8 @@ class XendDomainInfo:
>          self.refresh_shutdown_lock = threading.Condition()
>          self._stateSet(DOM_STATE_HALTED)
>
> +        self.running_lock = None
> +
>          self._deviceControllers = {}
>
>          for state in DOM_STATES_OLD:
> @@ -453,6 +457,7 @@
>
>          if self._stateGet() in (XEN_API_VM_POWER_STATE_HALTED, XEN_API_VM_POWER_STATE_SUSPENDED, XEN_API_VM_POWER_STATE_CRASHED):
>              try:
> +                self.acquire_running_lock()
>                  XendTask.log_progress(0, 30, self._constructDomain)
>                  XendTask.log_progress(31, 60, self._initDomain)
>
> @@ -485,6 +490,7 @@
>              state = self._stateGet()
>              if state in (DOM_STATE_SUSPENDED, DOM_STATE_HALTED):
>                  try:
> +                    self.acquire_running_lock()
>                      self._constructDomain()
>
>                      try:
> @@ -2617,6 +2623,11 @@
>
>              self._stateSet(DOM_STATE_HALTED)
>              self.domid = None  # Do not push into _stateSet()!
> +
> +            try:
> +                self.release_running_lock()
> +            except:
> +                log.exception("Release running lock failed.")
>          finally:
>              self.refresh_shutdown_lock.release()
>
> @@ -4073,6 +4084,28 @@
>                                params.get('burst', '50K'))
>          return 1
>
> +    def acquire_running_lock(self):
> +        if not self.running_lock:
> +            lock_path = xoptions.get_xend_domains_lock_path()
> +            if lock_path:
> +                status = os.system('%s --lock --name %s --uuid %s' % \
> +                    (lock_path, self.info['name_label'], self.info['uuid']))
> +                if status == 0:
> +                    self.running_lock = True
> +                else:
> +                    raise XendError('Acquire running lock failed: %s' % status)
> +
> +    def release_running_lock(self):
> +        if self.running_lock:
> +            lock_path = xoptions.get_xend_domains_lock_path()
> +            if lock_path:
> +                status = os.system('%s --unlock --name %s --uuid %s' % \
> +                    (lock_path, self.info['name_label'], self.info['uuid']))
> +                if status == 0:
> +                    self.running_lock = False
> +                else:
> +                    raise XendError('Release running lock failed: %s' % status)
> +
>      def __str__(self):
>          return '<domain id=%s name=%s memory=%s state=%s>' % \
>              (str(self.domid), self.info['name_label'],
> diff -Nurp --exclude '*.orig' xen-3.4.0.bak/tools/python/xen/xend/XendDomain.py xen-3.4.0/tools/python/xen/xend/XendDomain.py
> --- xen-3.4.0.bak/tools/python/xen/xend/XendDomain.py  2009-08-05 16:17:09.000000000 +0800
> +++ xen-3.4.0/tools/python/xen/xend/XendDomain.py      2009-08-04 10:23:17.000000000 +0800
> @@ -1317,6 +1317,7 @@ class XendDomain:
>                                POWER_STATE_NAMES[dominfo._stateGet()])
>
>          """ The following call may raise a XendError exception """
> +        dominfo.release_running_lock()
>          dominfo.testMigrateDevices(True, dst)
>
>          if live:
> diff -Nurp --exclude '*.orig' xen-3.4.0.bak/tools/python/xen/xend/XendOptions.py xen-3.4.0/tools/python/xen/xend/XendOptions.py
> --- xen-3.4.0.bak/tools/python/xen/xend/XendOptions.py  2009-08-05 16:17:42.000000000 +0800
> +++ xen-3.4.0/tools/python/xen/xend/XendOptions.py      2009-08-04 10:23:17.000000000 +0800
> @@ -281,6 +281,11 @@ class XendOptions:
>          """
>          return self.get_config_string("xend-domains-path", self.xend_domains_path_default)
>
> +    def get_xend_domains_lock_path(self):
> +        """ Get the path of the lock utility for running domains.
> +        """
> +        return self.get_config_string("xend-domains-lock-path")
> +
>      def get_xend_state_path(self):
>          """ Get the path for persistent domain configuration storage
>          """
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel