[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL [and 1 more messages]

Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - 
> On Thu, 2012-07-05 at 12:21 +0100, Ian Jackson wrote:
> > Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: 
> > regressions - FAIL"):
> > > Does stating $_lockfile before and after the flock, in addition to
> > > comparing both with $_lockfd, close the race you are worried about?
> > 
> > The correct answer to this question is to present a proof that what
> > I'm doing is correct.
> I was actually just speculating on ways to avoid $(PERL) rather than
> questioning the algorithm but as Roger and you have pointed out there is
> no such concern in this particular file.

Ah, yes.  However, if you look at my proof you'll see that it's really
best to stat the fd.  It might be possible to construct a more
complicated proof that statting the file at some points would be good
enough but I don't care to do so :-).

> The analysis is still useful though.

I will include it in the file.

> > This demonstrates, I think, that the design is correct.  I will do
> > some testing on the implementation to try to check that it implements
> > the design.
> I think so too.

It works in my tests; I also tested the "lost race" condition and it
did the right thing.

Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - 
> Can we get this checked in today (ideally early this afternoon) do you
> think? Otherwise I think we should revert the original locking change,
> to unblock testing, and re-do it along with this fix when it is ready.

I have pushed the fix.  The code is nearly identical.  I fixed the new
"let" which ought to have been a plain assignment.

I also attach for your interest the test script I used.  For the "lost
race" test I also added a "read" to locking.sh before "flock".


# HG changeset patch
# User Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
# Date 1341488425 -3600
# Node ID 497e2fe4945594cd2846f0302bef5e5dafa1c2c6
# Parent  ad08cd8e7097ec6da6526cf8ac26f0fa72cd1e01
hotplug/Linux: do not leak lockfiles

25590:bb250383a4f5 introduced a new locking scheme.  Unfortunately it
leaves behind files in /var/run/xen-hotplug.  These are spotted as
regressions by the autotester.

Fix this.  This involves changing the locking protocol to allow
lockfiles to be deleted (as removing lockfiles is unsafe with a naive
flock-based algorithm).

Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Committed-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

diff -r ad08cd8e7097 -r 497e2fe49455 tools/hotplug/Linux/locking.sh
--- a/tools/hotplug/Linux/locking.sh    Thu Jul 05 11:00:28 2012 +0100
+++ b/tools/hotplug/Linux/locking.sh    Thu Jul 05 12:40:25 2012 +0100
@@ -30,6 +30,7 @@ _setlockfd()
     let _lockfd=200+i
+    _lockfile="$LOCK_BASEDIR/$1"
@@ -37,13 +38,92 @@ claim_lock()
     mkdir -p "$LOCK_BASEDIR"
     _setlockfd $1
-    eval "exec $_lockfd>>$LOCK_BASEDIR/$1"
-    flock -x $_lockfd
+    # The locking strategy is identical to that from with-lock-ex(1)
+    # from chiark-utils, except using flock.  It has the benefit of
+    # it being possible to safely remove the lockfile when done.
+    # See below for a correctness proof.
+    local rightfile
+    while true; do
+        eval "exec $_lockfd>>$_lockfile"
+        flock -x $_lockfd || return $?
+        # We can't just stat /dev/stdin or /proc/self/fd/$_lockfd or
+        # use bash's test -ef because those all go through what is
+        # actually a synthetic symlink in /proc and we aren't
+        # guaranteed that our stat(2) won't lose the race with an
+        # rm(1) between reading the synthetic link and traversing the
+        # file system to find the inum.  Perl is very fast so use that.
+        rightfile=$( perl -e '
+            open STDIN, "<&'$_lockfd'" or die $!;
+            my $fd_inum = (stat STDIN)[1]; die $! unless defined $fd_inum;
+            my $file_inum = (stat $ARGV[0])[1];
+            print "y\n" if $fd_inum eq $file_inum;
+                             ' "$_lockfile" )
+        if [ x$rightfile = xy ]; then break; fi
+    done
     _setlockfd $1
-    flock -u $_lockfd
+    rm "$_lockfile"
+# Protocol and correctness proof:
+# * The lock is owned not by a process but by an open-file (informally
+#   an fd).  Any process with an fd onto this open-file is a
+#   lockholder and may perform the various operations; such a process
+#   should only do so when its co-lockholder processes expect.  Ie, we
+#   will treat all processes holding fds onto the open-file as acting
+#   in concert and not distinguish between them.
+# * You are a lockholder if
+#     - You have an fd onto an open-file which
+#       currently holds an exclusive flock lock on its inum
+#     - and that inum is currently linked at the lockfile path
+# * The rules are:
+#     - No-one but a lockholder may unlink the lockfile path
+#       (or otherwise cause it to stop referring to a file it
+#       refers to).
+#     - Anyone may open the lockfile with O_CREAT
+# * The protocol for locking is:
+#     - Open the file (O_CREAT)
+#     - flock it
+#     - fstat the fd you have open
+#     - stat the lockfile path
+#     - if both are equal you have the lock, otherwise try again.
+# * Informal proof of exclusivity:
+#     - No two open-files can hold an fcntl lock onto the same file
+#       at the same time
+#     - No two files can have the same name at the same time
+# * Informal proof of correctness of locking protocol:
+#     - After you call flock successfully no-one other than you
+#       (someone with the same open-file) can stop you having
+#       that flock lock.
+#     - Obviously the inum you get from the fstat is fixed
+#     - At the point where you call stat there are two
+#       possibilities:
+#         (i) the lockfile path referred to some other inum
+#             in which case you have failed
+#         (ii) the lockfile path referred to the same file
+#             in which case at that point you were the
+#             lockholder (by definition).
+# * Informal proof that no-one else can steal the lock:
+#     - After you call flock successfully no-one other than you
+#       can stop you having that flock lock
+#     - No-one other than the lockholder is permitted to stop
+#       the path referring to a particular inum.  So if you
+#       hold the lock then only you are allowed to stop the
+#       path referring to the file whose flock you hold; so
+#       it will continue to refer to that file.
+#   That's both the conditions for being the lockholder.
+#   Thus once you hold the lock at any instant, you will
+#   continue to do so until you voluntarily stop doing so
+#   (eg by unlinking the lockfile or closing the fd).


fatal() {
  echo >&2 "$*"
  exit 1
sigerr() {
  fatal "$0 failed; error detected."

trap sigerr ERR

. locking.sh

echo locking...
claim_lock foo
echo locked

read x

echo unlocking
release_lock foo
Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.