Re: [Xen-devel] locking mechanism in hotplug scripts not working

On Wed, 2011-07-06 at 19:57 +0100, Marek Marczykowski wrote:
> Hello,
> I've found that locking isn't working as it should... It allows many
> processes to claim lock simultaneously:
> The race is:
> P1: claim_lock - success
> P2: claim_lock - read lock owner ("owner" file in lock dir) - l51
> P1: done things and release_lock
> P1: exit
> P3: claim_lock - success
> P2: notice that P1 is dead (read previously) -> release_lock l68 (!!!)
> P2: claim_lock l56 - success
> Both P2 and P3 in critical section

Urk, yes, I think you are correct about this.

> I have no idea how to fix it in its current shape.

Me neither.

> Some workaround is to remove lines 64-69... 

I don't think that would be all that bad. Those lines seem to be trying
to handle the case where a script exits without unlocking, but we already
have a trap-on-exit handler for that.

If the trap handler isn't working then either (a) the hotplug script has
added another trap handler during the critical section, overwriting this
one (IMHO this is buggy if it doesn't also release the lock), or (b) there
is a bug of some sort in the shell implementation itself. We should fix
cases of (a) and ignore cases of (b), since (b) indicates that a more
fundamental problem is at work.
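
To illustrate case (a), here is a rough sketch of how a second trap
silently replaces the first. The release_lock and cleanup_device
functions below are hypothetical stand-ins that only print a message;
they are not the real helpers from the hotplug scripts.

```shell
#!/bin/sh
# Hypothetical stand-ins; the real helpers do actual work.
release_lock()   { echo "lock released"; }
cleanup_device() { echo "device cleaned up"; }

# Case (a): the second trap replaces the first, so the lock is
# never released when the script exits.
buggy_script()
{
    (
        trap 'release_lock' EXIT
        trap 'cleanup_device' EXIT    # overwrites the handler above
        exit 0
    )
}

# One possible fix: the new handler also releases the lock.
fixed_script()
{
    (
        trap 'release_lock' EXIT
        trap 'cleanup_device; release_lock' EXIT
        exit 0
    )
}

buggy_script    # prints only "device cleaned up"
fixed_script    # prints "device cleaned up", then "lock released"
```

A shell keeps only one handler per signal or pseudo-signal, so any
script that installs its own EXIT trap inside the critical section has
to call the unlock step itself.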

> Perhaps the proper fix is to use the flock(1) utility, but
> this may be less portable...

The scripts in question live under tools/hotplug/Linux, which suggests
they are at least somewhat Linux-specific, so I think it is reasonable
to rely on flock(1) being available (it comes from util-linux, which is
widespread in the Linux world).

Looking at flock(1) it seems that using it would involve quite a bit of
restructuring of the callers (since we'd likely be using the 3rd form
shown in the manpage).
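
For reference, the 3rd form takes a file descriptor number rather than
a command, so the critical section ends up wrapped in a subshell with a
redirection. A minimal sketch of what that might look like, assuming a
hypothetical with_lock helper and a made-up lock file path (neither
exists in the current scripts):

```shell
#!/bin/sh
# Hypothetical lock file path; the real scripts would pick their own.
LOCKFILE="${LOCKFILE:-/tmp/xen-hotplug-example.lock}"

# Hypothetical helper: run a command while holding an exclusive lock.
with_lock()
{
    (
        # Block until fd 9 is locked. The kernel drops the lock when
        # the subshell exits and fd 9 is closed, however it exits.
        flock -x 9 || exit 1
        "$@"
    ) 9>"$LOCKFILE"
}

with_lock echo "in critical section"
```

Since the lock is tied to the open file descriptor, there is no owner
file to inspect and no stale lock to break, which sidesteps the race
above. The cost is that each caller has to put its critical section
inside something like with_lock instead of calling claim_lock and
release_lock separately.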

