odd bit masks
const LockState_LOCKED = 1 << 1
const LockState_UNLOCKED = 1 << 2
I don't understand what's going on here.
Why not use a simple bool?
As written, this representation seems to
suggests there are four states, including "unknown"
and "invalid -- both locked an unlocked".
We could do CAS on 0 and 1 as easily
as we could on the OP constants.
The // Else state is locked comment
would be entirely truthful if we had a bool,
but the OP representation admits of alternative states.
nit: Counting to four billion is tedious, but doesn't
necessarily take a super long time these days.
Consider using 64 bits for FencingToken_MAX.
unique ID
This seems to leave "assigning a clientId"
as an exercise for the reader, or the app developer,
and I saw no guidance in the Review Context.
type Locker interface {
CreateLock(name string, clientId string) ...
Consider insisting that clients identify with a GUID,
so app developers have a strong hint about what should
be happening here.
Later, it's unclear what VerifyStringEmpty() is doing --
perhaps rolling a new ID?
The library makes up IDs, and not the calling app?
invariants
Any distributed protocol seeks to preserve certain stated invariants.
You didn't state any.
This frustrates attempts to reason about the protocol,
and to file bug reports.
In addition to writing them down in English,
it is useful to have the code execute assertions about
what we guarantee shall be true.
local increment
// Get a fencing token and set a timer in another goroutine to expire the lock
updatedToken := atomic.AddUint32(&lock.CurrentFencingToken, 1) // safe to do, as token can only be updated by changing lock state
This is not well motivated by the Review Context.
We see no "obtain ACKs from a quorum" attempt being made here,
nor mention of a leader election.
Thank you for citing a reference.
It described a 5-node protocol that attempted to
correctly achieve distributed consensus.
In the OP code a server process correctly obtains
a local mutex, but are we to understand there's
another 4 nodes that that clients might interact with?
Clients open a TCP connection ... to create a distributed lock
In the OP code I'm seeing the (single server) local,
but not the distributed aspect.
obscure entry point
Thank you for this helpful comment.
<-timer.C // wait for timer to expire
But why not use the more self-explanatory
.After()?
nit: typo, "there" -- // For clients where their is ...
design flaw: VerifyLease
For reasons eloquently described in the cited reference,
clients cannot know about "some sort of pause",
and so cannot effectively operate within lease time by
issuing verification checks.
It's unclear how this function could be of use to application clients.
Perhaps the comments should have described how
a server persistence layer might use such a check
before serializing input from a client?
The clients should verify the lease remark
seems out of step with that.
automated test suite
There isn't one.
The cited reference suggested that you might wish to write
Jepsen
clients which exercise this codebase.
And there are other tools, such as the popular
Spin
model checker.
Sorry, I didn't find a convincing proof of correctness here, no QED.
It's admittedly hard to achieve that.
Part of the trouble may stem from the concept-of-operations,
the setup and operational environment.
Better documentation could help with that.
Automated tests can be a valuable adjunct to English documentation.