Skip to main content
added 34 characters in body
Source Link
J_H
  • 42.1k
  • 3
  • 38
  • 157

The Kafka developers have written some helpful posts on exactly onceexactly once semantics in a distributed setting. Spoiler: Servers need to persist the fencing token along with client's input data.

The Kafka developers have written some helpful posts on exactly once semantics in a distributed setting. Spoiler: Servers need to persist the fencing token along with client's input data.

The Kafka developers have written some helpful posts on exactly once semantics in a distributed setting. Spoiler: Servers need to persist the fencing token along with client's input data.

added 260 characters in body
Source Link
J_H
  • 42.1k
  • 3
  • 38
  • 157

The Kafka developers have written some helpful posts on exactly once semantics in a distributed setting. Spoiler: Servers need to persist the fencing token along with client's input data.

The Kafka developers have written some helpful posts on exactly once semantics in a distributed setting. Spoiler: Servers need to persist the fencing token along with client's input data.

Source Link
J_H
  • 42.1k
  • 3
  • 38
  • 157

odd bit masks

const LockState_LOCKED = 1 << 1
const LockState_UNLOCKED = 1 << 2

I don't understand what's going on here. Why not use a simple bool? As written, this representation seems to suggests there are four states, including "unknown" and "invalid -- both locked an unlocked". We could do CAS on 0 and 1 as easily as we could on the OP constants. The // Else state is locked comment would be entirely truthful if we had a bool, but the OP representation admits of alternative states.

nit: Counting to four billion is tedious, but doesn't necessarily take a super long time these days. Consider using 64 bits for FencingToken_MAX.

unique ID

This seems to leave "assigning a clientId" as an exercise for the reader, or the app developer, and I saw no guidance in the Review Context.

type Locker interface {
    CreateLock(name string, clientId string) ...

Consider insisting that clients identify with a GUID, so app developers have a strong hint about what should be happening here.

Later, it's unclear what VerifyStringEmpty() is doing -- perhaps rolling a new ID? The library makes up IDs, and not the calling app?

invariants

Any distributed protocol seeks to preserve certain stated invariants. You didn't state any. This frustrates attempts to reason about the protocol, and to file bug reports.

In addition to writing them down in English, it is useful to have the code execute assertions about what we guarantee shall be true.

local increment

            // Get a fencing token and set a timer in another goroutine to expire the lock
            updatedToken := atomic.AddUint32(&lock.CurrentFencingToken, 1) // safe to do, as token can only be updated by changing lock state

This is not well motivated by the Review Context. We see no "obtain ACKs from a quorum" attempt being made here, nor mention of a leader election.

Thank you for citing a reference. It described a 5-node protocol that attempted to correctly achieve distributed consensus. In the OP code a server process correctly obtains a local mutex, but are we to understand there's another 4 nodes that that clients might interact with?

Clients open a TCP connection ... to create a distributed lock

In the OP code I'm seeing the (single server) local, but not the distributed aspect.

obscure entry point

Thank you for this helpful comment.

                <-timer.C  // wait for timer to expire

But why not use the more self-explanatory .After()?

nit: typo, "there" -- // For clients where their is ...

design flaw: VerifyLease

For reasons eloquently described in the cited reference, clients cannot know about "some sort of pause", and so cannot effectively operate within lease time by issuing verification checks. It's unclear how this function could be of use to application clients.

Perhaps the comments should have described how a server persistence layer might use such a check before serializing input from a client? The clients should verify the lease remark seems out of step with that.

automated test suite

There isn't one.

The cited reference suggested that you might wish to write Jepsen clients which exercise this codebase. And there are other tools, such as the popular Spin model checker.


Sorry, I didn't find a convincing proof of correctness here, no QED. It's admittedly hard to achieve that. Part of the trouble may stem from the concept-of-operations, the setup and operational environment. Better documentation could help with that. Automated tests can be a valuable adjunct to English documentation.