I'm thinking about implementing garbage collection efficiently by protecting the structure that tracks allocations with a reader-writer lock. However, I'm worried that memory-ordering semantics may invalidate the idea. In essence:

  • whenever a thread operates on objects created by my GC library, it acquires the "reader" lock associated with the GC structure;
  • to make GC operations safe with regard to threads, stop-the-world stalls occur by having the GC thread acquire the "writer" lock associated with the GC structure.

However, IIRC, the reason mutexes work is that they insert memory fences into their subroutines to make changes visible to all threads, and since a "reader" lock assumes the calling thread doesn't make any changes, it may skip inserting a "release" memory fence. (I think relevant quotes can be attributed to David Butenhof, the author of "Programming with POSIX Threads".)

Q: what experience do we have on this area?


Expanding on the "essence" (per request by @J_H)

Suppose there are some threads, each having some objects created for them by the GC (i.e. "storage offered to app threads").

When the threads operate on their objects, they don't want the GC to interrupt, so they obtain "reader" locks to do that - this way, all the threads can run simultaneously.

When all the threads are done operating on the objects and the GC starts (operating on the allocator's bookkeeping "overhead" storage) for one reason or another (e.g. memory usage exceeds a heuristic threshold, or a collection is explicitly requested), all threads are blocked from operating on objects - at this point, all threads have released their "reader" locks, and the GC thread is holding the "writer" lock.

After the GC, the application runs as usual.

The part I'm concerned about is that when the threads release their reader locks, no memory fence instruction is issued, and the contents of main memory may be inconsistent with the caches associated with the cores releasing the reader lock.

  • A problem with fine-grained locks is that it is extremely easy to produce deadlocks. For this to work efficiently and safely, there would have to be extra rules like "every thread may lock at most 1 GC object at a time" and "GC object locks must only be held for a short duration". Only the GC system itself would be exempt. But now these fine-grained locks are more like a single coarse-grained lock, and we have arrived at something like CPython's GIL. Commented Feb 17, 2024 at 15:54
  • I should mention, since not everyone knows: reader-writer locks are an actual thing! Such as the one defined in POSIX, as well as the one implemented in Mr Butenhof's book for illustration. Commented Feb 17, 2024 at 23:21
  • IIRC the writer lock cannot be acquired unless all reader locks have been released. That's kind of the point of the reader-writer lock. So I don't think the problem you're describing should occur with a properly implemented lock. However, this will also prevent you from doing the stop-the-world event, because the writer lock only locks when nobody is using your objects. Executing arbitrary code within locks is generally not a good idea. Commented Feb 18, 2024 at 0:16
  • @Ccm "writer lock cannot be acquired unless all reader locks have been released" - Yes, but will the thread acquiring the writer lock see all the changes made by threads that held reader locks? Especially on NUMA architectures? I'm asking this from a memory-ordering-semantics point of view. Commented Feb 18, 2024 at 0:48
  • 2
    I think your concern about the insufficient fence is valid, but this illustrates that the scenario is inherently unsound. (1) You want a conventional rwlock but want the "read" lock to order writes. That doesn't make sense. But you could use the same techniques as an rwlock to implement a threads-xor-gc lock that does guarantee suitable memory order. (2) You imply that objects can be shared between threads. If they modify the object without explicit synchronization, that's already unsound – regardless of GC strategy. Commented Feb 18, 2024 at 6:49

1 Answer


Given the expected use of RW-locks, and that no current standard specifies the memory semantics of reader-writer locks, I expect that someone will eventually file a bug at AustinGroupBugs.net with the resolution: releasing a reader lock after changing relevant memory locations may result in undefined behavior.

Currently, draft 4.1 of the Single UNIX Specification, Version 5 (Issue 8) lists reader-writer lock operations among those that synchronize memory. Since the holders of reader locks are likely going to modify other memory regions that they each operate on exclusively, some fences will have to be added to ensure these per-thread memory regions are globally visible after the release of the reader locks.

So I thought of two solutions, both valid, yet each suited to particular purposes the other isn't.

  1. Use the RW-lock provided by the host, and add an explicit memory fence call when releasing the reader lock. Example of releasing a reader lock:
#include <stdatomic.h>
#include <pthread.h>
...
    /* publish this thread's writes before the lock is dropped */
    atomic_thread_fence(memory_order_release);
    pthread_rwlock_unlock(&gc->thr_xor_gc);
...

Advantages:

  • performance is optimized by the OS,
  • it is realtime-aware.
  2. Use regular mutexes and condition variables. Example of releasing a reader lock:
#include <pthread.h>
...
    pthread_mutex_lock(&gc->lock_main); // the master lock of the entire GC structure
    // inprogress_gc and inprogress_threads indicate whether the GC thread
    // and application threads, respectively, are active
    if( gc->inprogress_gc || !gc->inprogress_threads )
    {
        pthread_mutex_unlock(&gc->lock_main);
        return; // may return an error value if appropriate
    }
    if( !--gc->inprogress_threads )
        pthread_cond_signal(&gc->cv_writer); // the last reader out wakes the GC thread
    pthread_mutex_unlock(&gc->lock_main);
...

Advantages:

  • If I want application threads to stall whenever any thread is blocked attempting to do GC, I can customize this behavior (whereas a regular RW-lock cannot do that explicitly, unless optional POSIX APIs that alter scheduling are available).

  • All fences are implicit.


The latest GCC and Clang compilers support the atomic_thread_fence function, and the Single UNIX Specification mandates that implementations support all pthread interfaces. For earlier versions of the compilers and OSes, compiler built-ins and vendor-specific APIs may be available. Toolchains and platforms lacking all of these probably don't support multi-threading in the first place.
