I'm thinking about implementing garbage collection efficiently by protecting the structure that tracks allocations with a reader-writer lock. However, I'm worried that memory-ordering semantics may invalidate the idea. In essence:
- whenever a thread operates on objects created through my GC library, it acquires the "reader lock" associated with the GC structure;
- to make GC operations thread-safe, stop-the-world stalls occur by having the GC thread acquire the "writer lock" associated with the GC structure.
However, IIRC, the reason a mutex works is that it inserts memory fences into its subroutines to make changes visible to all threads. Since a "reader lock" assumes the calling thread doesn't make any changes, it may skip inserting a "release" memory fence. (I think the relevant quotes can be attributed to David Butenhof, the author of "Programming with POSIX Threads".)
Q: what experience do we have in this area?
Expanding on the "essence" (per request by @J_H)
Suppose there are several threads, each having some objects created for it by the GC (i.e. "storage offered to app threads").
When the threads operate on their objects, they don't want the GC to interrupt, so they obtain "reader" locks to do that. This way, all threads can run simultaneously.
When all the threads are done operating on the objects and the GC starts (operating on "allocator bookkeeping 'overhead' storage") for one reason or another (e.g. memory usage exceeding a heuristic threshold, or an explicit request), all threads are blocked from operating on objects. At this point, all threads have released their "reader lock", and the GC thread is holding the "writer lock".
After the GC, the application runs as usual.
The part I'm concerned about is this: if no memory fence instruction is issued when the threads release their reader lock, the contents of main memory may be inconsistent with the caches of the cores that released the reader lock.
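To make the concern concrete, here is how the reader-count half of a lock might look if it were hand-rolled with C11 atomics. This is purely hypothetical (`active_readers`, `reader_enter`, `reader_leave` and `writer_may_proceed` are invented names), it is not a complete reader-writer lock, and it is not a claim about how any pthread implementation behaves. The question is whether the decrement in the unlock path carries release ordering or not.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int active_readers = 0;   /* hypothetical reader count */

static void reader_enter(void)
{
    atomic_fetch_add_explicit(&active_readers, 1, memory_order_acquire);
}

static void reader_leave(void)
{
    /* memory_order_release orders this thread's earlier stores before the
     * decrement. If an implementation used memory_order_relaxed here, on the
     * theory that readers never write, that ordering would be absent. */
    int prev = atomic_fetch_sub_explicit(&active_readers, 1,
                                         memory_order_release);
    assert(prev > 0);                    /* leaving without entering is a bug */
    (void)prev;
}

/* The writer (GC) side pairs an acquire load with the readers' release. */
static int writer_may_proceed(void)
{
    return atomic_load_explicit(&active_readers, memory_order_acquire) == 0;
}
```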