EDIT, TL;DR: Anyone considering the code below for production who can afford to require C++20 should use std::barrier instead, as suggested by G. Sliepen in his excellent answer.
I’m working on some OpenMP-parallelized C++ code made of three parts, with the constraint that no thread may begin part 3 before all threads have finished part 1. It is perfectly acceptable, however, for some threads to run part 2 while others are still running part 1, or for some threads to run part 3 while others are still running part 2. (See my question on Stack Overflow for details.)
A synchronization barrier anywhere between part 1 and part 3 would satisfy the constraint, but there’s no need for such “hard” synchronization. So I thought it would be nice to have a “split” barrier: no thread can pass the second half before all threads have passed the first half.
I managed to implement such a thing with the following code:
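    #include <condition_variable>
    #include <mutex>

    class split_barrier {
    private:
        std::mutex m;
        std::condition_variable cv;
        int threads_in_section;  // threads currently between enter() and leave()
        int total_threads;       // set by init()
        bool may_enter;
        bool may_leave;
    public:
        split_barrier():
            threads_in_section(0),
            total_threads(0),
            may_enter(false),
            may_leave(false)
        {}
        // Must complete before any thread calls enter().
        void init(int threads) {
            std::lock_guard<std::mutex> lock(m);
            total_threads = threads;
            may_enter = true;
        }
        // First half: blocks only while the previous round is still draining.
        void enter() {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this]{return may_enter;});
            if (++threads_in_section == total_threads) {
                // Last thread in: close the entrance and open the exit.
                may_enter = false;
                may_leave = true;
                lock.unlock();
                cv.notify_all();
            }
        }
        // Second half: blocks until every thread has called enter().
        void leave() {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this]{return may_leave;});
            if (--threads_in_section == 0) {
                // Last thread out: close the exit and reopen the entrance.
                may_leave = false;
                may_enter = true;
                lock.unlock();
                cv.notify_all();
            }
        }
    };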
My code then looks like this:
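    #include <omp.h>

    // part1(), part2() and part3() are defined elsewhere.
    int main() {
        split_barrier barrier;
        #pragma omp parallel
        {
            // "single" ends with an implicit barrier, so init() has
            // completed before any thread reaches enter().
            #pragma omp single
            barrier.init(omp_get_num_threads());
            part1();
            barrier.enter();
            part2();
            barrier.leave();
            part3();
        }
    }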
EDIT: Since it does not invalidate the only (and accepted) answer, I hope I am allowed to add that my “real” use case looks more like this:
    #include <omp.h>

    // part1() to part4() are defined elsewhere.
    int main() {
        split_barrier barrier;
        #pragma omp parallel
        {
            #pragma omp single
            barrier.init(omp_get_num_threads());
            while (…) {
                part1();
                barrier.enter();
                part2();
                barrier.leave();
                part3();
                #pragma omp barrier
                part4();
            }
        }
    }
I consider synchronization code to be very error-prone. Is my code thread-safe?
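One way to probe the question empirically is a small stress harness; the following is a rough sketch, assuming plain std::thread workers instead of OpenMP. The helper name split_barrier_holds and its counters are hypothetical, and the class is a condensed copy of the one above, repeated only so that the block compiles on its own. A passing run does not prove thread safety; it can only fail to find a violation.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Condensed copy of split_barrier from the question (same logic).
class split_barrier {
    std::mutex m;
    std::condition_variable cv;
    int threads_in_section = 0;
    int total_threads = 0;
    bool may_enter = false;
    bool may_leave = false;
public:
    void init(int threads) {
        std::lock_guard<std::mutex> lock(m);
        total_threads = threads;
        may_enter = true;
    }
    void enter() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return may_enter; });
        if (++threads_in_section == total_threads) {
            may_enter = false;
            may_leave = true;
            lock.unlock();
            cv.notify_all();
        }
    }
    void leave() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return may_leave; });
        if (--threads_in_section == 0) {
            may_leave = false;
            may_enter = true;
            lock.unlock();
            cv.notify_all();
        }
    }
};

// Hypothetical helper: returns true if, in every round, no thread started
// "part 3" before all threads had finished "part 1" of that round.
bool split_barrier_holds(int threads, int rounds) {
    split_barrier barrier;
    barrier.init(threads);
    std::vector<std::atomic<int>> part1_done(rounds);
    for (auto& counter : part1_done)
        counter.store(0);
    std::atomic<bool> ok{true};

    auto worker = [&] {
        for (int r = 0; r < rounds; ++r) {
            part1_done[r].fetch_add(1);            // end of part 1
            barrier.enter();
            std::this_thread::yield();             // stand-in for part 2
            barrier.leave();
            if (part1_done[r].load() != threads)   // start of part 3
                ok.store(false);
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(worker);
    for (auto& th : pool)
        th.join();
    return ok.load();
}
```

For example, split_barrier_holds(4, 200) reuses the barrier across 200 rounds without any extra barrier between rounds, which is a slightly harsher setting than the loop above.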