It's not safe
Consider the following order in which the two threads execute:
Event loop:                        Publisher:
-----------                        ----------
bool error = receive(…);
                                   Impl* impl = client.acquire();
                                       if (errored.load()) …;
if (error) {
    closeIfError(error);
                                       exchangeUsage(1, false);
    client.impl = reconnect(…);
    client.resetState();
                                       if (errored.load()) {…}
                                       return impl;
In the publisher thread, the first errored.load() returns false, because closeIfError() has not been called yet. Then the event loop thread calls that function, which sets errored to true. However, since the publisher thread has not incremented usage_count yet, closeIfError() will delete impl and return. Then the publisher thread increments usage_count. Then the event loop thread assigns a new value to impl, and sets usage_count to 1 and errored to false. The second call to errored.load() in the publisher thread will therefore return false, and it returns impl.
There is nothing wrong with impl itself at this point: the publisher gets a valid pointer to it, and there is no way it could have read impl before the call to delete. However, usage_count is now 1 even though the publisher is using impl, because the event loop's reset overwrote the publisher's increment. That means the next time an error happens in the event loop thread, it can delete the impl that the publisher is still using. Alternatively, if the publisher calls release() first, usage_count drops to 0, and if an error happens after that, closeIfError() will never complete.
Consider making impl a std::atomic<Impl*>. In closeIfError(), first do an impl.store(nullptr), then wait for usage_count to drop to 1. At that point you know there are no users of the previous impl anymore, so you can safely delete it and then do impl.store(reconnect(…)). You can then get rid of errored.
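A minimal sketch of that idea, under several assumptions: Impl and reconnect() are hypothetical stand-ins for the real types, usage_count keeps 1 as its idle baseline, and acquire() increments usage_count before it loads impl so the event loop cannot miss a reader. It uses exchange(nullptr) rather than a plain store so the old pointer is captured in the same step, and std::this_thread::yield() stands in for sched_yield().

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Impl { int id; };  // hypothetical stand-in for the real connection object

// Hypothetical stand-in for the elided reconnect(…) call.
inline Impl* reconnect(int id) { return new Impl{id}; }

class Client {
    std::atomic<Impl*> impl{nullptr};
    std::atomic<int> usage_count{1};  // assumption: 1 is the idle baseline

public:
    explicit Client(Impl* initial) : impl(initial) {}
    ~Client() { delete impl.load(); }

    // Publisher side: register as a user first, then read the pointer.
    Impl* acquire() {
        usage_count.fetch_add(1);
        Impl* p = impl.load();
        if (!p) usage_count.fetch_sub(1);  // replacement in progress, back off
        return p;
    }

    void release() {
        int previous = usage_count.fetch_sub(1);
        assert(previous > 1);  // must pair with a successful acquire()
        (void)previous;
    }

    // Event loop side: unpublish, wait for users to leave, then replace.
    void closeIfError(bool error, int new_id) {
        if (!error) return;
        Impl* old = impl.exchange(nullptr);  // new acquire() calls now fail
        while (usage_count.load() > 1) std::this_thread::yield();
        delete old;                          // no user of old can remain
        impl.store(reconnect(new_id));
    }
};
```

A publisher that gets nullptr from acquire() simply skips that round and must not call release(). Because the counter is incremented before the pointer is read, any publisher that saw the old pointer is guaranteed to be counted by the time the event loop starts waiting.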
Consider implementing std::shared_mutex instead
Your struct A is responsible both for holding the pointer to some data and for dealing with the atomics that grant access to that data. I would remove the data part from it entirely, and just focus on implementing the equivalent of std::shared_mutex. In particular:
try_lock_shared() replaces your acquire().
lock() is called by the event loop thread on error, which then holds the lock while it deletes the old impl and replaces it with a new one.
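To illustrate the mapping in the two points above, here is a rough sketch that uses the standard std::shared_mutex directly (C++17). Impl, reconnect(), Client, publish() and the extra parameters are hypothetical names made up for this example, not taken from the code under review.

```cpp
#include <mutex>
#include <shared_mutex>

struct Impl { int id; };  // hypothetical stand-in for the real connection object

// Hypothetical stand-in for the elided reconnect(…) call.
inline Impl* reconnect(int id) { return new Impl{id}; }

struct Client {
    std::shared_mutex mutex;  // guards impl
    Impl* impl = nullptr;
};

// Publisher: the try_to_lock constructor calls try_lock_shared(), which
// plays the role of acquire(). The destructor of the lock plays release().
bool publish(Client& client, int& seen_id) {
    std::shared_lock<std::shared_mutex> lock(client.mutex, std::try_to_lock);
    if (!lock.owns_lock()) return false;  // event loop is replacing impl
    seen_id = client.impl->id;            // impl cannot be deleted here
    return true;
}

// Event loop: lock() blocks until all shared holders are gone.
void closeIfError(Client& client, bool error, int new_id) {
    if (!error) return;
    std::unique_lock<std::shared_mutex> lock(client.mutex);
    delete client.impl;
    client.impl = reconnect(new_id);
}
```

Note that the standard allows try_lock_shared() to fail spuriously, so the publisher has to treat a false result as "skip this round" rather than as proof that an error occurred.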
If you say: "but I wanted my code to be mutex-free!", then please realize that what you implemented is a mutex, and conversely a mutex is just an atomic variable to track if it's locked, and some way to wait for it to be unlocked (which in your case is the while-loop that calls sched_yield()).
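To make that point concrete, here is a rough sketch of a shared mutex built from a single atomic variable and a yield loop. The class name and bit layout are invented for this example, and it assumes that only one thread (the event loop) ever calls lock(); it is not how std::shared_mutex is actually implemented.

```cpp
#include <atomic>
#include <thread>

class SpinSharedMutex {
    // Top bit: a writer holds or wants the lock. Low bits: reader count.
    static constexpr unsigned WRITER = 1u << 31;
    std::atomic<unsigned> state{0};

public:
    // Readers: succeed only while no writer is present.
    bool try_lock_shared() {
        unsigned s = state.load();
        while (!(s & WRITER)) {
            // On failure, s is reloaded and the writer bit is checked again.
            if (state.compare_exchange_weak(s, s + 1)) return true;
        }
        return false;
    }

    void unlock_shared() { state.fetch_sub(1); }

    // Writer (single thread assumed): announce intent, which blocks new
    // readers, then wait for the existing readers to drain.
    void lock() {
        state.fetch_or(WRITER);
        while (state.load() != WRITER) std::this_thread::yield();
    }

    void unlock() { state.fetch_and(~WRITER); }
};
```

This has the same ingredients as the original code, a counter, a flag, and a yield loop, but with the data pointer taken out, which makes the locking logic much easier to reason about and test on its own.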