Fix physics/3d/run_on_separate_thread race condition in WorkerThreadPool (crash). #67680
Conversation
I modified the debug print statement to:

```cpp
OS::get_singleton()->print("Thread id: %u, Main id: %u, is main: %s\n",
		Thread::get_caller_id(),
		Thread::get_main_id(),
		Thread::get_main_id() == Thread::get_caller_id() ? "yes" : "no"); // Debug line
```

With ...

```
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
...
```

With ...

```
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
...
```

With ...

```
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 2197952072, Main id: 2197952072, is main: yes
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 2197952072, Main id: 2197952072, is main: yes
...
```

With ...

```
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 1959143244, Main id: 2197952072, is main: no
Thread id: 1959143244, Main id: 2197952072, is main: no
Thread id: 1959143244, Main id: 2197952072, is main: no
Thread id: 1959143244, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
Thread id: 3401590889, Main id: 2197952072, is main: no
...
```
|
I've tested and can confirm this does fix #67333 for me. |
Thanks for testing and confirming! |
Argument in favor of the change: every other access to the `groups` map was guarded by `task_mutex`, except this one. The PR locks `task_mutex` around this last remaining access.

In the original change, @reduz left a comment: `// Threads do not access this, so safe to erase here.` Why, then, was the other access to the `groups` HashMap in this function already guarded by the `task_mutex` lock? I think that inconsistency implies there was a mistake in this function, but I have some questions about the proposed fix. See my comment.
```diff
-	groups.erase(p_group); // Threads do not access this, so safe to erase here.
+	task_mutex.lock(); // This mutex is needed when Physics 2D and/or 3D is selected to run on a separate thread.
+	groups.erase(p_group);
+	task_mutex.unlock();
```
I think performance could be improved if that `groups.erase(p_group);` were moved inside the matching `task_mutex.lock()` / `task_mutex.unlock()` section above: this would avoid the overhead of taking the lock twice in a row.
```cpp
if (group->low_priority_native_tasks.size() > 0) {
	...
	task_mutex.lock();
	group_allocator.free(group);
	groups.erase(p_group);
	task_mutex.unlock();
} else {
	...
	if (finished_users == max_users) {
		// All tasks using this group are gone (finished before the group), so clear the group too.
		task_mutex.lock();
		group_allocator.free(group);
		groups.erase(p_group);
		task_mutex.unlock();
	} else {
		task_mutex.lock(); // This mutex is needed when Physics 2D and/or 3D is selected to run on a separate thread.
		groups.erase(p_group);
		task_mutex.unlock();
	}
}
```

Actually... I wonder if the intent of the original code might have been that `groups.erase(p_group);` should only happen in the cases where `group_allocator.free(group)` was called, that is to say `if (group->low_priority_native_tasks.size() > 0)` in the original case, or `if (finished_users == max_users)` in the else. If this is true, I might suggest the following code instead.
```cpp
if (group->low_priority_native_tasks.size() > 0) {
	...
	task_mutex.lock();
	group_allocator.free(group);
	groups.erase(p_group);
	task_mutex.unlock();
} else {
	...
	if (finished_users == max_users) {
		// All tasks using this group are gone (finished before the group), so clear the group too.
		task_mutex.lock();
		group_allocator.free(group);
		groups.erase(p_group);
		task_mutex.unlock();
	}
}
```
This should be fine; performance should not really be an issue in this context.

My comment is just a performance nit, and reduz responded to it. It shouldn't block merging.

Thanks!


Fixes #67333, fixes #63671, fixes #63879

May also fix this: #61250

This merge request fixes a race condition in `WorkerThreadPool`, as far as I can tell. The race condition causes an eventual segmentation fault (crash). The comment on the `groups.erase(p_group);` line is only true when the `physics/3d/run_on_separate_thread` option in the project settings is disabled.

For example, add the following line next to `groups.erase(p_group);`:

If you have `physics/3d/run_on_separate_thread` disabled, it prints only a single thread id:

If you enable `physics/3d/run_on_separate_thread`, it prints two thread ids:

The same is true if you enable the 2D thread via `physics/2d/run_on_separate_thread`. If you have separate threads enabled for both 2D and 3D, there are still only two thread ids printed.

I understand that this might not be the ideal place to fix the race condition, as the comment on the line suggests that there should not be multiple threads accessing it. However, this does fix the UI crashing in 4.0 beta 3 (or since #63354).

To test this:

- ... `physics/3d/run_on_separate_thread` in the settings.

Or you can run any existing project with `physics/3d/run_on_separate_thread` enabled and it will crash within minutes without this PR's fix. I'd really appreciate it if someone else also verified this, as it works on my machine™.