
[v3-2-test] Fix macOS SIGSEGV in task execution by using fork+exec (#64874) #66872

Merged: vatsrahul1001 merged 2 commits into v3-2-test from backport-322-64874 on May 15, 2026

vatsrahul1001 (Contributor) commented May 13, 2026

Cherry-pick of #64874

Conflict resolution

One conflict, in task-sdk/tests/task_sdk/execution_time/test_supervisor.py. The PR side's after-context included two unrelated test functions, test_in_process_api_server_caches_instance and test_api_client_clears_dag_bag_override_when_dag_is_none, which come from a separate PR that is not in this backport set. They reference in_process_api_server and InProcessTestSupervisor._api_client symbols and flows that v3-2-test does not have in this form, and they are not part of the #64874 diff. I dropped those two functions and kept only #64874's actual addition: the TestChildExecMain class with test_uses_fds_012_and_requests_log_channel.

On macOS, the task supervisor's bare os.fork() copies the parent's
Objective-C runtime state into the child process.  When the child
later triggers ObjC class initialization (e.g. socket.getaddrinfo ->
system DNS resolver -> Security.framework -> +[NSNumber initialize]),
the runtime detects the corrupted state and crashes with SIGABRT/SIGSEGV.

This is a well-documented macOS platform limitation -- Apple's ObjC
runtime, CoreFoundation, and libdispatch are not fork-safe.  CPython
changed multiprocessing's default start method to "spawn" on macOS in
3.8 for this reason, but Airflow's TaskSDK supervisor uses os.fork()
directly.
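
For comparison, code that goes through multiprocessing (rather than calling os.fork() directly) can request the "spawn" start method explicitly. The following is a minimal sketch, not Airflow code; the helper names are hypothetical.

```python
import multiprocessing as mp
import sys


def fork_safe_context():
    """Return a multiprocessing context that starts children via spawn.

    Hypothetical helper: "spawn" starts a fresh interpreter instead of
    continuing from a forked copy of the parent.
    """
    return mp.get_context("spawn")


def describe_start_methods() -> dict:
    """Collect the start methods available on this platform."""
    return {
        "platform": sys.platform,
        "available": mp.get_all_start_methods(),
        "chosen": fork_safe_context().get_start_method(),
    }


if __name__ == "__main__":
    print(describe_start_methods())
```

This only helps callers that use multiprocessing; a supervisor that calls os.fork() itself has to arrange the exec step on its own.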

The fix: on macOS, immediately call os.execv() after os.fork() for
task execution subprocesses.  The exec replaces the child's address
space, giving it clean ObjC state.  The socketpair FDs survive across
exec (marked inheritable) and the child reads their numbers from an
environment variable.
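
The fork+exec pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #64874: the environment variable name EXAMPLE_CHILD_SOCKET_FD, the function name, and the child program are all hypothetical, and the sketch is Unix-only.

```python
from __future__ import annotations

import os
import socket
import sys

# Hypothetical variable name; the PR text does not name the real one.
FD_ENV_VAR = "EXAMPLE_CHILD_SOCKET_FD"

# Program run by the freshly exec'd interpreter: it rebuilds the socket
# from the inherited FD number found in the environment.
CHILD_CODE = f"""
import os, socket
fd = int(os.environ[{FD_ENV_VAR!r}])
sock = socket.socket(fileno=fd)
sock.sendall(b"hello from pid %d" % os.getpid())
sock.close()
"""


def run_child_via_fork_exec() -> tuple[int, int, bytes]:
    """Fork, exec a clean interpreter in the child, and read its message."""
    parent_sock, child_sock = socket.socketpair()
    # Python creates FDs non-inheritable by default (PEP 446), so the
    # child's end must be marked inheritable to survive exec.
    child_sock.set_inheritable(True)

    pid = os.fork()
    if pid == 0:
        # Child: do as little as possible before replacing the process image.
        try:
            os.environ[FD_ENV_VAR] = str(child_sock.fileno())
            os.execv(sys.executable, [sys.executable, "-c", CHILD_CODE])
        finally:
            os._exit(127)  # reached only if exec failed

    # Parent: close our copy of the child's end so recv() sees EOF.
    child_sock.close()
    chunks = []
    while data := parent_sock.recv(4096):
        chunks.append(data)
    parent_sock.close()
    _, status = os.waitpid(pid, 0)
    return pid, status, b"".join(chunks)


if __name__ == "__main__":
    print(run_child_via_fork_exec())
```

The parent's end of the socketpair stays non-inheritable, so it is closed automatically in the child at exec time; only the FD that was explicitly marked inheritable crosses over.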

Only task execution (target=_subprocess_main) uses fork+exec.  DAG
processor and triggerer pass different targets and keep bare fork --
they don't make network calls that trigger the macOS crash.

References:
- python/cpython#105912
- python/cpython#58037
- #24463

(cherry picked from commit a3383b7)
@vatsrahul1001 vatsrahul1001 added this to the Airflow 3.2.2 milestone May 13, 2026
@vatsrahul1001 vatsrahul1001 added the type:bug-fix Changelog: Bug Fixes label May 13, 2026
@vatsrahul1001 vatsrahul1001 requested a review from Lee-W May 15, 2026 06:59
@vatsrahul1001 vatsrahul1001 merged commit 72fab5c into v3-2-test May 15, 2026
89 checks passed
@vatsrahul1001 vatsrahul1001 deleted the backport-322-64874 branch May 15, 2026 11:31
vatsrahul1001 added a commit that referenced this pull request May 20, 2026
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
vatsrahul1001 added a commit that referenced this pull request May 20, 2026
vatsrahul1001 added a commit that referenced this pull request May 21, 2026

Labels

area:task-sdk type:bug-fix Changelog: Bug Fixes

3 participants