Under which category would you file this issue?
Providers
Apache Airflow version
3.1.8+astro.1
What happened and how to reproduce it?
The OpenLineage listener plugin uses a ProcessPoolExecutor to emit lineage events asynchronously from the scheduler. When a child process in the pool terminates abruptly, Python's concurrent.futures marks the pool as permanently broken. After this point, every subsequent OpenLineage event fails with BrokenProcessPool until the scheduler process is restarted.
This causes extended periods of missing lineage data with no self-recovery. The warning is logged but the pool is never recreated, so the problem persists indefinitely.
Scheduler logs showing the error
2026-05-21T08:01:02.690533Z [warning] OpenLineage received exception in method on_dag_run_success
[airflow.providers.openlineage.plugins.listener] loc=listener.py:918
Traceback (most recent call last):
File ".../airflow/providers/openlineage/plugins/listener.py", line 896, in on_dag_run_success
self.submit_callable(
File ".../airflow/providers/openlineage/plugins/listener.py", line 974, in submit_callable
fut = self.executor.submit(callable, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/process.py", line 805, in submit
raise BrokenProcessPool(self._broken)
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
What you think should happen instead?
The OpenLineage integration could be self-healing and prevent extended outages in lineage reporting.
When a BrokenProcessPool exception is raised in submit_callable, the listener could detect the broken pool state, create a new ProcessPoolExecutor instance, and retry the submission.
Operating System
Debian GNU/Linux 12 (bookworm) — Linux 5.15.0-1110-azure (containerized on Azure)
Deployment
Astronomer
Apache Airflow Provider(s)
openlineage
Versions of Apache Airflow Providers
I think this are the relevant ones from freeze:
openlineage-integration-common==1.41.0
openlineage-python==1.45.0
openlineage_sql==1.41.0
Official Helm Chart version
Not Applicable
Kubernetes Version
Not Applicable
Helm Chart configuration
No response
Docker Image customizations
FROM astrocrpublic.azurecr.io/runtime:3.1-14
ENV AIRFLOW__CORE__MAX_MAP_LENGTH=3072
ENV AIRFLOW__PROVIDERS_JDBC__ALLOW_DRIVER_CLASS_IN_EXTRA=true
ENV AIRFLOW__PROVIDERS_JDBC__ALLOW_DRIVER_PATH_IN_EXTRA=true
ENV JAVA_HOME=/usr/lib/jvm/default-java
ENV AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=50
ENV AIRFLOW__CORE__ALLOWED_DESERIALIZATION_CLASSES="[redacted]"
# .jar copies
# [redacted COPY]
# apt installs of ODBC Driver
# [readacted apt-get]
USER astro
Anything else?
Environment details:
- Python 3.12.13
- Running on Astronomer Runtime (Medium: Scheduler (1 vCPU, 2GiB RAM), DAG Processor (1 vCPU, 2GiB RAM))
Impact: Downstream consumers of OpenLineage events see extended periods of zero events. Since the only recovery is a scheduler restart (Git deploy on Astro), and the scheduler otherwise functions normally (task execution is unaffected).
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
Under which category would you file this issue?
Providers
Apache Airflow version
3.1.8+astro.1
What happened and how to reproduce it?
The OpenLineage listener plugin uses a
ProcessPoolExecutorto emit lineage events asynchronously from the scheduler. When a child process in the pool terminates abruptly, Python'sconcurrent.futuresmarks the pool as permanently broken. After this point, every subsequent OpenLineage event fails withBrokenProcessPooluntil the scheduler process is restarted.This causes extended periods of missing lineage data with no self-recovery. The warning is logged but the pool is never recreated, so the problem persists indefinitely.
Scheduler logs showing the error
What you think should happen instead?
The OpenLineage integration could be self-healing and prevent extended outages in lineage reporting.
When a
BrokenProcessPoolexception is raised insubmit_callable, the listener could detect the broken pool state, create a newProcessPoolExecutorinstance, and retry the submission.Operating System
Debian GNU/Linux 12 (bookworm) — Linux 5.15.0-1110-azure (containerized on Azure)
Deployment
Astronomer
Apache Airflow Provider(s)
openlineage
Versions of Apache Airflow Providers
I think this are the relevant ones from freeze:
Official Helm Chart version
Not Applicable
Kubernetes Version
Not Applicable
Helm Chart configuration
No response
Docker Image customizations
Anything else?
Environment details:
Impact: Downstream consumers of OpenLineage events see extended periods of zero events. Since the only recovery is a scheduler restart (Git deploy on Astro), and the scheduler otherwise functions normally (task execution is unaffected).
Are you willing to submit PR?
Code of Conduct