The Wayback Machine - https://web.archive.org/web/20201125170149/https://github.com/nteract/papermill/issues/552

Papermill fails silently when using ProcessPoolExecutor #552

Open
zacharylawrence opened this issue Nov 14, 2020 · 4 comments

Comments

zacharylawrence commented Nov 14, 2020

I tried running papermill over a test notebook (e.g. one cell that just prints "Hello World") in parallel processes via a ProcessPoolExecutor. The parameters are correctly populated in the output notebooks, but none of the cells have been run.

If I replace ProcessPoolExecutor with ThreadPoolExecutor, each notebook runs in parallel as expected.

Any thoughts on why papermill would be breaking when using separate processes?

import concurrent.futures

import papermill as pm


def _run(parameters):
    pm.execute_notebook(NOTEBOOK, OUTPUT, parameters)


with concurrent.futures.ProcessPoolExecutor() as executor:
    executor.map(_run, all_parameters)

> pip freeze | grep papermill
papermill==2.2.2

rgbkrk (Member) commented Nov 15, 2020

This makes me wonder if it has to do with not awaiting the futures from the map call. 🤷

zacharylawrence (Author) commented Nov 16, 2020

> This makes me wonder if it has to do with not awaiting the futures from the map call. 🤷

I think this should be OK. According to https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map, "the iterables are collected immediately rather than lazily". This gives me the impression that each future is waited on before the map call returns. When using a ThreadPoolExecutor, this chunk of code blocks until each notebook in the executor.map call is done.

I also tried using a ProcessPoolExecutor with executor.submit() followed by executor.shutdown(wait=True, cancel_futures=False). This also failed immediately, with each output notebook populated with the parameters but not run.

rgbkrk (Member) commented Nov 16, 2020

Weird, thanks for the bug report. I'll have to give it a shot.

zacharylawrence (Author) commented Nov 17, 2020

> Weird, thanks for the bug report. I'll have to give it a shot.

Thanks, let me know if you need any help reproducing!

2 participants