
Here is the problem I am trying to solve. I don't have a specific question in the title because I don't even know what I need.

We have an ancient Hadoop computing cluster with a very old version of Python installed. What we have done is install a new version (2.7.9) in a local directory (that we have permissions on) visible to the entire cluster, and set up a virtualenv there with the packages we need. Let's call this path /n/2.7.9/venv/.

We are using Hadoopy to distribute Python jobs on the cluster. Hadoopy distributes the Python code (the mappers and reducers), which is assumed to be executable and to have a shebang, to the cluster, but it doesn't do anything like activate a virtualenv.

If I hardcode the shebang in the .py files to /n/2.7.9/venv/, everything works. But I want to put the .py files in a library, and those files should have some generic shebang like #!/usr/bin/env python. I tried this and it does not work, because at runtime the virtualenv is not "activated" for the script, so it bombs with import errors.

So if anyone has any ideas on how to solve this problem, I would be grateful. Essentially I want #!/usr/bin/env python to resolve to /n/2.7.9/venv/ without /n/2.7.9/venv/ being active, or some other solution where I don't have to hardcode the shebang.

Currently I am solving this problem by having a run function in the library, and putting a wrapper around this function in the main code (that calls the library) with the hardcoded shebang in it. This is less offensive because the hardcoded shebang makes sense in the main code, but it is still messy because I have to have an executable wrapper file around every function I want to run from the library.
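
For concreteness, each wrapper looks roughly like this (the library and module names are made up, and I'm assuming the interpreter sits at the usual bin/python inside the virtualenv):

#!/n/2.7.9/venv/bin/python
# Hypothetical wrapper: the only place the interpreter path is hardcoded.
# (bin/python is assumed to be where the virtualenv keeps its interpreter.)
import mylib  # the library module whose files keep the generic shebang

if __name__ == '__main__':
    mylib.run()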

4 Comments
  • Have you tried symlinking /usr/bin/env to /n/2.7.9/venv? Commented Aug 20, 2015 at 16:45
  • On the server? Can you elaborate a bit? (The answer to your question is no.) Commented Aug 20, 2015 at 16:46
  • As a note, I don't have access to install anything on the main cluster, meaning /usr, etc. I need a solution in local drive space (mounts like /n). Commented Aug 20, 2015 at 16:48
  • Yeah, I just checked and I can't do this. I could symlink /n/whatever to it, but that would essentially be like hardcoding the shebang in the library. Commented Aug 20, 2015 at 16:55

2 Answers


I would change the environment variables PYTHONPATH and PATH. Point PYTHONPATH at your virtual environment and PATH at the directory that contains your new python executable, making sure that directory comes first in PATH.
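
For example, here is a minimal Python sketch of the idea (the script name is hypothetical, and the paths just mirror the ones in the question): the process that launches the script sets these variables in its child's environment, so the child's #!/usr/bin/env python shebang resolves to the new interpreter.

import os
import subprocess

env = os.environ.copy()
# Put the virtualenv's bin directory first so `env python` finds 2.7.9.
env['PATH'] = '/n/2.7.9/venv/bin:' + env.get('PATH', '')
# Make the virtualenv visible to Python imports, as suggested above.
env['PYTHONPATH'] = '/n/2.7.9/venv'
# some_script.py is a hypothetical script with a #!/usr/bin/env python shebang.
subprocess.check_call(['./some_script.py'], env=env)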


5 Comments

Does that also work if he runs the script using ./script?
It does if he exports the environment variables. If he's using bash, then something like: export PYTHONPATH=/n/2.7.9/venv and export PATH=/n/2.7.9/venv/bin:$PATH is what I might put at the top of my script file.
At the top of which script file? I'm playing around with this now.
I was answering @matan7890's question about launching the processes that launch Python via some sort of script file. In your case you have "the main code" - whatever launches "the main code" needs to have its environment modified.
You may add those lines to your ~/.bashrc (if you are using bash).

I accepted John Schmitt's answer because it led me to the solution. However, I am posting what I actually did, because it might be useful for other Hadoopy users.

What I actually did was:

args['cmdenvs'] = ['export VIRTUAL_ENV=/n/2.7.9/ourvenv',
                   'export PYTHONPATH=/n/2.7.9/ourvenv',
                   'export PATH=/n/2.7.9/ourvenv/bin:$PATH']

and passed args into Hadoopy's launch function. In the executable .py files, I put the generic #!/usr/bin/env python shebang.
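
For other Hadoopy users, the launch call itself looked roughly like this (the HDFS paths and script name below are placeholders, and I'm assuming args is unpacked into keyword arguments; only the cmdenvs part above is what actually mattered):

import hadoopy

# Placeholder HDFS paths and script name; args is the dict built above.
hadoopy.launch('/user/me/input', '/user/me/output', 'job.py', **args)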

