
Is it possible to reference a virtual env / Python interpreter that has been uploaded to a Cloud Storage bucket in GCP? I have an Ubuntu Docker image with the proper credentials and service account set up within the image.

I'm able to use gsutil commands; however, when I try to export my PYSPARK_PYTHON variable, I get a 'No such file or directory' error:

export PYSPARK_PYTHON=gs://[bucket]/deps/env/bin/python3

pyspark

env: ‘gs://[bucket]/deps/env/bin/python3’: No such file or directory

If I run:

gsutil ls gs://[bucket]/deps/env/bin/python3

I'm able to see the file.

I expect pyspark to work using the python dependencies and libraries within that bucket. Is this possible at all?

2 Answers


PYSPARK_PYTHON expects a path to an executable Python binary. Files stored in object storage buckets cannot be executed directly, so make sure the Python executable is accessible as a local file from within your image or some mount point, as in the sketch below.
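A minimal sketch of that approach, assuming /opt/env as the local target path (the path is illustrative, not prescribed here):

# Copy the uploaded env out of the bucket into the image's filesystem
gsutil -m cp -r gs://[bucket]/deps/env /opt/

# Object storage does not preserve the executable bit, so restore it
chmod +x /opt/env/bin/python3

# Point PySpark at the now-local interpreter
export PYSPARK_PYTHON=/opt/env/bin/python3
pyspark

Note that a relocated virtual env can still break if it hard-codes absolute paths; creating it at the same path it will occupy in the image avoids that.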



According to the documentation (https://spark.apache.org/docs/latest/configuration.html), the environment variable PYSPARK_PYTHON expects a local executable file, not an object storage URL:

Python binary executable to use for PySpark in both driver and workers (default is python2.7 if available, otherwise python). Property spark.pyspark.python take precedence if it is set.

However, files in the bucket are not executable. You can mount a disk inside the image and copy the files there so they are accessible "live".

Here is a guide for mounting a RAM disk: https://cloud.google.com/compute/docs/disks/mount-ram-disks
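Following that guide, a minimal sketch might look like this (the mount point /mnt/ramdisk and the size are assumptions):

# Create a mount point and mount a RAM-backed tmpfs on it
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk

# Copy the env from the bucket onto the mounted disk and restore the executable bit
gsutil -m cp -r gs://[bucket]/deps/env /mnt/ramdisk/
chmod +x /mnt/ramdisk/env/bin/python3

# Reference the local copy
export PYSPARK_PYTHON=/mnt/ramdisk/env/bin/python3

Keep in mind a RAM disk is cleared on reboot, so the copy step has to run at startup.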

