
Whenever I try to run a simple job in PySpark, it fails to open the socket.

>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part])

The above throws an exception:

port 53554 , proto 6 , sa ('127.0.0.1', 53554)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/work/bigdata/spark-custom/python/pyspark/context.py", line 917, in runJob
    return list(_load_from_socket(port, mappedRDD._jrdd_deserializer))
  File "/Volumes/work/bigdata/spark-custom/python/pyspark/rdd.py", line 143, in _load_from_socket
    raise Exception("could not open socket")
Exception: could not open socket

>>> 15/08/30 19:03:05 ERROR PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:404)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:613)

I checked _load_from_socket in rdd.py and realised it gets the port, but the server is not even started, so runJob might be the issue:

port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)

3 Answers


It's not the ideal solution, but now I am aware of the cause. PySpark is unable to create the JVM socket with JDK 1.8 (64-bit), so I just set my Java path to JDK 1.7 and it worked.
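If you launch PySpark from a script rather than the shell, a minimal sketch of the same idea is below; the JDK 1.7 path and the app name are hypothetical, adjust them for your machine.

# Point PySpark at JDK 1.7 by setting JAVA_HOME before the SparkContext
# (and hence the driver JVM) is created. The path below is hypothetical.
import os
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home"

from pyspark import SparkContext

sc = SparkContext("local[*]", "socket-test")
myRDD = sc.parallelize(range(6), 3)
print(sc.runJob(myRDD, lambda part: [x * x for x in part]))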


2 Comments

It does not work for me. I use Spark 1.5.2 with JDK 1.7.
The error happens when the Python driver tries to connect to the Scala side of the driver on any Spark action (count, reduce, ...). The line below shows that the timeout is hardcoded to 3 seconds: github.com/apache/spark/blob/master/python/pyspark/rdd.py#L121. There is currently no way to configure that timeout, so the only way for Python code to recover is to catch the exception in application-level code and retry a configurable number of times, for example as in the sketch below.
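A minimal sketch of that recovery pattern, assuming the sc and myRDD from the question; the helper name and retry count are made up for illustration, not part of Spark:

# Hypothetical retry helper: re-run a PySpark action when the driver fails to
# open the result socket, since the 3-second accept timeout is not configurable.
def run_with_retries(action, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            # PySpark raises a plain Exception("could not open socket") here.
            if "could not open socket" not in str(exc) or attempt == max_attempts:
                raise

result = run_with_retries(lambda: sc.runJob(myRDD, lambda part: [x * x for x in part]))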

I was having the exact same error. I tried JDK 1.7 and it didn't work, then I went and edited the /etc/hosts file and realized I had the following lines:

127.0.0.1 mbp.local localhost
127.0.0.1 localhost

I just commented out the line with my computer's local name and it worked:

#127.0.0.1 mbp.local localhost
127.0.0.1 localhost

Tested on PySpark 1.6.3 and 2.0.2 with JDK 1.8

1 Comment

This worked. Saved my life while debugging a Spark cluster in production!

Finally, I solved my problem.

When I started pyspark, I suddenly realized there was a warning which might be connected to the issue:

WARN Utils:66 - Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 172.16.20.244 instead (on interface en0)
2020-09-27 17:26:10 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address

Then I made a change to /etc/hosts, commenting out 127.0.0.1 and adding a new line to solve the loopback problem, like this:

#127.0.0.1  localhost
#255.255.255.255    broadcasthost
#::1             localhost
172.16.20.244 localhost

It worked.

I hope this helps those who have had a lot of pain solving this problem with similar warnings.
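As the second warning line itself suggests, an alternative to editing /etc/hosts is to set SPARK_LOCAL_IP before the driver JVM starts. A minimal sketch of that approach, using the address reported in the warning above (the app name is hypothetical):

# Bind the driver to the non-loopback address from the warning instead of
# editing /etc/hosts; SPARK_LOCAL_IP must be set before the SparkContext starts the JVM.
import os
os.environ["SPARK_LOCAL_IP"] = "172.16.20.244"

from pyspark import SparkContext

sc = SparkContext("local[*]", "loopback-test")
print(sc.parallelize(range(6), 3).map(lambda x: x * x).collect())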
