
I am installing Spark on a set of VMs. I should note that I have followed the same installation process many times before, on both physical servers and VMs, and have never seen this issue, so I'm puzzled as to why I'm seeing it now.

However, pyspark seems to have a problem initializing the SparkContext:

>pyspark
Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/08/22 13:24:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/22 13:24:49 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "/home/jon/spark/python/pyspark/shell.py", line 43, in <module>
    spark = SparkSession.builder\
  File "/home/jon/spark/python/pyspark/sql/session.py", line 169, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/home/jon/spark/python/pyspark/context.py", line 310, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/home/jon/spark/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/home/jon/spark/python/pyspark/context.py", line 188, in _do_init
    self._accumulatorServer = accumulators._start_update_server()
  File "/home/jon/spark/python/pyspark/accumulators.py", line 259, in _start_update_server
    server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
  File "/apps/usr/local64/anaconda/lib/python2.7/SocketServer.py", line 417, in __init__
    self.server_bind()
  File "/apps/usr/local64/anaconda/lib/python2.7/SocketServer.py", line 431, in server_bind
    self.socket.bind(self.server_address)
  File "/apps/usr/local64/anaconda/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -2] Name or service not known
>>> quit()
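
For what it's worth, the failing call is just a plain TCP bind to ("localhost", 0). Here is a minimal way to try the same bind outside of Spark (my own sketch, using the same stdlib SocketServer machinery that pyspark's accumulator code builds on):

import SocketServer  # Python 2 stdlib; renamed "socketserver" in Python 3

class Handler(SocketServer.BaseRequestHandler):
    pass

# Same (host, port) pair that pyspark's AccumulatorServer binds to;
# port 0 lets the OS pick any free port.
server = SocketServer.TCPServer(("localhost", 0), Handler)
print("bound to %s:%s" % server.server_address)
server.server_close()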

Interestingly enough, spark-shell does not show this problem. My intuition is that Python is failing to connect to the server that the JVM starts up. Does anyone have suggestions on how to resolve or debug this?

>spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/08/22 13:13:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/22 13:13:59 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.25.5.46:4040
Spark context available as 'sc' (master = local[*], app id = local-1503425633272).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

When I try to launch a simple program:
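
For reference, test-pyspark.py is only a few lines. The contents below are a rough reconstruction from the traceback further down, so treat the exact details as an assumption:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("test-pyspark")
sc = SparkContext(conf=conf)  # the line that fails in the traceback below
print(sc.parallelize(range(100)).count())
sc.stop()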

I see the following errors, similar to the ones above:

>spark-submit test-pyspark.py
17/08/22 13:47:37 INFO SparkContext: Running Spark version 2.1.1
17/08/22 13:47:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/22 13:47:37 INFO SecurityManager: Changing view acls to: jon
17/08/22 13:47:37 INFO SecurityManager: Changing modify acls to: jon
17/08/22 13:47:37 INFO SecurityManager: Changing view acls groups to:
17/08/22 13:47:37 INFO SecurityManager: Changing modify acls groups to:
17/08/22 13:47:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jon); groups with view permissions: Set(); users  with modify permissions: Set(jon); groups with modify permissions: Set()
17/08/22 13:47:38 INFO Utils: Successfully started service 'sparkDriver' on port 51440.
17/08/22 13:47:38 INFO SparkEnv: Registering MapOutputTracker
17/08/22 13:47:38 INFO SparkEnv: Registering BlockManagerMaster
17/08/22 13:47:38 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/08/22 13:47:38 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/08/22 13:47:38 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-c3ad2263-4416-45f2-927b-8517e4f3213f
17/08/22 13:47:38 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/08/22 13:47:38 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/22 13:47:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/08/22 13:47:38 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.25.5.46:4040
17/08/22 13:47:38 INFO SparkContext: Added file file:/home/jon/test-pyspark.py at file:/home/jon/test-pyspark.py with timestamp 1503427658741
17/08/22 13:47:38 INFO Utils: Copying /home/jon/test-pyspark.py to /tmp/spark-71ba944d-e11b-4cd5-bfcc-386f85b28a9a/userFiles-095d828d-24ec-43a2-ac58-4d9eb07177aa/test-pyspark.py
17/08/22 13:47:38 INFO Executor: Starting executor ID driver on host localhost
17/08/22 13:47:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56262.
17/08/22 13:47:38 INFO NettyBlockTransferService: Server created on 172.25.5.46:56262
17/08/22 13:47:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/08/22 13:47:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.25.5.46, 56262, None)
17/08/22 13:47:38 INFO BlockManagerMasterEndpoint: Registering block manager 172.25.5.46:56262 with 366.3 MB RAM, BlockManagerId(driver, 172.25.5.46, 56262, None)
17/08/22 13:47:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.25.5.46, 56262, None)
17/08/22 13:47:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.25.5.46, 56262, None)
17/08/22 13:47:39 INFO SparkUI: Stopped Spark web UI at http://172.25.5.46:4040
17/08/22 13:47:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/08/22 13:47:39 INFO MemoryStore: MemoryStore cleared
17/08/22 13:47:39 INFO BlockManager: BlockManager stopped
17/08/22 13:47:39 INFO BlockManagerMaster: BlockManagerMaster stopped
17/08/22 13:47:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/08/22 13:47:39 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/home/jon/test-pyspark.py", line 5, in <module>
    sc = SparkContext(conf=conf)
  File "/home/jon/spark/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/home/jon/spark/python/lib/pyspark.zip/pyspark/context.py", line 188, in _do_init
  File "/home/jon/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 259, in _start_update_server
  File "/apps/usr/local64/anaconda/lib/python2.7/SocketServer.py", line 417, in __init__
    self.server_bind()
  File "/apps/usr/local64/anaconda/lib/python2.7/SocketServer.py", line 431, in server_bind
    self.socket.bind(self.server_address)
  File "/apps/usr/local64/anaconda/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -2] Name or service not known
17/08/22 13:47:39 INFO ShutdownHookManager: Shutdown hook called
17/08/22 13:47:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-71ba944d-e11b-4cd5-bfcc-386f85b28a9a

1 Answer


It looks like PySpark fails to start the TCP server used for accumulator updates. The AccumulatorServer is started at localhost:

server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)

and the error:

socket.gaierror: [Errno -2] Name or service not known

suggests some issue with address resolution. Please double-check your network configuration.
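
One quick way to test resolution in isolation (a sketch; getaddrinfo is the resolver call that socket.bind relies on):

import socket

# If "localhost" cannot be resolved, this raises the same
# socket.gaierror seen in the traceback above.
try:
    for entry in socket.getaddrinfo("localhost", 0):
        print(entry)
except socket.gaierror as e:
    print("resolution failed: %s" % e)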

Based on the follow-up in the comments ("Looks like a network configuration issue. Could you include /etc/hosts?"), it looks like the solution was to fix the permissions on /etc/hosts so that users on the VMs have read access.
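
A quick way to verify this on an affected VM (a hypothetical check; ls -l /etc/hosts shows the same information):

import os

st = os.stat("/etc/hosts")
print("mode: %o" % (st.st_mode & 0o777))                 # usually 644
print("readable: %s" % os.access("/etc/hosts", os.R_OK))
# If readable is False for the user running Spark, restoring the normal
# world-readable mode (chmod 644 /etc/hosts, as root) should let
# "localhost" resolve again.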
