I am trying to connect to a Teradata server through PySpark.
The code I run in the PySpark shell is below,
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Teradata connect") \
    .getOrCreate()

df = spark.read \
    .format("jdbc") \
    .options(url="jdbc:teradata://xy/",
             driver="com.teradata.jdbc.TeraDriver",
             dbtable="dbname.tablename",
             user="user1", password="***") \
    .load()
which gives the error,
py4j.protocol.Py4JJavaError: An error occurred while calling o159.load. : java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
To resolve this, I think I need to add the jars terajdbc4.jar and tdgssconfig.jar.
In Scala, to add a jar we can use
sc.addJar("<path>/jar-name.jar")
If I try the same in PySpark, I get the error,
AttributeError: 'SparkContext' object has no attribute 'addJar'
or
AttributeError: 'SparkSession' object has no attribute 'addJar'
How can I add the jars terajdbc4.jar and tdgssconfig.jar in PySpark?
You need to supply the jars when you launch PySpark, e.g.:

pyspark2 --jars /data/1/gcgeeapmxtldu/lib/tdgssconfig.jar,/data/1/gcgeeapmxtldu/lib/terajdbc4.jar

and point the session configuration at the same jars before reading through JDBC:

from pyspark.sql import SparkSession

# Note: the extraClassPath values are JVM classpaths, so the entries are
# ":"-separated on Linux, while spark.jars takes a ","-separated list.
spark = SparkSession.builder.appName("sparkanalysis") \
    .config("spark.driver.extraClassPath", "/local_path/terajdbc4.jar:/local_path/tdgssconfig.jar") \
    .config("spark.executor.extraClassPath", "/local_path/terajdbc4.jar:/local_path/tdgssconfig.jar") \
    .config("spark.jars", "/local_path/terajdbc4.jar,/local_path/tdgssconfig.jar") \
    .config("spark.repl.local.jars", "/local_path/tdgssconfig.jar,/local_path/terajdbc4.jar") \
    .getOrCreate()

df = spark.read.format("jdbc") \
    .option("url", "jdbc:teradata://xyz") \
    .option("driver", "com.teradata.jdbc.TeraDriver") \
    .option("dbtable", "table") \
    .option("user", "USR1") \
    .option("password", "*****") \
    .load()
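If you submit a script instead of using the shell, the same --jars flag works with spark-submit. To double-check that the driver class is actually visible to the driver JVM, you can ask for it through py4j; note that _jvm is an internal handle, so this is only a debugging sketch, not a public API:

# Debugging sketch: try to load the Teradata driver class through py4j.
# spark.sparkContext._jvm is an internal py4j gateway handle, not public API.
# This raises a Py4JJavaError wrapping ClassNotFoundException if the jar
# is missing from the driver classpath.
spark.sparkContext._jvm.java.lang.Class.forName("com.teradata.jdbc.TeraDriver")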