I'm trying to load the MySQL JDBC driver from a Python app. I'm not invoking the 'bin/pyspark' or 'spark-submit' programs; instead I have a Python script in which I initialize 'SparkContext' and 'SparkSession' objects. I understand that we can pass the '--jars' option when invoking 'pyspark', but how do I load and specify the JDBC driver in my Python app?
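(One commonly used approach, shown here as a hedged sketch: set PYSPARK_SUBMIT_ARGS in the environment before the first SparkContext is created. The JAR path below is a placeholder, and the trailing 'pyspark-shell' token is required by PySpark.)

import os

# Must be set before any SparkContext/SparkSession exists;
# '/path/to/mysql-connector-java.jar' is a placeholder path.
os.environ['PYSPARK_SUBMIT_ARGS'] = \
    '--jars /path/to/mysql-connector-java.jar pyspark-shell'

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('my_app').getOrCreate()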
- Why don't you use pymysql? This is the standard way to connect from Python and can be easily installed using pip. pymysql.readthedocs.io/en/latest (sun_dare, Jun 3, 2019 at 21:37)
- Thanks. The reason is I'm using a design in which connections to all DBs (that can connect via JDBC) go through 'jaydebeapi'. (codebee, Jun 3, 2019 at 21:38)
- And in this case I need to write my DataFrame to MySQL, for which I need to connect via Spark. (codebee, Jun 3, 2019 at 21:46)
- Did you try providing the JDBC JAR path in the connect call? conn = jaydebeapi.connect(jdbc_class, url, [user, pw], jdbc_path) (see the fuller sketch after these comments) (sun_dare, Jun 3, 2019 at 21:48)
- I'm trying to use Spark's DataFrameWriter, which doesn't take a JAR file as an option. (codebee, Jun 3, 2019 at 21:51)
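(For completeness, a minimal sketch of the jaydebeapi route suggested above; the driver class, URL, credentials, and JAR path are all placeholders. jaydebeapi.connect takes the driver class name, the JDBC URL, the driver arguments, and the JAR(s) to load.)

import jaydebeapi

# All values below are placeholders for illustration
conn = jaydebeapi.connect(
    'com.mysql.cj.jdbc.Driver',           # driver class inside the JAR
    'jdbc:mysql://host:3306/my_db',
    ['db_user', 'db_pass'],
    '/path/to/mysql-connector-java.jar')  # JAR loaded at connect time
curs = conn.cursor()
curs.execute('SELECT 1')
print(curs.fetchall())
curs.close()
conn.close()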
2 Answers
I think you want to do something like this:
from pyspark.sql import SparkSession

# Create a Spark session with the JDBC connector JAR on the classpath
spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()

# Create a DataFrame using the JDBC-enabled session
df = spark.createDataFrame([
    (1, 'Hello'),
    (2, 'World!')
], ['Index', 'Value'])

# Write the DataFrame to MySQL over JDBC
df.write.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
              mode='overwrite',
              properties={'user': 'db_user', 'password': 'db_pass'})
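(As a quick check, a hedged sketch reusing the same placeholder URL and credentials: the table can be read back through the same session.)

# Read the table back to verify the write
df_check = spark.read.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
                           properties={'user': 'db_user',
                                       'password': 'db_pass'})
df_check.show()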
1 Comment
- Thanks for the alternative. I posted my solution. (codebee)