I'm trying to load the MySQL JDBC driver from a Python app. I'm not invoking the 'bin/pyspark' or 'spark-submit' programs; instead I have a Python script in which I initialize 'SparkContext' and 'SparkSession' objects. I understand that we can pass the '--jars' option when invoking 'pyspark', but how do I load and specify the JDBC driver in my Python app?
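(One commonly used approach, shown here as a hedged sketch: set PYSPARK_SUBMIT_ARGS in the environment before the first SparkContext is created. The JAR path below is a placeholder, and the trailing 'pyspark-shell' token is required by PySpark.)

import os

# Must be set before any SparkContext/SparkSession exists;
# '/path/to/mysql-connector-java.jar' is a placeholder path.
os.environ['PYSPARK_SUBMIT_ARGS'] = \
    '--jars /path/to/mysql-connector-java.jar pyspark-shell'

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('my_app').getOrCreate()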
- Why don't you use pymysql? This is the standard way to connect from Python and can be easily installed using pip. pymysql.readthedocs.io/en/latest (sun_dare, Jun 3, 2019 at 21:37)
- Thanks. The reason is I'm using a design in which connections to all DBs (that can connect via JDBC) go through 'jaydebeapi'. (codebee, Jun 3, 2019 at 21:38)
- And in this case I need to write my DataFrame to MySQL, for which I need to connect via Spark. (codebee, Jun 3, 2019 at 21:46)
- Did you try providing the JDBC JAR path in the connect call? conn = jaydebeapi.connect(jdbc_class, url, [user, pw], jdbc_path) (see the fuller sketch after these comments) (sun_dare, Jun 3, 2019 at 21:48)
- I'm trying to use Spark's DataFrameWriter, which doesn't take a JAR file as an option. (codebee, Jun 3, 2019 at 21:51)
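(For completeness, a minimal sketch of the jaydebeapi route suggested above; the driver class, URL, credentials, and JAR path are all placeholders. jaydebeapi.connect takes the driver class name, the JDBC URL, the driver arguments, and the JAR(s) to load.)

import jaydebeapi

# All values below are placeholders for illustration
conn = jaydebeapi.connect(
    'com.mysql.cj.jdbc.Driver',           # driver class inside the JAR
    'jdbc:mysql://host:3306/my_db',
    ['db_user', 'db_pass'],
    '/path/to/mysql-connector-java.jar')  # JAR loaded at connect time
curs = conn.cursor()
curs.execute('SELECT 1')
print(curs.fetchall())
curs.close()
conn.close()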
2 Answers
I think you want to do something like this:
from pyspark.sql import SparkSession

# Create a Spark session with the JDBC connector JAR on the classpath
spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()

# Create a DataFrame using the JDBC-enabled session
df = spark.createDataFrame([
    (1, 'Hello'),
    (2, 'World!')
], ['Index', 'Value'])

# Write the DataFrame to MySQL over JDBC
df.write.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
              mode='overwrite',
              properties={'user': 'db_user', 'password': 'db_pass'})
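(As a quick check, a hedged sketch reusing the same placeholder URL and credentials: the table can be read back through the same session.)

# Read the table back to verify the write
df_check = spark.read.jdbc('jdbc:mysql://host:3306/my_db', 'my_table',
                           properties={'user': 'db_user',
                                       'password': 'db_pass'})
df_check.show()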
1 Comment
- Thanks for the alternative. I posted my solution. (codebee)