
Below are the last 2 lines of the PySpark ETL code:

df_writer = DataFrameWriter(usage_fact)
df_writer.partitionBy("data_date", "data_product").saveAsTable(usageWideFactTable, format=fileFormat, mode=writeMode, path=usageWideFactpath)

where writeMode = "append" and fileFormat = "orc".
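For reference, the same write can also be expressed through the DataFrame's own write attribute instead of constructing a DataFrameWriter by hand; a minimal sketch using the same variables from the ETL code:

# Equivalent write via the DataFrame's built-in writer.
# usageWideFactTable, fileFormat, writeMode and usageWideFactpath are the
# same variables defined earlier in the ETL code.
(usage_fact.write
    .partitionBy("data_date", "data_product")
    .format(fileFormat)                 # "orc"
    .mode(writeMode)                    # "append"
    .option("path", usageWideFactpath)
    .saveAsTable(usageWideFactTable))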

I want to use insert overwrite instead, so that my data does not get appended when I re-run the code. Hence I used this:

usage_fact.createOrReplaceTempView("usage_fact")
fact = spark.sql("insert overwrite table " + usageWideFactTable + " partition (data_date, data_product) select * from usage_fact")

But this gives me the error below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 545, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Cannot overwrite a path that is also being read from.;'

It looks like I cannot overwrite a path that is also being read from, but I don't know how to fix this as I am new to PySpark. What exact code should I use to get around this issue?
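For context, Spark raises this AnalysisException whenever the output path of a write also appears as a source in the query plan, presumably because usage_fact was built by reading from the same location. One commonly suggested workaround, independent of the answer below, is to break the read lineage by checkpointing before the insert; a minimal sketch, where the checkpoint directory is a hypothetical scratch location:

# Break the read lineage so the plan no longer references the table's path.
# The checkpoint directory below is a placeholder scratch location.
spark.sparkContext.setCheckpointDir("s3://saasdata/tmp/checkpoints/")
usage_fact = usage_fact.checkpoint()  # eager checkpoint materializes the data

usage_fact.createOrReplaceTempView("usage_fact")
spark.sql("insert overwrite table " + usageWideFactTable +
          " partition (data_date, data_product) select * from usage_fact")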

1 Answer

It worked for me with the same code above. I just changed the DDL and recreated the table with the details below (removing the table properties, if any were used):

PARTITIONED BY (
  `data_date` string,
  `data_product` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'path'='s3://saasdata/datawarehouse/fact/UsageFact/')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://saasdata/datawarehouse/fact/UsageFact/'
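With the table recreated as a Hive ORC table, the same insert overwrite statement ran unchanged. One caveat worth noting (not part of the answer above): dynamic partition inserts into Hive-format tables typically require the standard Hive dynamic-partitioning settings to be enabled in the session first:

# Standard Hive settings for dynamic partition inserts; these are generic
# Hive configs, not something specific to this answer.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")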