  1. Using an external table
  2. The process doesn't have write permissions to /home/user/.Trash
  3. Calling "insert OVERWRITE" generates the following warning:

    2018-08-29 13:52:00 WARN TrashPolicyDefault:141 - Can't create trash directory: hdfs://nameservice1/user/XXXXX/.Trash/Current/data/table_1/key1=2 org.apache.hadoop.security.AccessControlException: Permission denied: user=XXXXX, access=EXECUTE, inode="/user/XXXXX/.Trash/Current/data/table_1/key1=2":hdfs:hdfs:drwx

Questions:

  1. Can we avoid the move to .Trash? Using TBLPROPERTIES ('auto.purge'='true') on external tables doesn't work.
  2. "insert OVERWRITE" should rewrite the partition data; instead, the new data is appended to the partition.

Code Sample

Creating the table:

spark.sql("CREATE EXTERNAL TABLE table_1 (id string, name string) PARTITIONED BY (key1 int) stored as parquet location 'hdfs://nameservice1/data/table_1'")
spark.sql("insert into table_1 values ('a','a1', 1)").collect()
spark.sql("insert into table_1 values ('b','b2', 2)").collect()
spark.sql("select * from table_1").collect()

Overwriting the partition:

spark.sql("insert OVERWRITE table  table_1 values ('b','b3', 2)").collect()

results in:

[Row(id=u'a', name=u'a1', key1=1),
 Row(id=u'b', name=u'b2', key1=2),
 Row(id=u'b', name=u'b3', key1=2)] 
  • What Spark version and which Hive jars are you using? Commented Sep 3, 2018 at 20:30
  • @prakharjain, using Spark 2.3 Commented Sep 4, 2018 at 20:03

1 Answer


Add PARTITION(key1) to your INSERT OVERWRITE statement and enable dynamic partitioning in the session config, so that only the matching partition is rewritten instead of new rows being appended.

val spark = SparkSession.builder
  .appName("test")
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("drop table table_1")

spark.sql("CREATE EXTERNAL TABLE table_1 (id string, name string) PARTITIONED BY (key1 int) stored as parquet location '/directory/your location/'")

spark.sql("insert into table_1 values ('a','a1', 1)")

spark.sql("insert into table_1 values ('b','b2', 2)")

spark.sql("select * from table_1").show()

spark.sql("insert OVERWRITE table table_1 PARTITION(key1) values ('b','b3', 2)")

spark.sql("select * from table_1").show()
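Since the question mentions Spark 2.3, another option worth trying is Spark's own partition-overwrite mode, introduced in that release. This is a hedged sketch, not a verified fix for this exact setup: in "dynamic" mode, INSERT OVERWRITE is expected to replace only the partitions present in the incoming data rather than the whole table, and it uses Spark's datasource write path, which may also sidestep the Hive .Trash move. Verify the behavior against your cluster and the Spark configuration docs for your version.

```scala
// Sketch (assumes an existing SparkSession with Hive support and the
// partitioned table table_1 from above; behavior depends on your Spark
// version and whether the table write goes through the datasource path).
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// With dynamic mode, only partition key1=2 should be rewritten;
// partition key1=1 is left untouched.
spark.sql("insert OVERWRITE table table_1 PARTITION(key1) values ('b','b3', 2)")
spark.sql("select * from table_1").show()
```

The setting can also be passed per-job via --conf at spark-submit time instead of being set on the running session.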



1 Comment

While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
