- Using an external table
- The process doesn't have write permissions to /home/user/.Trash
Calling "insert OVERWRITE" generates the following warning:
2018-08-29 13:52:00 WARN TrashPolicyDefault:141 - Can't create trash directory: hdfs://nameservice1/user/XXXXX/.Trash/Current/data/table_1/key1=2 org.apache.hadoop.security.AccessControlException: Permission denied: user=XXXXX, access=EXECUTE, inode="/user/XXXXX/.Trash/Current/data/table_1/key1=2":hdfs:hdfs:drwx
Questions:
- Could we avoid the move to .Trash? Using TBLPROPERTIES ('auto.purge'='true') on external tables doesn't work (see the sketch below).
- "insert OVERWRITE" should rewrite the partition data; instead, the new data is appended to the partition.
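For reference, this is roughly how the property was applied; a minimal sketch, assuming it is set through Spark SQL on the same table. The overwritten partition files are still moved to .Trash afterwards.
# Sketch of the attempted workaround: set auto.purge on the external table
# (assumes the property is applied via Spark SQL; the partition data is
# still moved to .Trash on "insert OVERWRITE")
spark.sql("ALTER TABLE table_1 SET TBLPROPERTIES ('auto.purge'='true')")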
Code Sample
Creating the table:
spark.sql("CREATE EXTERNAL TABLE table_1 (id string, name string) PARTITIONED BY (key1 int) stored as parquet location 'hdfs://nameservice1/data/table_1'")
spark.sql("insert into table_1 values('a','a1', 1)").collect()
spark.sql("insert into table_1 values ('b','b2', 2)").collect()
spark.sql("select * from table_1").collect()
Overwriting the partition:
spark.sql("insert OVERWRITE table table_1 values ('b','b3', 2)").collect()
Result:
[Row(id=u'a', name=u'a1', key1=1),
Row(id=u'b', name=u'b2', key1=2),
Row(id=u'b', name=u'b3', key1=2)]
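The expected result is that the key1=2 partition contains only the new row. Listing the partition directory shows the old file is still there; a minimal sketch, assuming the table location from the CREATE TABLE above and going through Spark's py4j bridge to the Hadoop FileSystem API.
# Sketch: list the files under the key1=2 partition (location assumed from the
# CREATE TABLE above); both the original and the new parquet file are present,
# i.e. the partition was appended to rather than overwritten
part_path = spark._jvm.org.apache.hadoop.fs.Path("hdfs://nameservice1/data/table_1/key1=2")
fs = part_path.getFileSystem(spark._jsc.hadoopConfiguration())
for status in fs.listStatus(part_path):
    print(status.getPath())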