  1. Using an external table
  2. The process doesn't have write permissions to /home/user/.Trash
  3. Calling "insert OVERWRITE" generates the following warning:

    2018-08-29 13:52:00 WARN TrashPolicyDefault:141 - Can't create trash directory: hdfs://nameservice1/user/XXXXX/.Trash/Current/data/table_1/key1=2 org.apache.hadoop.security.AccessControlException: Permission denied: user=XXXXX, access=EXECUTE, inode="/user/XXXXX/.Trash/Current/data/table_1/key1=2":hdfs:hdfs:drwx

Questions:

  1. Can we avoid the move to .Trash? Using TBLPROPERTIES ('auto.purge'='true') on external tables doesn't work.
  2. "insert OVERWRITE" should rewrite the partition data; instead, the new data is appended to the partition.

Code Sample

Creating the table:

spark.sql("CREATE EXTERNAL TABLE table_1 (id string, name string) PARTITIONED BY (key1 int) stored as parquet location 'hdfs://nameservice1/data/table_1'")
spark.sql("insert into table_1 values ('a','a1', 1)").collect()
spark.sql("insert into table_1 values ('b','b2', 2)").collect()
spark.sql("select * from table_1").collect()

Overwriting the partition:

spark.sql("insert OVERWRITE table  table_1 values ('b','b3', 2)").collect()

results in:

[Row(id=u'a', name=u'a1', key1=1),
 Row(id=u'b', name=u'b2', key1=2),
 Row(id=u'b', name=u'b3', key1=2)] 
  • What Spark version and which Hive jars are you using? Commented Sep 3, 2018 at 20:30
  • @prakharjain, using Spark 2.3 Commented Sep 4, 2018 at 20:03

1 Answer


Add PARTITION(key1) to your INSERT OVERWRITE statement and enable dynamic partitioning in the session config, so that only the matching partition is rewritten instead of new rows being appended.

val spark = SparkSession.builder
  .appName("test")
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("drop table table_1")

spark.sql("CREATE EXTERNAL TABLE table_1 (id string, name string) PARTITIONED BY (key1 int) stored as parquet location '/directory/your location/'")

spark.sql("insert into table_1 values ('a','a1', 1)")

spark.sql("insert into table_1 values ('b','b2', 2)")

spark.sql("select * from table_1").show()

spark.sql("insert OVERWRITE table table_1 PARTITION(key1) values ('b','b3', 2)")

spark.sql("select * from table_1").show()
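Since the question mentions Spark 2.3, another option worth trying is Spark's own partition-overwrite mode, introduced in that release. This is a hedged sketch, not a verified fix for this exact setup: in "dynamic" mode, INSERT OVERWRITE is expected to replace only the partitions present in the incoming data rather than the whole table, and it uses Spark's datasource write path, which may also sidestep the Hive .Trash move. Verify the behavior against your cluster and the Spark configuration docs for your version.

```scala
// Sketch (assumes an existing SparkSession with Hive support and the
// partitioned table table_1 from above; behavior depends on your Spark
// version and whether the table write goes through the datasource path).
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// With dynamic mode, only partition key1=2 should be rewritten;
// partition key1=1 is left untouched.
spark.sql("insert OVERWRITE table table_1 PARTITION(key1) values ('b','b3', 2)")
spark.sql("select * from table_1").show()
```

The setting can also be passed per-job via --conf at spark-submit time instead of being set on the running session.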



1 Comment

While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
