
I have a Hive ORC table with a definition similar to the following:

CREATE EXTERNAL TABLE `example.example_table`(
  ...
  )
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
WITH SERDEPROPERTIES ( 
  'path'='s3a://path/to/table') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3a://path/to/table'
TBLPROPERTIES (
  ...
)

I am attempting to use PySpark to append a DataFrame to this table using df.write.insertInto("example.example_table"). When running this, I get the following error:

org.apache.spark.sql.AnalysisException: Can only write data to relations with a single path.;
    at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:188)
    at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:134)
    ...

Looking at the underlying Scala code, the condition that throws this error checks whether the table relation has multiple "rootPaths". My table is clearly defined with a single location, so what else could cause this?
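
For reference, the write call looks roughly like the sketch below (the session setup, schema, and data are placeholders, not the actual job):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("orc-insert-example")
         .enableHiveSupport()   # needed so the Hive metastore table is visible
         .getOrCreate())

# Placeholder DataFrame whose schema matches the target table
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# This is the call that raises the AnalysisException above
df.write.insertInto("example.example_table")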

2 Answers


It is the path you are defining in SERDEPROPERTIES that causes the error. I just ran into this same problem myself. Hive generates a location based on the hive.metastore.warehouse.dir property, so the table ends up with that generated location plus the path you specified, which is what makes the linked check fail.

If you want to pick a specific path other than the default, then try using LOCATION.
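
In other words, drop the 'path' entry from SERDEPROPERTIES and rely on LOCATION alone, so that Spark only sees a single root path. A rough sketch of recreating the table that way from PySpark (the columns are placeholders; since the table is external, dropping the old definition does not touch the data in S3):

# Drop the old definition and recreate it without a 'path' serde property
spark.sql("DROP TABLE IF EXISTS example.example_table")
spark.sql("""
    CREATE EXTERNAL TABLE example.example_table (
      id INT,
      value STRING
    )
    STORED AS ORC
    LOCATION 's3a://path/to/table'
""")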

Try running a describe extended example.example_table query to see more detailed information on the table. One of the output rows will be a Detailed Table Information entry, which contains a bunch of useful information:

Table(
  tableName:
  dbName:
  owner:
  createTime:1548335003
  lastAccessTime:0
  retention:0
  sd:StorageDescriptor(cols:
    location:[*path_to_table*]
    inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
    compressed:false
    numBuckets:-1
    serdeInfo:SerDeInfo(
      name:null
      serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
      parameters:{
        serialization.format=1
        path=[*path_to_table*]
      }
    )
    bucketCols:[]
    sortCols:[]
    parameters:{}
    skewedInfo:SkewedInfo(skewedColNames:[]
    skewedColValues:[]
    skewedColValueLocationMaps:{})
    storedAsSubDirectories:false
  )
  partitionKeys:[]
  parameters:{transient_lastDdlTime=1548335003}
  viewOriginalText:null
  viewExpandedText:null
  tableType:MANAGED_TABLE
  rewriteEnabled:false
)
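
If you are working from PySpark rather than the Hive CLI, something like this should surface the same details (a sketch; the exact layout of the output varies between Hive and Spark versions):

# Show the full extended description, including the location and serde properties
spark.sql("DESCRIBE EXTENDED example.example_table").show(100, truncate=False)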

5 Comments

In my case, the "location" and "path" values of the "describe extended" query match and are what I was expecting. I am using an external table as well.
Mine also matched, but I think that is the problem, that Spark is seeing both of them. Since the location is automatically generated, you don't need to specify the path manually.
Hi @conrosebraugh, any luck on this one? Did you overcome this problem?
@Pavan_Obj unfortunately I did not find a workaround. I think I ended up using Hive for the class of tables that had this issue so that I could move on to other projects. I should've opened a bug with the Apache Spark team.
I am glad that you responded, Conroe :)

We had the same problem in a project when migrating from Spark 1.x and HDFS to Spark 3.x and S3. We solved the issue by setting the following Spark property to false:

spark.sql.hive.convertMetastoreParquet

You can just run

spark.sql("SET spark.sql.hive.convertMetastoreParquet=false")

Or, equivalently:

spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")

Here spark is the SparkSession object. The explanation of this property is in the Spark documentation.
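
A minimal sketch of how this looks in practice (note that convertMetastoreParquet only affects Parquet-backed tables; for an ORC table like the one in the question, the analogous property is spark.sql.hive.convertMetastoreOrc):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-write-example")
         .enableHiveSupport()
         .getOrCreate())

# Have Spark hand Parquet tables to Hive's serde instead of its built-in datasource path
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")

# For ORC-backed Hive tables the corresponding setting is:
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

# Subsequent inserts then go through the Hive serde path
df = spark.createDataFrame([(1, "a")], ["id", "value"])  # placeholder data
df.write.insertInto("example.example_table")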

