I have an XML document that looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Position>
    <Search>
        <Location>
            <Region>OH</Region>
            <Country>us</Country>
            <Longitude>-816071</Longitude>
            <Latitude>415051</Latitude>
        </Location>
    </Search>
</Position>

I read it into a DataFrame using the spark-xml package:

df = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='Position').load('1.xml')

I can see 1 column:

df.columns
['Search']

print(df.select("Search"))
DataFrame[Search: struct<Location:struct<Country:string,Latitude:bigint,Longitude:bigint,Region:string>>]

How do I access the nested columns, e.g. Location.Region?

  • Can you post a sample row of the DataFrame that you get? Commented Feb 15, 2017 at 4:22
  • This was very helpful, thank you. Commented Feb 8, 2018 at 20:42

1 Answer

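You can use dot notation to reach into the nested struct. Selecting Search.Location.* expands every field of Location into its own column: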

df.select("Search.Location.*").show()

Output:

+-------+--------+---------+------+
|Country|Latitude|Longitude|Region|
+-------+--------+---------+------+
|     us|  415051|  -816071|    OH|
+-------+--------+---------+------+