I am trying to read a pipe-delimited text file into a PySpark DataFrame with separate columns, but I cannot get this to work when I specify the format as 'text'. It works fine when I specify the format as csv.

I believe this code should be correct, since the input is a text file, but all the fields end up in a single column:

df = spark.read.format('text').options(header=True).options(sep='|').load("path\\test.txt")

df.show()

+--------------------+
|               value|
+--------------------+
|Name|Color|Size|O...|
|Rabbit|Brown|7|Wa...|
| Horse|Green|28|Dock|
|  Pig|Orange|17|Port|
|Cow|Blue|23|Wareh...|
|  Bird|Yellow|2|Dock|
|   Dog|Brown|10|Port|
|Carrot Man|Orange...|
+--------------------+

This code works correctly and splits the data into separate columns, but I have to specify the format as csv even though the file is actually a .txt file:

df = spark.read.format('csv').options(header=True).options(sep='|').load("path\\test.txt")

df.show()

+----------+------+----+---------+
|      Name| Color|Size|   Origin|
+----------+------+----+---------+
|    Rabbit| Brown|   7|Warehouse|
|     Horse| Green|  28|     Dock|
|       Pig|Orange|  17|     Port|
|       Cow|  Blue|  23|Warehouse|
|      Bird|Yellow|   2|     Dock|
|       Dog| Brown|  10|     Port|
|Carrot Man|Orange|  22|Warehouse|
+----------+------+----+---------+