I am trying to read in a text file. Among other lines, the file contains the following:

DE  01945   Ruhland Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4576 13.8664 4
DE  01945   Tettau  Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4333 13.7333 4
DE  01945   Grünewald   Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4    14  4
DE  01945   Guteborn    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4167 13.9333 4
DE  01945   Kroppen Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.3833 13.8    4
DE  01945   Schwarzbach Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.45   13.9333 4
DE  01945   Hohenbocka  Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.431  14.0098 4
DE  01945   Lindenau    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4    13.7333 4
DE  01945   Hermsdorf   Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4055 13.8937 4
DE  01968   Senftenberg Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5252 14.0016 4
DE  01968   Schipkau Hörlitz    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5299 13.9508 
DE  01968   Schipkau    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5456 13.9121 4
DE  01979   Lauchhammer Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4881 13.7662 4

My code looks like this.

import pandas as pd

data = pd.read_csv('DE.txt', sep=" ", header=None)

Currently I am getting the following error that I can't get past:

ParserError: Error tokenizing data. C error: Expected 2 fields in line 11, saw 3

I think this is caused by the two-part city name ("Schipkau Hörlitz"). How can I read the text file correctly?

  • Your delimiter is not constant: it is sometimes \t and sometimes " ". You need to parse each line yourself and build the DataFrame from the result. Commented Nov 18, 2021 at 14:35
  • Try sep="\t"? Commented Nov 18, 2021 at 14:35
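
A minimal sketch of the sep="\t" suggestion, assuming the fields in DE.txt are separated by single tab characters (the sample in the question shows spaces, which may just be how the tabs were pasted). The two sample rows and the 12-field layout are assumptions made for illustration; `io.StringIO` stands in for the real file.

```python
import io
import pandas as pd

# Hypothetical sample: two rows whose fields are separated by single tabs.
# This is an assumption about DE.txt; the sixth field is empty in both rows.
sample = (
    "DE\t01945\tRuhland\tBrandenburg\tBB\t\t00\t"
    "Landkreis Oberspreewald-Lausitz\t12066\t51.4576\t13.8664\t4\n"
    "DE\t01968\tSchipkau Hörlitz\tBrandenburg\tBB\t\t00\t"
    "Landkreis Oberspreewald-Lausitz\t12066\t51.5299\t13.9508\t\n"
)

# sep="\t" splits on tabs only, so a space inside a city name stays in one field.
# dtype=str keeps the leading zero of postal codes such as "01945".
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None, dtype=str)

print(data.shape)
print(data[2].tolist())
```

For the real file, the same call would take the path instead of the `StringIO` object, for example `pd.read_csv("DE.txt", sep="\t", header=None, dtype=str)`. If the file turns out to mix tabs and spaces as delimiters, this will not be enough and the lines need to be parsed by hand.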

1 Answer

One option is to read the file line by line, parse each line into a dictionary, and then create the DataFrame from that dictionary.

import pandas as pd

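# Collect one list of values per column, keyed by column position
columns = {}
# encoding="utf-8" is an assumption; use whatever encoding your file has
with open("DE.txt", "r", encoding="utf-8") as file:
    for line in file:
        # Assumes tab-separated fields; adapt the split to your file's layout
        for i, value in enumerate(line.rstrip("\n").split("\t")):
            columns.setdefault(i, []).append(value)
# Every line must yield the same number of fields, or this raises a ValueError
df = pd.DataFrame(data=columns)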

Alternatively, if it is easier for you, build a two-dimensional list (one inner list per line) instead of a dictionary and create the DataFrame from it in the same way.
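
The two-dimensional list variant could look roughly like the sketch below. It assumes tab-separated fields; the helper name `parse_lines` and the sample rows are made up for illustration, and `io.StringIO` stands in for the opened file.

```python
import io
import pandas as pd


def parse_lines(lines, delimiter="\t"):
    """Split every non-empty line on the delimiter and return a list of rows."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            rows.append(line.split(delimiter))
    return rows


# Hypothetical sample with tab-separated fields (an assumption about DE.txt).
sample = io.StringIO(
    "DE\t01945\tRuhland\tBrandenburg\tBB\t\t00\t"
    "Landkreis Oberspreewald-Lausitz\t12066\t51.4576\t13.8664\t4\n"
    "DE\t01968\tSchipkau Hörlitz\tBrandenburg\tBB\t\t00\t"
    "Landkreis Oberspreewald-Lausitz\t12066\t51.5299\t13.9508\t\n"
)

rows = parse_lines(sample)
# Each inner list becomes one row; values stay strings, so "01945" keeps its zero.
df = pd.DataFrame(rows)
print(df)
```

With the real file, `parse_lines` can be given the file object directly, for example inside `with open("DE.txt", encoding="utf-8") as f:` (the encoding is again an assumption). Doing the split by hand is more work than `read_csv`, but it leaves room to clean up irregular lines before the DataFrame is built.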
