2

I have a very simple question: what is the most efficient way to read different entries from a text file in Python?

Suppose I have a text file like this:

42017     360940084.621356  21.00  09/06/2015  13:08:04
42017     360941465.680841  29.00  09/06/2015  13:31:05
42017     360948446.517761  16.00  09/06/2015  15:27:26
42049     361133954.539315  31.00  11/06/2015  18:59:14
42062     361208584.222483  10.00  12/06/2015  15:43:04
42068     361256740.238150  19.00  13/06/2015  05:05:40

In C I would do:

while(fscanf(file_name, "%d %lf %f %d/%d/%d %d:%d:%d", &id, &t0, &score, &day, &month, &year, &hour, &minute, &second) != EOF){...some instruction...}

What would be the best way to do something like this in Python, so that every value is stored in a different variable? I have to work with those variables throughout the code.

Thanks in advance!

3 Answers

2

I think muddyfish's answer is good; here is another way (maybe a bit lighter):

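import time

file_name = "data.txt"  # assumed path; replace with your own file
with open(file_name) as f:
    for line in f:
        # split() with no argument splits on any run of whitespace
        identifier, t0, score, date, hour = line.split()
        t0, score = float(t0), float(score)

        # You can also get a struct_time from the date and time strings
        timer = time.strptime(date + " " + hour, "%d/%m/%Y %H:%M:%S")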
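
If each date and time component is needed as its own integer, as in the C fscanf call, one option is datetime.strptime, whose result exposes day, month, year, hour, minute and second as attributes. This is only a sketch: the helper name parse_line and the variable names are illustrative, and the sample line is taken from the question.

```python
from datetime import datetime


def parse_line(line):
    """Split one record into typed fields (helper name is illustrative)."""
    identifier, t0, score, date, clock = line.split()
    stamp = datetime.strptime(f"{date} {clock}", "%d/%m/%Y %H:%M:%S")
    return int(identifier), float(t0), float(score), stamp


identifier, t0, score, stamp = parse_line(
    "42017     360940084.621356  21.00  09/06/2015  13:08:04"
)

# Each component is available as an integer attribute, so no further
# splitting on "/" or ":" is needed.
day, month, year = stamp.day, stamp.month, stamp.year
hour, minute, second = stamp.hour, stamp.minute, stamp.second
```

Keeping the datetime object around also makes date arithmetic (differences, comparisons) straightforward later on.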

3 Comments

Note that id is the name of a built-in function rather than a reserved word, so it is best not to shadow it. If you want a similar identifier, use id_ = value instead.
Thanks FunkySayu! I ended up with something similar. Since I need each single entry (day, month, year, etc.), I was wondering whether there is a faster way, or do I have to call split("/") and split(":") again on the date and time fields?
The point is that I have to work with each individual entry (for example, doing arithmetic with t0 and the different days and months), so I need to store the data in separate variables.
0

I would look up the str.split() method.

For example you could use

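with open(file_name) as f:
    for line in f:
        # split() with no argument collapses runs of spaces and drops the newline;
        # split(" ") would produce empty strings for the repeated spaces
        data = dict(zip(("id", "t0", "score", "date", "time"), line.split()))
        instructions()  # placeholder for whatever you do with data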
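
The dictionary above holds every field as a string. If typed values closer to what fscanf returns are wanted, a regular expression modelled on the C format string is one possible approach. The following is a minimal sketch, not a drop-in fscanf replacement: the names RECORD, CONVERTERS and scan_line are made up for illustration, and the pattern assumes unsigned numbers without exponents, as in the sample data.

```python
import re

# Pattern modelled on the C format "%d %lf %f %d/%d/%d %d:%d:%d"
RECORD = re.compile(
    r"\s*(\d+)\s+([\d.]+)\s+([\d.]+)\s+(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+)"
)
# One converter per captured group, in order
CONVERTERS = (int, float, float, int, int, int, int, int, int)


def scan_line(line):
    """Return the nine typed fields of a record, or None if it does not match."""
    match = RECORD.match(line)
    if match is None:
        return None
    return tuple(conv(text) for conv, text in zip(CONVERTERS, match.groups()))


fields = scan_line("42049     361133954.539315  31.00  11/06/2015  18:59:14")
id_, t0, score, day, month, year, hour, minute, second = fields
```

Returning None for a non-matching line lets the caller decide whether to skip or report malformed records.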

0

Depending on what you want to do with the data, pandas may be something to look into:

import pandas as pd

with open(file_name) as infile:
    df = pd.read_fwf(infile, header=None, parse_dates=[[3, 4]], 
        date_parser=lambda x: pd.to_datetime(x, format='%d/%m/%Y %H:%M:%S'))

The double list [[3, 4]], together with the date_parser argument, will read columns 3 and 4 (0-indexed) as a single datetime column. You can then access individual parts of that column with

>>> df['3_4'].dt.hour
0    13
1    13
2    15
3    18
4    15
5     5
dtype: int64

(If you don't like the '3_4' key, use the parse_dates argument above as follows:

parse_dates={'timestamp': [3, 4]}

)

read_fwf reads fixed-width columns, a layout your data seems to follow. Alternatively, there are functions such as read_csv and read_table, among others.
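
As a rough illustration of the read_csv route, the sketch below reads whitespace-separated columns as plain values and builds the timestamp afterwards with pd.to_datetime. This avoids the date_parser argument, which, as far as I know, is deprecated in recent pandas releases. The column names and the in-memory sample (two lines from the question) are assumptions made for the example.

```python
import io

import pandas as pd

# Two sample records from the question, held in memory instead of a file
sample = io.StringIO(
    "42017     360940084.621356  21.00  09/06/2015  13:08:04\n"
    "42049     361133954.539315  31.00  11/06/2015  18:59:14\n"
)

# sep=r"\s+" treats any run of whitespace as one delimiter
df = pd.read_csv(
    sample,
    sep=r"\s+",
    header=None,
    names=["id", "t0", "score", "date", "time"],  # illustrative column names
)

# Combine the date and time strings into one datetime column
df["timestamp"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%d/%m/%Y %H:%M:%S"
)

hours = df["timestamp"].dt.hour.tolist()
```

The same .dt accessor gives day, month, year, minute and second for the whole column at once.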

(This answer is pretty much a duplicate of this SO answer, but since this question here is more general, I've put this here as another answer, not as a comment.)
