I have a big (8 GB) gzipped CSV file that I would like to read into a pandas DataFrame. Since the file is large, I read it in chunks, which works fine, but I'm interested in knowing whether there is a way to read only the last x lines without decompressing the whole file.
1 Answer
I can think of various ways to read the last lines of a DataFrame. As I am not sure I understood what you mean by "without decompressing the whole file", I wonder if any of the options below is of interest to you.
Option 1
When reading a .csv file with pandas.read_csv(), rows can be skipped so they are not included in the import.
For that, pass skiprows=[x], where x is the row number to exclude (note that row numbering is 0-based). To skip a block of rows, pass a list or range of row numbers instead.
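As a sketch of how this could give you only the last x rows: count the lines in a first streaming pass, then skip everything except the header and the tail. Note this still decompresses the file once while counting (the file name and sample data below are made up for the demonstration):

```python
import gzip
import pandas as pd

# Build a small gzipped CSV to stand in for the real 8 GB file.
path = "example.csv.gz"
with gzip.open(path, "wt") as f:
    f.write("id,value\n")
    for i in range(100):
        f.write(f"{i},{i * 2}\n")

# First pass: count lines by streaming the decompressed text.
# This never holds the whole file in memory, but it does decompress it once.
with gzip.open(path, "rt") as f:
    total = sum(1 for _ in f)

x = 5  # number of trailing rows wanted
# Keep the header (row 0) and skip every data row except the last x.
tail = pd.read_csv(path, skiprows=range(1, total - x))
print(tail)
```

The key point is that skiprows accepts any list-like of row numbers, so a range covering everything between the header and the tail does the job.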
Option 2
Another option might be converting the file to HDF5 and selecting a start and stop. Here's an example:
import pandas as pd
import numpy as np

df = pd.DataFrame({'Date': np.random.randn(50000)}, index=pd.date_range('20200528', periods=50000, freq='s'))
store = pd.HDFStore('example.h5', mode='w')
store.append('df', df)
rowsnumber = store.get_storer('df').nrows
store.select('df', start=rowsnumber-5, stop=rowsnumber)  # change the start to the number of rows to display, counted from the end
store.close()
Option 3
Assuming the DataFrame is already associated with the variable df, to read the last 5 rows use df.iloc:
rows = df.iloc[-5:]
Or df.tail:
rows = df.tail(5)