
I have a big (8 GB) gzip-compressed CSV file that I would like to read into a pandas DataFrame. Because the file is so large, I read it in chunks, which works fine, but I would like to know whether there is a way to read only the last x lines without decompressing the whole file.

  • You might be interested in reading about HDF5 as a file format for your data. (Yes, of course it's supported by pandas.) Commented Mar 1, 2015 at 20:20
  • I can't see how it could avoid decompressing the whole file. If you know the number of rows, you could see what happens if you pass skiprows=some_num as a parameter to read_csv. Commented Mar 1, 2015 at 20:34

1 Answer


I can think of several ways to read the last rows of a DataFrame. As I am not sure I understood correctly what you mean by "without decompressing the whole file", I wonder whether any of the options below is of interest to you.
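A gzip stream has no random access, so reaching the last lines always means decompressing everything before them. What you can avoid is holding the decompressed data in memory. The sketch below streams the file line by line with the standard library and keeps only the header plus the last n lines in a bounded deque. The helper name read_last_lines is hypothetical, and it assumes that no CSV field contains an embedded newline (quoted multi-line fields would be split incorrectly).

```python
import collections
import gzip
import io

import pandas as pd


def read_last_lines(path, n):
    """Return the last n data rows of a gzipped CSV as a DataFrame.

    The whole file is still decompressed from start to end, but only the
    header line and the last n lines are kept in memory.
    Assumes one record per line (no embedded newlines in quoted fields).
    """
    with gzip.open(path, "rt") as f:
        header = f.readline()
        # A deque with maxlen=n discards older lines as new ones arrive.
        tail = collections.deque(f, maxlen=n)
    return pd.read_csv(io.StringIO(header + "".join(tail)))
```

For example, read_last_lines("data.csv.gz", 1000) (file name hypothetical) returns the final 1000 rows with the original column names.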


Option 1

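When reading a .csv file with pandas.read_csv(), rows can be skipped so they are not included in the import.

To do so, pass skiprows=[x], where x is the row number to exclude (row numbering starts at 0, so with a header line, row 0 is the header). skiprows also accepts a list of row numbers, an integer (the number of lines to skip at the start of the file) or a callable, so if you know the total number of rows you can skip every data row except the last x. Note that the file is still decompressed and parsed from the beginning; the skipped rows are simply not kept.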
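If the row count is not known in advance, one way to apply this is two passes: first count the lines by streaming through the decompressed file, then call read_csv with a skiprows callable that drops every data row before the last n. This is a sketch under the assumption of one record per line and a single header line; the helper name read_last_rows_skiprows is hypothetical. A callable is used instead of a list or range so that pandas does not have to build a huge set of row numbers for a very large file.

```python
import gzip

import pandas as pd


def read_last_rows_skiprows(path, n):
    """Read the last n data rows of a gzipped CSV using read_csv(skiprows=...).

    Pass 1 counts data lines; pass 2 skips every data row except the last n.
    Row 0 is the header, so data rows are numbered 1..total.
    """
    with gzip.open(path, "rt") as f:
        total = sum(1 for _ in f) - 1  # subtract the header line
    first_kept = max(total - n, 0) + 1  # row number of the first row to keep
    # Skip rows 1 .. first_kept - 1; row 0 (the header) is always kept.
    return pd.read_csv(
        path,
        compression="gzip",
        skiprows=lambda i: 0 < i < first_kept,
    )
```

Both passes decompress the whole file, so this trades extra CPU time for low memory use.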


Option 2

Another option is to convert the file to HDF5 once and then select rows by start and stop position. Here's an example (HDFStore requires the PyTables package):

import pandas as pd
import numpy as np

df = pd.DataFrame({'Value': np.random.randn(50000)}, index=pd.date_range('20200528', periods=50000, freq='s'))

# append() writes in "table" format, which supports selecting rows by position
with pd.HDFStore('example.h5', mode='w') as store:
    store.append('df', df)
    rowsnumber = store.get_storer('df').nrows
    # Change 5 to the number of rows you want, counted from the end
    last_rows = store.select('df', start=rowsnumber - 5, stop=rowsnumber)

print(last_rows)

Option 3

Assuming the data is already loaded into a DataFrame named df (which means the whole file has been read into memory), you can get the last 5 rows with df.iloc:

rows = df.iloc[-5:]

or with df.tail:

rows = df.tail(5)
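Since you already read the file in chunks, the same idea can be applied without ever holding the full DataFrame: keep a running tail, concatenating each new chunk onto the rows carried over and trimming back to n. This is a minimal sketch; the helper name tail_in_chunks and the default chunksize are hypothetical choices, and compression is inferred by pandas from the .gz extension.

```python
import pandas as pd


def tail_in_chunks(path, n, chunksize=100_000):
    """Return the last n rows of a (possibly gzipped) CSV, reading it in chunks.

    At any time, memory holds roughly one chunk plus the n carried-over rows.
    """
    tail = None
    with pd.read_csv(path, chunksize=chunksize) as reader:
        for chunk in reader:
            # Combine the previous tail with the new chunk, then trim to n rows.
            tail = chunk if tail is None else pd.concat([tail, chunk])
            tail = tail.tail(n)
    return tail
```

Unlike a raw line-based approach, this lets pandas do the parsing, so quoted fields containing newlines are handled correctly. The file is still fully decompressed.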