
I have CSV files which I read into pandas with:

#!/usr/bin/env python

import pandas as pd
import sys

filename = sys.argv[1]
df = pd.read_csv(filename)

Unfortunately, the last line of these files is often corrupt (has the wrong number of commas). Currently I open each file in a text editor and remove the last line.

Is it possible to remove the last line in the same python/pandas script that loads the CSV, so I can avoid this extra manual step?

  • You deleted a question about extracting numbers; anyway, I was going to suggest using str.extract: for col in df.columns[2:]: df[col] = df[col].str.extract(r'(\d+)').astype(int) (a small sketch follows these comments) Commented Nov 13, 2015 at 9:55
  • @EdChum Does your code leave the decimal points? Commented Nov 13, 2015 at 10:01
  • @EdChum I undeleted the previous question. Commented Nov 13, 2015 at 10:06
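
For what it's worth, here is a minimal sketch of that str.extract suggestion, with made-up column names; note that the (\d+) pattern keeps only the digits before any decimal point, so fractional parts are dropped:

import pandas as pd

# Made-up frame: columns after the second hold numbers embedded in strings.
df = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b'], 'score': ['12.5 pts', '7 pts']})

for col in df.columns[2:]:
    # expand=False keeps the result a Series; (\d+) captures only the
    # leading digits, so '12.5 pts' becomes 12.
    df[col] = df[col].str.extract(r'(\d+)', expand=False).astype(int)

print(df)  # score is now the integers 12 and 7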

3 Answers


Pass on_bad_lines='skip' and it will skip the corrupt line automatically:

df = pd.read_csv(filename, on_bad_lines='skip')
  • The advantage of on_bad_lines='skip' is that it skips, rather than raises an error on, any erroneous lines. But if the last line is always the bad one, then skipfooter=1 is better (a short illustration follows these notes).

  • Thanks to @DexterMorgan for pointing out that the skipfooter option forces pandas to use the Python engine, which is slower than the C engine for parsing a CSV.
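
To make the difference concrete, here is a minimal illustration using a made-up in-memory CSV whose last line has one comma too many; on_bad_lines='skip' drops any malformed line, while skipfooter=1 drops exactly the last line:

import io
import pandas as pd

# Made-up CSV: the last line has an extra field.
data = "a,b\n1,2\n3,4\n5,6,7\n"

# on_bad_lines='skip' ignores any malformed line, wherever it appears.
df_skip = pd.read_csv(io.StringIO(data), on_bad_lines='skip')

# skipfooter=1 drops exactly the last line and requires the Python engine.
df_footer = pd.read_csv(io.StringIO(data), skipfooter=1, engine='python')

print(df_skip)    # both contain the rows (1, 2) and (3, 4)
print(df_footer)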


And here is the old way (don't use it; it was removed in pandas 2.0):

df = pd.read_csv(filename, error_bad_lines=False)

Deprecated since version 1.3.0: The on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.


Comments

Regarding the skipfooter option, it might be good to know that it doesn't work with the dtype option: ValueError: Falling back to the 'python' engine because the 'c' engine does not support skipfooter, but this causes 'dtype' to be ignored as it is not supported by the 'python' engine. (Note the 'converters' option provides similar functionality.)
@DexterMorgan sure will add
There's an option 'skiprows', which is supported by the C engine. If you know the number of lines of your CSV you could add it as follows: pd.read_csv(filename, skiprows=[999]) (in my case there are 1000 lines). Note that you have to pass the rows as a list if you want to specify them by line number.
@Chaoste but the bad rows are at the end though, wouldn't you want nrows instead?
@EdChum I'm just looking into the documentation because I need it right now and I didn't see this option until now. Thank you! So in my case, instead of skiprows=[1000] I had to write nrows=999. Another solution could be removing the last line via the command line, which is very fast: head -n -1 dataframe.csv > temp.csv && mv temp.csv dataframe.csv (a sketch of the converters and nrows workarounds follows these comments)
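
For reference, here is a minimal sketch of the two workarounds mentioned in these comments, using a hypothetical filename and column name: converters with skipfooter (since the Python engine ignores dtype), and nrows derived from a line count so the faster C engine can be kept:

import pandas as pd

filename = 'dataframe.csv'  # hypothetical path

# Option 1: keep skipfooter but use converters instead of dtype,
# since the Python engine does support converters.
df1 = pd.read_csv(
    filename,
    skipfooter=1,
    engine='python',
    converters={'value': float},  # hypothetical column name
)

# Option 2: stay on the C engine by counting lines first and telling
# read_csv how many data rows to keep (all but the corrupt last one).
with open(filename) as fh:
    total_lines = sum(1 for _ in fh)

# total_lines includes the header row, so the good data rows number
# total_lines - 2 (minus the header and the corrupt last line).
df2 = pd.read_csv(filename, nrows=total_lines - 2, dtype={'value': float})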

You can leave out the last n lines when reading in a csv by using the skipfooter argument:

df = pd.read_csv(filename, skipfooter=3, engine='python')

In this example the last 3 lines are omitted.



Read http://pandas.pydata.org/pandas-docs/version/0.16.2/generated/pandas.read_csv.html. The 'skipfooter' argument can be used to specify the number of lines at the end of the .csv file that you don't want to read. It may help you.

