pandas.read_csv: how to skip comment lines

Question

I think I misunderstand the intention of read_csv. If I have a file 'j' like

# notes
a,b,c
# more notes
1,2,3

How can I pandas.read_csv this file, skipping any '#' commented lines? I see in the help 'comment' of lines is not supported but it indicates an empty line should be returned. I see an error

df = pandas.read_csv('j', comment='#')

CParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 3

I'm currently on

In [15]: pandas.__version__
Out[15]: '0.12.0rc1'

On version'0.12.0-199-g4c8ad82':

In [43]: df = pandas.read_csv('j', comment='#', header=None)

CParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 3

What is 'j'? I'm unable to reproduce the error when replacing 'j' with csv file path. — Bryan
– Bryan, Commented Aug 21, 2013 at 20:17
Sorry, b'#' was a typo. 'j' is an example file. It is a bug as Andy Hayden mentions below. — safetyduck
– safetyduck, Commented Aug 21, 2013 at 20:29
@mathtick weirdly I get slightly different error with the above code, but I've posted an issue with the CParserError you describe on github, I think it's a bug. — Andy Hayden
– Andy Hayden, Commented Aug 21, 2013 at 20:34
@AndyHayden ... yes, I grabbed the error from a loading a different file than shown in the example when I was in a rush. Just tried to reproduce it at home and discovered that the behavoiur appears to have already changed slightly the newer versions (tested on '0.12.0-199-g4c8ad82'). I've updated the example. — safetyduck
– safetyduck, Commented Aug 22, 2013 at 0:13

hlin117 · Accepted Answer · 2015-08-11 00:34:23Z

85

So I believe in the latest releases of pandas (version 0.16.0), you could throw in the comment='#' parameter into pd.read_csv and this should skip commented out lines.

These github issues shows that you can do this:

See the documentation on read_csv: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

edited Aug 11, 2015 at 0:34

answered Aug 11, 2015 at 0:19

hlin117

22.5k32 gold badges77 silver badges94 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Atreyagaurav Over a year ago

What happens if I have more than 1 comment character? I found a data file which is a dump from java and it has # for comment lines and @ for block names and I want to skip both. In gnuplot I could set #@ as comment chars and it skipped both but in pandas it gives a error saying only single character is allowed.

Andy Hayden · Accepted Answer · 2013-08-21 21:10:06Z

One workaround is to specify skiprows to ignore the first few entries:

In [11]: s = '# notes\na,b,c\n# more notes\n1,2,3'

In [12]: pd.read_csv(StringIO(s), sep=',', comment='#', skiprows=1)
Out[12]: 
    a   b   c
0 NaN NaN NaN
1   1   2   3

Otherwise read_csv gets a little confused:

In [13]: pd.read_csv(StringIO(s), sep=',', comment='#')
Out[13]: 
        Unnamed: 0
a   b            c
NaN NaN        NaN
1   2            3

This seems to be the case in 0.12.0, I've filed a bug report.

As Viktor points out you can use dropna to remove the NaN after the fact... (there is a recent open issue to have commented lines be ignored completely):

In [14]: pd.read_csv(StringIO(s2), comment='#', sep=',').dropna(how='all')
Out[14]: 
   a  b  c
1  1  2  3

Note: the default index will "give away" the fact there was missing data.

And also since you don't want the comment lines just call .dropna(how='all').reset_index(drop=True) after.

Max von Hippel · Accepted Answer · 2017-07-15 00:14:47Z

5

I am on Pandas version 0.13.1 and this comments-in-csv problem still bothers me.

Here is my present workaround:

def read_csv(filename, comment='#', sep=','):
    lines = "".join([line for line in open(filename) 
                     if not line.startswith(comment)])
    return pd.read_csv(StringIO(lines), sep=sep)

Otherwise with pd.read_csv(filename, comment='#') I get

pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 16, saw 3.

edited Jul 15, 2017 at 0:14

Max von Hippel

2,9683 gold badges31 silver badges47 bronze badges

answered May 12, 2014 at 9:09

Finn Årup Nielsen

6,8841 gold badge35 silver badges45 bronze badges

Collectives™ on Stack Overflow

pandas.read_csv: how to skip comment lines

3 Answers 3

1 Comment

1 Comment

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

1 Comment

Comments

Linked

Related