Having issues reading a .csv file python-pandas

Question

I'm trying to read this .txt file in pandas and this is my result. I thought (naively) that I was getting the hang of this stuff last night, but apparently I was wrong. If I simply run

rebull = pd.read_table('rebull.txt',sep=' ')

it works, but the result is a disordered array of NaNs, which I assume comes from the spacing in the original .txt file.

waitingkuo · Accepted Answer · 2013-05-17 02:55:47Z

Try skipinitialspace:

In [26]: pd.read_table('test.txt', sep=' ', skipinitialspace=True)
Out[26]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 386 entries, 0 to 385 
Data columns (total 7 columns):
Mon          386  non-null values
id           386  non-null values
NA           386  non-null values
alpha_K24    386  non-null values
class        386  non-null values
alpha_K8     386  non-null values
class.1      0  non-null values
dtypes: float64(3), object(4)

EDIT

Sorry for misunderstanding your problem. I think you can read the table as @DSM mentioned and also set the column names yourself:

In [55]: pd.read_table('test.txt', sep=r"\s\s+", header=None, skiprows=[0], names=['Mon id', 'Na', 'alpha_K24', 'class', 'alpha_8', 'class'])
Out[55]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 386 entries, 0 to 385
Data columns (total 6 columns):
Mon id       386  non-null values
Na           386  non-null values
alpha_K24    386  non-null values
class        386  non-null values
alpha_8      386  non-null values
class        386  non-null values
dtypes: float64(2), object(4)

Note that you may want to give the second class column a different name. Otherwise df['class'] will return both columns.
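To illustrate the duplicate-name point, here is a minimal sketch. The frame is built by hand with made-up values, and the replacement names class_K24 and class_8 are only suggestions, not anything from the original file. I also believe newer pandas releases may refuse duplicate entries in names= when reading, so the frame is constructed directly instead.

```python
import pandas as pd

# Hypothetical one-row frame in which two columns share the label "class".
df = pd.DataFrame(
    [[0.1, "A", 0.2, "B"]],
    columns=["alpha_K24", "class", "alpha_8", "class"],
)

# Selecting the shared label returns a DataFrame with both columns,
# not a single Series.
both = df["class"]
print(type(both).__name__, both.shape)

# Renaming (names here are illustrative) makes each column
# selectable on its own.
df.columns = ["alpha_K24", "class_K24", "alpha_8", "class_8"]
second_class = df["class_8"].iloc[0]
print(second_class)
```

After the rename, df['class_K24'] and df['class_8'] each give back an ordinary Series.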

+1 for actually doing it, but I think I'd use r"\s+" here. The \s\s+ was a hypothetical which would have worked to keep "Mon id" together if everything (columns included) were separated by two or more spaces, which unfortunately isn't the case. Here it works okay on the data itself, but it doesn't offer any advantage because we don't need to let a single \s pass through unsplit.

Matt · Accepted Answer · 2013-05-16 20:04:42Z


Figured out my problem...always confirm your indices are joined by hyphens if necessary. In particular my 'Mon id' in the first column was my problem...should be 'Mon-id'.


4 Comments

DSM Over a year ago

You didn't need column names not to have spaces, but you did need to join them by something you could separate. The space between "Mon" and "id" was hard to distinguish from the spaces between the columns. Sometimes I've found it easier to do something like pd.read_table("gistfile1.txt", sep=r"\s+", skiprows=1, header=None) and fix the columns after the fact.
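A rough sketch of that "skip the header, fix the columns afterwards" approach follows. The inline text is a made-up stand-in for the real file, and the column names are assumptions based on the header described in the question. pd.read_csv is used here, though pd.read_table should take the same arguments.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: the header contains "Mon id",
# whose inner space looks just like a column separator.
text = (
    "Mon id Na alpha_K24 class\n"
    "1   0.5   1.2   A\n"
    "2   0.7   1.4   B\n"
)

# Skip the problematic header row and split the data on any run of whitespace.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", skiprows=1, header=None)

# Assign the column names by hand once the data has been parsed.
df.columns = ["Mon id", "Na", "alpha_K24", "class"]
print(df)
```

This leaves the file itself untouched, which is handy if you can't or don't want to edit it.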

Matt Over a year ago

@DSM - Cool. That's good to know. I'm really new to all of this so it's a step-by-step process. Thanks for the input!

DSM Over a year ago

It would even have worked if the other column names were separated by multiple spaces with sep=r"\s\s+" (i.e. only separate on two or more). You just got unlucky. :^)
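The difference between the two patterns can be seen with plain re.split on a single line. This is only an approximation of what the pandas parser does with a regex separator, and the header string is hypothetical (it assumes every real column gap is at least two spaces wide, which was not true of the actual file).

```python
import re

# Hypothetical header where columns are separated by two spaces,
# but "Mon id" contains a single space.
header = "Mon id  Na  alpha_K24  class"

# One or more whitespace characters: "Mon id" is split into two tokens.
split_any = re.split(r"\s+", header)
print(split_any)

# Two or more whitespace characters: the single space inside "Mon id" survives.
split_wide = re.split(r"\s\s+", header)
print(split_wide)
```

With r"\s+" the header yields five tokens for four columns, which is the kind of mismatch that produces misaligned NaN columns.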

Andy Hayden Over a year ago

@DSM do you want to make that an answer? (Changing the csv is probably not desired.)
