I'm trying to read this .txt file in pandas and this is my result. I thought (naively) that I was getting the hang of this stuff last night, but apparently I was wrong. If I simply run

rebull = pd.read_table('rebull.txt',sep=' ')

it runs, but the result is a disordered array of NaNs, which I assume come from the spacing between columns in the original .txt file.

2 Answers

Try skipinitialspace:

In [26]: pd.read_table('test.txt', sep=' ', skipinitialspace=True)
Out[26]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 386 entries, 0 to 385 
Data columns (total 7 columns):
Mon          386  non-null values
id           386  non-null values
NA           386  non-null values
alpha_K24    386  non-null values
class        386  non-null values
alpha_K8     386  non-null values
class.1      0  non-null values
dtypes: float64(3), object(4)

EDIT

Sorry for misunderstanding your problem. I think you can read the table as @DSM mentioned and also set the column names yourself:

In [55]: pd.read_table('test.txt', sep=r"\s\s+", header=None, skiprows=[0], names=['Mon id', 'Na', 'alpha_K24', 'class', 'alpha_8', 'class'])
Out[55]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 386 entries, 0 to 385
Data columns (total 6 columns):
Mon id       386  non-null values
Na           386  non-null values
alpha_K24    386  non-null values
class        386  non-null values
alpha_8      386  non-null values
class        386  non-null values
dtypes: float64(2), object(4)

Note that you may want to give the second class column a different name. Otherwise df['class'] will return two columns.
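As a rough sketch of that suggestion combined with the r"\s+" separator from the comments: the snippet below reads a small made-up sample (not the asker's real file) and assigns unique column names. The names Mon_id, class_K24 and class_K8 and the sample rows are assumptions for illustration only.

```python
import io

import pandas as pd

# Made-up sample in roughly the same shape as the question's file.
# The values are invented for illustration; they are not the real data.
sample = io.StringIO(
    "Mon id NA alpha_K24 class alpha_K8 class\n"
    "Mon-000101   A12    0.27   II    0.36   II\n"
    "Mon-000120   B07   -1.52   III  -1.71   III\n"
)

# Hypothetical unique names, so each class column can be selected on its own.
columns = ["Mon_id", "NA", "alpha_K24", "class_K24", "alpha_K8", "class_K8"]

# Skip the original header row (its "Mon id" contains a space) and
# split every data row on runs of whitespace.
df = pd.read_table(sample, sep=r"\s+", header=None, skiprows=1, names=columns)

print(df)
print(df["class_K24"].tolist())
```

With distinct names, df['class_K24'] and df['class_K8'] each return a single column.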

3 Comments

You'd still need to patch the columns.
Oops, I forgot there's a space in the first column name.
+1 for actually doing it, but I think I'd use r"\s+" here. The \s\s+ was a hypothetical which would have worked to keep "Mon id" together if everything (column names included) were separated by two or more spaces, which unfortunately isn't the case. Here it works okay on the data itself, but it doesn't offer any advantage, because the data has no single spaces inside a field that we need to preserve.
0

Figured out my problem: always make sure multi-word column names are joined by hyphens if necessary. In particular, the 'Mon id' header in my first column was the problem; it should be 'Mon-id'.

4 Comments

You didn't need column names not to have spaces, but you did need to join them by something you could separate. The space between "Mon" and "id" was hard to distinguish from the spaces between the columns. Sometimes I've found it easier to do something like pd.read_table("gistfile1.txt", sep=r"\s+", skiprows=1, header=None) and fix the columns after the fact.
@DSM - Cool. That's good to know. I'm really new to all of this so it's a step-by-step process. Thanks for the input!
It would even have worked if the other column names were separated by multiple spaces with sep=r"\s\s+" (i.e. only separate on two or more). You just got unlucky. :^)
@DSM do you want to make that an answer? (Changing the csv is probably not desired.)
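The separator behaviour discussed in these comments can be checked without pandas. This is a minimal sketch using Python's re module on two hypothetical header lines (one single-spaced, one double-spaced); neither is copied from the real file.

```python
import re

# Hypothetical header lines for illustration only.
single_spaced = "Mon id NA alpha_K24 class alpha_K8 class"
double_spaced = "Mon id  NA  alpha_K24  class  alpha_K8  class"


def split_fields(line, pattern):
    """Split a line into fields using a regex separator."""
    return re.split(pattern, line.strip())


# r"\s+" splits on one or more whitespace characters, so "Mon id" breaks in two
# and the header ends up with one more field than the data rows have.
one_or_more = split_fields(single_spaced, r"\s+")

# r"\s\s+" needs at least two whitespace characters, so a single-spaced
# header is not split at all.
two_or_more_single = split_fields(single_spaced, r"\s\s+")

# If the columns were separated by two or more spaces, "Mon id" would survive.
two_or_more_double = split_fields(double_spaced, r"\s\s+")

print(one_or_more)
print(two_or_more_single)
print(two_or_more_double)
```

This is why skipping the header row and supplying the names separately is the simpler route when the header is single-spaced.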
