
I am trying to access a pandas DataFrame by column name after indexing it on a specific column, but it returns the wrong column's values.

import pandas as pd
rs = pd.read_csv('rs.txt', header="infer", sep="\t", names=['id', 'exp', 'fov', 'cycle', 'color', 'values'], index_col=2)

rs.cycle.head()

Here I index the DataFrame on 'fov' and then try to access the 'cycle' column, but it gives me the 'color' column instead. What am I missing?


EDIT: The first few lines of the input file are:

6 3 1 G 0.96593 
6 3 1 O 0.88007 
6 3 1 R 0.94305 
6 3 2 B 0.90554 
6 3 2 G 0.93146
  • Can you please post the first few lines of rs.txt? Commented Mar 6, 2013 at 18:22
  • @mbatchkarov, here are a few lines from rs.txt: ` 6 3 1 G 0.96593 6 3 1 O 0.88007 6 3 1 R 0.94305 6 3 2 B 0.90554 6 3 2 G 0.93146` Commented Mar 6, 2013 at 18:43
  • I added the sample data to your original question. Can you check if I've put the line breaks at the right places? Commented Mar 6, 2013 at 19:29
  • @mbatchkarov, yes, thanks, this is the correct format; it wouldn't let me post in the above format after 5 edits! Commented Mar 6, 2013 at 19:38

1 Answer


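I think the problem arises because your data file has 5 columns while your names list has 6 elements. The names are applied from the left, so each one labels the data meant for the following name, which is why 'cycle' holds the color letters. To verify, check the first few values in the id column: if I am right, they will all be 6, and the first few items in the exp column will all be 3.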
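To see the mismatch described above, here is a small reproduction. It is a sketch under assumptions: the rows are the sample lines from the question inlined with io.StringIO instead of reading rs.txt, and each row is assumed to hold exactly 5 tab-separated fields. With 6 names for 5 fields, pandas pads the missing trailing field ('values') with NaN, so every name shifts onto the data meant for the next one.

```python
import io

import pandas as pd

# Assumed stand-in for rs.txt: 5 tab-separated fields per row.
data = (
    "6\t3\t1\tG\t0.96593\n"
    "6\t3\t1\tO\t0.88007\n"
    "6\t3\t1\tR\t0.94305\n"
    "6\t3\t2\tB\t0.90554\n"
    "6\t3\t2\tG\t0.93146\n"
)

# Six names for five fields: names are assigned left to right and the
# last one ('values') gets no data, so it is filled with NaN.
rs = pd.read_csv(io.StringIO(data), sep="\t",
                 names=['id', 'exp', 'fov', 'cycle', 'color', 'values'],
                 index_col=2)

print(rs['id'].tolist())     # all 6s: the data meant for 'exp'
print(rs['exp'].tolist())    # all 3s: the data meant for 'fov'
print(rs['cycle'].tolist())  # the letters meant for 'color'
```

If the first few id values are all 6 and exp values are all 3, the file has one field fewer than the names list.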

To fix this, read your input file like so:

rs = pd.read_csv('rs.txt', header="infer", sep="\t", names=['exp', 'fov', 'cycle', 'color', 'values'], index_col='fov')

You don't need a separate id column for row identifiers: pandas provides the row index itself.
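A minimal sketch of reading with one name per field, again assuming the 5-field sample rows inlined via io.StringIO. Note that with this five-name list a positional index_col=2 would select 'cycle', not 'fov', so the sketch passes the column name 'fov' to index_col instead of counting positions.

```python
import io

import pandas as pd

# Assumed stand-in for rs.txt: the 5-field sample rows from the question.
data = (
    "6\t3\t1\tG\t0.96593\n"
    "6\t3\t1\tO\t0.88007\n"
    "6\t3\t1\tR\t0.94305\n"
    "6\t3\t2\tB\t0.90554\n"
    "6\t3\t2\tG\t0.93146\n"
)

# Five names for five fields, so every name lines up with its data.
# index_col accepts a column name, which avoids off-by-one positions.
rs = pd.read_csv(io.StringIO(data), sep="\t",
                 names=['exp', 'fov', 'cycle', 'color', 'values'],
                 index_col='fov')

print(rs['cycle'].head())  # the cycle numbers, indexed by fov
```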


Comments

There are in fact 6 columns in the file: the first column is empty and corresponds to 'id' for downstream purposes. I forgot to mention that in my earlier comment.
I still think pandas is not handling your empty column correctly, and you end up with either 5 columns or 6 columns shifted one to the left. Please post the output of print rs.columns and print rs.
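Besides print(rs.columns) and print(rs) (the Python 3 form of the commands above), a quick way to spot stray delimiters is to inspect the raw lines with repr() and count the tab-separated fields. This is a hypothetical check: the two lines below are made-up stand-ins built to match the layout described in these comments (an empty first field and a trailing tab); for the real file you would iterate over open('rs.txt') instead.

```python
import io

# Hypothetical raw lines: empty 'id' field first, trailing tab at the end.
raw = io.StringIO(
    "\t6\t3\t1\tG\t0.96593\t\n"
    "\t6\t3\t1\tO\t0.88007\t\n"
)

field_counts = []
for line in raw:
    # Split on tabs exactly as sep="\t" would; empty strings mark empty fields.
    fields = line.rstrip("\n").split("\t")
    field_counts.append(len(fields))
    # repr() makes invisible tabs visible as \t.
    print(len(fields), repr(line))
```

Here each line yields 7 fields, one more than the 6 names, which points to an extra delimiter somewhere in the line.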
I identified the problem: an invisible tab at the end of the file was causing it. I added a placeholder in the names for the last tab, and it works as expected now. Thanks for your suggestions :).
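A sketch of the placeholder approach described in this comment, under assumptions: the rows are inlined via io.StringIO and shaped like the described file (empty first field, five values, trailing tab), and 'placeholder' is a hypothetical name for the empty field after the last tab. With 7 names for 7 fields, index_col=2 again points at 'fov' and 'cycle' holds the cycle numbers.

```python
import io

import pandas as pd

# Assumed rows: empty 'id' field, five values, then a trailing tab.
data = (
    "\t6\t3\t1\tG\t0.96593\t\n"
    "\t6\t3\t1\tO\t0.88007\t\n"
    "\t6\t3\t1\tR\t0.94305\t\n"
    "\t6\t3\t2\tB\t0.90554\t\n"
    "\t6\t3\t2\tG\t0.93146\t\n"
)

# 'placeholder' (hypothetical name) absorbs the empty field after the last tab.
names = ['id', 'exp', 'fov', 'cycle', 'color', 'values', 'placeholder']
rs = pd.read_csv(io.StringIO(data), sep="\t", names=names, index_col=2)

# The placeholder column is all NaN, so drop it after parsing.
rs = rs.drop(columns='placeholder')

print(rs['cycle'].head())  # cycle numbers, indexed by fov
```

The 'id' column stays all NaN here, matching the empty first column kept for downstream purposes.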
