I am trying to create a Pandas Dataframe from a string using the following code -

import pandas as pd

input_string="""A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""

data = input_string
df = pd.DataFrame([x.split(';') for x in data.split('\n')])
print(df)

I am getting the following result -

    0     1     2
 0  A     B     C
 1  0    34    88
 2  2    45   200
 3  3    47    65
 4  4    32   140
 5     None  None

But I need something like the following -

 A     B     C
 0    34    88
 2    45   200
 3    47    65
 4    32   140

I added "index = False" while creating the dataframe like -

df = pd.DataFrame([x.split(';') for x in data.split('\n')],index = False)

But, it gives me an error -

TypeError: Index(...) must be called with a collection of some kind, False was passed

How is this achievable?

  • So you wanted to read the column names from a header row, and also to set 'A' as the index (and not to read an empty ghost row 5). You can do all this with pd.read_csv() by wrapping your input in an io.StringIO. Commented Nov 8, 2023 at 3:45

2 Answers

1

Use read_csv with StringIO, and the index_col parameter to set the first column as the index:

input_string="""A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""

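# pd.compat.StringIO was removed from pandas; io.StringIO is the standard-library equivalent
from io import StringIO

df = pd.read_csv(StringIO(input_string), sep=';', index_col=0)
print(df)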
    B    C
A         
0  34   88
2  45  200
3  47   65
4  32  140

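Your solution can be fixed by calling split with its default parameter (it splits on arbitrary whitespace, so the trailing empty line is dropped), passing every list except the first to DataFrame as the values and the first list as the columns parameter, and, if the first column should become the index, adding DataFrame.set_index: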

L = [x.split(';') for x in input_string.split()]
df = pd.DataFrame(L[1:], columns=L[0]).set_index('A')
print (df)
    B    C
A         
0  34   88
2  45  200
3  47   65
4  32  140
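
The extra row 5 in the question comes from the trailing newline: splitting on '\n' leaves an empty string after the last line, while split() with no arguments discards it. A minimal check using only built-in string methods (the variable names by_newline and by_whitespace are illustrative, not from the question):

```python
input_string = """A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""

# Splitting on '\n' keeps the empty string that follows the final newline
by_newline = input_string.split('\n')

# Splitting with no arguments breaks on any run of whitespace and drops empty strings
by_whitespace = input_string.split()

print(by_newline[-1] == '')                  # True
print(len(by_newline), len(by_whitespace))   # 6 5
```

Note that split() would also break up any field that itself contains a space; for data like that, str.splitlines() combined with a filter for empty lines is probably the safer choice.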

For a general solution, use the first value of the first list in set_index:

L = [x.split(';') for x in input_string.split()]
df = pd.DataFrame(L[1:], columns=L[0]).set_index(L[0][0])
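
One thing to keep in mind with the split-based approach is that every value stays a string, whereas read_csv infers numeric types. A rough sketch of how the general solution could be wrapped up with a numeric conversion; the helper name frame_from_text and its sep parameter are hypothetical, and it assumes every column holds numbers:

```python
import pandas as pd

input_string = """A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""

def frame_from_text(text, sep=";"):
    """Hypothetical helper: header from the first line, index from the first column."""
    # Skip blank lines so a trailing newline does not produce an empty row
    rows = [line.split(sep) for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    df = pd.DataFrame(body, columns=header)
    # Values are strings at this point; convert each column to a numeric dtype
    df = df.apply(pd.to_numeric)
    # Use the first header name, whatever it is, as the index
    return df.set_index(header[0])

df = frame_from_text(input_string)
print(df)
```

If some columns are not numeric, the pd.to_numeric step would need to be limited to the columns that are.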

EDIT:

You can make A the name of the columns axis instead of the name of the index, so that it is printed on the header row:

df = df.rename_axis(df.index.name, axis=1).rename_axis(None)
print (df)
A   B    C
0  34   88
2  45  200
3  47   65
4  32  140
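
If the goal is only to display the frame without row labels, rather than to change what the index is, a different technique from the rename_axis trick may be enough: keep A as an ordinary column and print with DataFrame.to_string(index=False). A small sketch, assuming the same input_string as above:

```python
from io import StringIO

import pandas as pd

input_string = """A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""

# A stays a regular column here; the default RangeIndex is simply not printed
df = pd.read_csv(StringIO(input_string), sep=";")
text = df.to_string(index=False)
print(text)
```

This only affects how the frame is rendered; the DataFrame itself still carries its default integer index.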

2 Comments

'B' and 'C' columns are there in the first row. 'A' comes in the second row. I want all three in one row.
@TeeKay - It is possible, but a bit of a hack. Check the edited answer.
0
import pandas as pd

input_string="""A;B;C 
0;34;88
2;45;200
3;47;65
4;32;140
"""

data = input_string
df = pd.DataFrame([x.split(';') for x in data.split()])
df.columns = df.iloc[0]
df = df.iloc[1:].rename_axis(None, axis=1)
df.set_index('A',inplace = True)
df

output

    B   C
A       
0   34  88
2   45  200
3   47  65
4   32  140

5 Comments

Any way to remove the first column as well? Like I mentioned in the question. Column 'A' is the index for me
df.set_index('A')
I have updated the answer. I hope it works for you now.
@TeeKay, can you mark the answer if it was useful for you? Let me know if it worked.
Yes, this was helpful.
