Create pandas DataFrame with 3 text files

Question

Here is my case: I have three matrices from Matlab (X, Y, Z), each of size (126, 321). X holds the x coordinates, Y the y coordinates, and Z the efficiency of a machine as a function of X and Y. I want to use the matrix Z in Python, so I saved Z in a text file. Before saving it, I transposed it and rotated it by 90° (because the matrix in Matlab did not have the same orientation as the figure). Then I saved the vector of x coordinates in one text file and the vector of y coordinates in another.

So I have 3 text files:
- text1.txt, of size (126, 321) (this is Z)
- text2.txt, a single line with 126 values
- text3.txt, a single line with 321 values

What I would like to do is create a pandas DataFrame with text1 as the data, text2 as the index, and text3 as the header.

I wrote the following code:

import pandas as pd

Efficiency=pd.read_csv('text1.txt',sep=';',header=None,index_col=False)
x=pd.read_csv('text3.txt',sep=';',header=None,index_col=False)
y=pd.read_csv('text2.txt',sep=';',header=None,index_col=False)
Efficiency.columns=x
Efficiency.index=y

But the last two lines do not work. I also tried going through NumPy, but the results were not good either.

If you have any explanation or solution, please let me know!

Thanks a lot.
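A minimal sketch of what the last two lines of the question likely trip over, assuming the files are `;`-separated as in the code above. The `io.StringIO` contents are hypothetical 2×3 stand-ins for text1.txt, text2.txt and text3.txt. `read_csv` on a one-line file returns a one-row DataFrame, so taking that row with `iloc[0]` gives the 1-D labels that `.columns` and `.index` expect.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the three files (2 rows x 3 columns instead of 126 x 321)
text1 = io.StringIO("0.1;0.2;0.3\n0.4;0.5;0.6\n")  # Z values, one row per y
text2 = io.StringIO("10;20\n")                      # y coordinates, a single line
text3 = io.StringIO("1.5;2.5;3.5\n")                # x coordinates, a single line

Efficiency = pd.read_csv(text1, sep=";", header=None)
y = pd.read_csv(text2, sep=";", header=None)
x = pd.read_csv(text3, sep=";", header=None)

# Each one-line file comes back as a one-row DataFrame, not a 1-D list of labels
print(x.shape)  # (1, 3)

# Take that single row as a 1-D array before using it as labels
Efficiency.columns = x.iloc[0].values
Efficiency.index = y.iloc[0].values
print(Efficiency)
```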

Look into the pandas concat function: pandas.pydata.org/pandas-docs/stable/generated/… (Clock Slave, commented Aug 23, 2017 at 7:27)

Shihe Zhang · Accepted Answer · 2017-08-24 06:31:37Z


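What you need is to turn the single line of x and the single line of y into an Index. To change the index, reindex the DataFrame (y, read from text2.txt, holds the 126 row labels and x, read from text3.txt, the 321 column labels):

Efficiency = Efficiency.reindex(index=y.iloc[0], columns=x.iloc[0])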

Note:

A new object is produced unless the new index is equivalent to the current one and copy=False
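A small sketch of how `reindex` behaves, using hypothetical 2×3 data. Note that `reindex` looks the new labels up in the existing axes rather than renaming them, so any label that is not already present (here, none are) comes back as NaN. If the goal is only to attach the coordinates as labels, assigning to `.index` and `.columns` on a copy keeps the data.

```python
import pandas as pd

# Small stand-in for Efficiency: default labels 0..1 (rows) and 0..2 (columns)
eff = pd.DataFrame([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
x_labels = [1.5, 2.5, 3.5]  # hypothetical x coordinates (columns)
y_labels = [10, 20]         # hypothetical y coordinates (rows)

# reindex aligns on the existing labels; none of the new ones exist, so all NaN
aligned = eff.reindex(index=y_labels, columns=x_labels)
print(aligned)

# Assigning the labels directly relabels by position and keeps the values
relabeled = eff.copy()
relabeled.index = y_labels
relabeled.columns = x_labels
print(relabeled)
```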


2Obe · Accepted Answer · 2017-08-24 07:37:47Z

import numpy as np
import pandas as pd

df1=pd.DataFrame(np.random.randint(0,100,126))

df2=pd.DataFrame(np.random.randint(322,1000,321))  # The problem is that at least two column names are equal, and thus it throws an error

You can inspect the duplicate values with the code below; it should work the same way with your data:

duplicates=df2.duplicated()
print(df2[duplicates])

     0
22   828
30   575
41   341
55   713
75   341
80   353
92   759
117  520
118  330
126  828
130  547
134  927
142  451
150  778
155  417

....
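A deterministic sketch of the same check, with hypothetical labels instead of random ones. `duplicated()` flags only the second and later occurrences by default; `keep=False` flags every occurrence, which makes it easier to see which positions collide.

```python
import pandas as pd

# Hypothetical column labels where 341 appears twice
labels = pd.DataFrame([828, 575, 341, 713, 341])

later = labels[labels.duplicated()]            # default keep="first": repeats only
every = labels[labels.duplicated(keep=False)]  # every row whose value repeats
print(later)
print(every)
```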

Because neither dropping nor changing values is an option for you, a convenient approach is a MultiIndex whose first level holds your x values and whose second level holds the numbers from 0 to the number of columns minus one.

mcols=pd.MultiIndex.from_arrays([np.random.randint(322,1000,321),np.linspace(0,320,321)])

df3=pd.DataFrame(np.random.randint(0,100,size=(126,321)))  # These random numbers simulate your (126,321) DataFrame


df4=pd.DataFrame(df3.values,index=df1.iloc[:,0].values,columns=mcols)
print(df4)

.....

 868   679   757   464   420   381   843   549   978   450  ...    578  \
   0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0    ...  311.0   
47     7    73    78    98    41    62    48    65    35    26  ...     85   
68    54    40    61    75    24     9    15    25     1    35  ...     63   
89    44    30    48    95    27    11    52    41    87    31  ...     73   
57    61    46    11    88    21    58    80    42    99    65  ...     23   
37    70    88    32    95    46    66    93    37    88    95  ...     64   
38    14    19    63    73     0    53    71     4    20    63  ...     88   
60    71    87    18    30    94    30    32     9    32    82  ...     36   
15    87     8    57    68    24    95    26    47    29    29  ...      5   
77    70    54    82    31    85    27    13    13    66    16  ...      3   
10     1    28    64     2    75    22    20     9    93     0  ...     89   
60    26    62    81    13     8    18    40    15    13    47  ...     44   
35    24    42    16    68    45    73    96    81     3    44  ...     16   
81    63    30    19    81    99    81     9     9    34    37  ...     53

.....
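With the MultiIndex described above, all columns that share an x value can still be selected together through the first level. A small sketch with hypothetical values, using `DataFrame.xs` with `level=0`:

```python
import numpy as np
import pandas as pd

x = [341, 575, 341]  # hypothetical x values with a duplicate
cols = pd.MultiIndex.from_arrays([x, np.arange(len(x))])  # second level: column position
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

# Select every column whose first-level label is 341; the second level remains
sub = df.xs(341, axis=1, level=0)
print(sub)
```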

Building on Shihe Zhang's answer, you can set the index and the column names directly, without reindexing and without a MultiIndex:

df4=pd.DataFrame(df3.values,index=df1.iloc[:,0],columns=df2.iloc[:,0])
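The `.values` in the line above matters. A sketch with hypothetical 2×3 data: given a plain array, `pd.DataFrame` attaches `index` and `columns` as new labels; given a DataFrame, it instead aligns the existing data to those labels, so rows and columns whose labels did not already exist are filled with NaN.

```python
import pandas as pd

df1 = pd.DataFrame([10, 20])           # hypothetical y values, one column
df2 = pd.DataFrame([1.5, 2.5, 3.5])    # hypothetical x values, one column
df3 = pd.DataFrame([[0.1, 0.2, 0.3],
                    [0.4, 0.5, 0.6]])  # stand-in for the Z matrix

# Raw array: the labels are attached by position and the data is kept
df4 = pd.DataFrame(df3.values, index=df1.iloc[:, 0], columns=df2.iloc[:, 0])

# DataFrame: pandas reindexes df3 to the new labels; none exist, so all NaN
df5 = pd.DataFrame(df3, index=df1.iloc[:, 0], columns=df2.iloc[:, 0])
print(df4)
print(df5)
```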

I tried it but received the error message: Buffer has wrong number of dimensions (expected 1, got 2)
In the end I used the following code: df4=pd.DataFrame(df3,index=df1.loc[:,0],columns=df2.loc[:,0]) and it worked. Thanks!
@Nathan possibly you ran into a problem like this one: stackoverflow.com/q/27065133/1278112. But that's another problem.
This could be prevented by adding a suffix to each column name, in case you want to merge two data frames and one of them has duplicate column names. This could be done, for example, with df.columns=[df.columns[i]+str(i) for i in range(len(df.columns))], so that each column definitely gets a distinct name. But this is not an option in this case, because the values of the columns are not allowed to change. As you said, this is another problem.
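A sketch of the suffix idea with hypothetical numeric labels. Because the x coordinates are numbers, `df.columns[i]+str(i)` would fail (a number cannot be added to a string), so this version formats each label as a string first:

```python
import pandas as pd

# Hypothetical frame whose column labels contain a duplicate (341)
df = pd.DataFrame([[1, 2, 3]], columns=[341, 575, 341])

# Format each label as a string, then append its position to make it unique
df.columns = [f"{c}_{i}" for i, c in enumerate(df.columns)]
print(list(df.columns))  # ['341_0', '575_1', '341_2']
```

The original label stays readable at the front, but the columns become strings, which may not be acceptable when the labels are coordinates.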
