Add string column to float matrix NumPy

Question

I'm looking for a method to add a column of float values to a matrix of string values.

Mymatrix = [["a","b"],
            ["c","d"]]

I need a matrix like this:

[["a","b",0.4],
 ["c","d",0.6]]

You cannot have that in NumPy (unless you use an array of dtype object, which is not usually very useful). Consider using plain Python (nested) lists or a more advanced structure such as a pandas DataFrame.
– javidcf, commented Nov 13, 2018 at 10:59
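To make the comment concrete, here is a minimal sketch of the two fallbacks it mentions, assuming the data from the question: an array of dtype object (each cell holds an arbitrary Python object, so NumPy no longer stores the values in a compact typed buffer) and plain nested lists. The names `Mycol`, `obj_arr` and `rows` are illustrative, not from the question.

```python
import numpy as np

Mymatrix = np.array([["a", "b"], ["c", "d"]])
Mycol = np.array([0.4, 0.6])  # hypothetical name for the float column

# Option 1: an object array. Casting both parts to object makes the result
# an object array, so strings and floats keep their own Python types.
obj_arr = np.hstack([Mymatrix.astype(object),
                     Mycol.astype(object).reshape(-1, 1)])
print(obj_arr)
print(obj_arr.dtype)  # object

# Option 2: plain nested Python lists, appending one value to each row.
rows = [row + [value] for row, value in zip(Mymatrix.tolist(), Mycol.tolist())]
print(rows)  # [['a', 'b', 0.4], ['c', 'd', 0.6]]
```

The object array gives you NumPy indexing and slicing, but arithmetic on it runs element by element through Python objects, which is why the comment calls it "not usually very useful".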

Alex · Accepted Answer · 2018-11-13 11:12:41Z

2

I would suggest using a pandas DataFrame instead:

import pandas as pd

df = pd.DataFrame([["a","b",0.4],
                   ["c","d",0.6]])

print(df)

   0  1    2
0  a  b  0.4
1  c  d  0.6

You can also specify column (Series) names:

df = pd.DataFrame([["a","b",0.4],
                   ["c","d",0.6]], columns=['A', 'B', 'C'])
print(df)
   A  B    C
0  a  b  0.4
1  c  d  0.6
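If the strings already live in a NumPy array, as in the question, one way (a minimal sketch, assuming `Mymatrix` from the question and a hypothetical `Mycol` holding the float values) is to wrap the array in a DataFrame and assign the floats as a new column. pandas stores each column with its own dtype, so the string columns and the float column do not interfere.

```python
import numpy as np
import pandas as pd

Mymatrix = np.array([["a", "b"], ["c", "d"]])
Mycol = np.array([0.4, 0.6])  # hypothetical name for the float column

# Start from the string matrix; each DataFrame column keeps its own dtype.
df = pd.DataFrame(Mymatrix, columns=["A", "B"])
# Add the floats as a new column, matched to the rows by position.
df["C"] = Mycol
print(df)
print(df.dtypes)
```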


kcw78 · Accepted Answer · 2018-11-13 16:19:49Z

As noted, you can't mix data types in a regular ndarray, but you can in a structured or record array. Both let you mix data types as defined by the dtype= argument (which sets the field names and their data types). Record arrays additionally allow access to fields of structured arrays by attribute, instead of only by field name (e.g. arr['col0']). You don't need for loops when you want to copy entire columns between arrays. See my example below (using your data):

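import numpy as np

Mymatrix = np.array([["a","b"], ["c","d"]])
Mycol = np.array([0.4, 0.6])

# one field per column: two 1-character strings and a float
dt = np.dtype([('col0','U1'),('col1','U1'),('col2',float)])
new_recarr = np.empty((2,), dtype=dt)  # structured array, one record per row
# copy whole columns at once; no for loops needed
new_recarr['col0'] = Mymatrix[:,0]
new_recarr['col1'] = Mymatrix[:,1]
new_recarr['col2'] = Mycol[:]
print(new_recarr)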

Resulting output looks like this:

[('a', 'b',  0.4) ('c', 'd',  0.6)]

From there, use formatted strings to print the rows.
You can also copy from a recarray back to an ndarray by reversing the assignments in my example.
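A minimal sketch of both suggestions, assuming the `new_recarr` layout from the example above (rebuilt here so the snippet runs on its own): printing each record with f-strings, and copying the fields back into plain ndarrays by reversing the assignments. `lines`, `back` and `floats` are illustrative names.

```python
import numpy as np

Mymatrix = np.array([["a", "b"], ["c", "d"]])
Mycol = np.array([0.4, 0.6])

dt = np.dtype([("col0", "U1"), ("col1", "U1"), ("col2", float)])
new_recarr = np.empty((2,), dtype=dt)
new_recarr["col0"] = Mymatrix[:, 0]
new_recarr["col1"] = Mymatrix[:, 1]
new_recarr["col2"] = Mycol

# Formatted printing: one line per record, float shown with one decimal.
lines = [f"{rec['col0']} {rec['col1']} {rec['col2']:.1f}" for rec in new_recarr]
print("\n".join(lines))

# Reverse copy: the string fields go back into a plain 2-D U1 array,
# the float field into its own 1-D float array.
back = np.empty((2, 2), dtype="U1")
back[:, 0] = new_recarr["col0"]
back[:, 1] = new_recarr["col1"]
floats = new_recarr["col2"].copy()
```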
Note: I discovered there can be a significant performance penalty when using recarrays. See the answer in this thread:
is ndarray faster than recarray access?
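The key-versus-attribute distinction from the start of this answer can be sketched as follows, assuming the question's data. `np.rec.fromarrays` builds a record array from one 1-D array per field, and an existing structured array can be reinterpreted as a recarray with `.view(np.recarray)` without copying. Given the performance note above, treat attribute access as a convenience rather than a requirement; `rec`, `plain` and `as_rec` are illustrative names.

```python
import numpy as np

Mymatrix = np.array([["a", "b"], ["c", "d"]])
Mycol = np.array([0.4, 0.6])

# Build a recarray directly from one 1-D array per field.
rec = np.rec.fromarrays([Mymatrix[:, 0], Mymatrix[:, 1], Mycol],
                        names="col0,col1,col2")
print(rec["col2"])  # key access works on any structured array
print(rec.col2)     # attribute access is what the recarray adds

# Reinterpret an existing structured array as a recarray (a view, no copy).
dt = np.dtype([("col0", "U1"), ("col1", "U1"), ("col2", float)])
plain = np.zeros(2, dtype=dt)
plain["col2"] = Mycol
as_rec = plain.view(np.recarray)
print(as_rec.col2)
```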

B. M. · Accepted Answer · 2018-11-13 12:49:16Z

You need to understand why you would do that. NumPy is efficient because the data are laid out uniformly in memory, so mixing types is generally a source of bad performance. In your case, however, you can keep the data aligned, since all your strings have the same length and every record therefore has the same size. Since the types are not homogeneous, you can use a structured array:

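import numpy as np

raw=[["a","b",0.4],
     ["c","d",0.6]]

dt=np.dtype([('col0','U1'),('col1','U1'),('col2',float)])

aligned=np.empty(len(raw),dt)  # uninitialized structured array, one record per row

for i in range(len(raw)):
    for j in range(len(dt)):  # len(dt) is the number of fields
        aligned[i][j]=raw[i][j]  # aligned[i] is a view, so this writes into aligned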
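The point about fixed-size records can be checked directly. A minimal sketch, assuming the `dt` from above: each `U1` field takes 4 bytes (NumPy stores unicode strings as UCS-4) and a float64 takes 8, so every record is 16 bytes and consecutive records sit 16 bytes apart. It also swaps in a loop-free construction from the row lists as an alternative to the explicit loops; the names are illustrative.

```python
import numpy as np

raw = [["a", "b", 0.4],
       ["c", "d", 0.6]]
dt = np.dtype([("col0", "U1"), ("col1", "U1"), ("col2", float)])

# Loop-free construction: a structured array takes one tuple per record.
aligned = np.array([tuple(row) for row in raw], dtype=dt)

# Field offsets and sizes: every record has the same fixed layout.
for name in dt.names:
    field_dtype, offset = dt.fields[name][:2]
    print(name, field_dtype, "offset", offset, "size", field_dtype.itemsize)
print("record size:", dt.itemsize)                 # 16
print("stride between records:", aligned.strides)  # (16,)
```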

You can also use pandas, but you often lose some performance.
