I have a two-dimensional numpy.ndarray of floats. Each row is to be converted to a string consisting of 1s and 0s, reflecting whether the elements of the row satisfy a given property or not.
In this question, I will show my approach (which works), then explain why I find it unsatisfactory, and then ask for your advice.
My approach so far:
import numpy as np
threshold = 0.1
## This array serves as an example. In my actual code, it
## is bigger: something like shape=(30000, 5).
## That is, 30000 rows and 5 columns. Both numbers will
## vary from case to case.
test_array = np.array(
[[0.5,0.2,0.0,0.0,0.3],
[0.8,0.0,0.0,0.0,0.2],
[0.8,0.0,0.1,0.0,0.1],
[1.0,0.0,0.0,0.0,0.0],
[0.9,0.0,0.0,0.1,0.0],
[0.1,0.0,0.0,0.8,0.1],
[0.0,0.1,0.0,0.0,0.9],
[0.0,0.0,0.0,0.0,1.0],
[0.0,0.0,0.5,0.5,0.0],
],
dtype=float
)
## Now comes the conversion in two steps.
test_array_2 = np.where(test_array > threshold, '1', '0')
test_array_3 = np.apply_along_axis(''.join, 1, test_array_2)
The intermediate result test_array_2 evaluates to
array([['1', '1', '0', '0', '1'],
['1', '0', '0', '0', '1'],
['1', '0', '0', '0', '0'],
['1', '0', '0', '0', '0'],
['1', '0', '0', '0', '0'],
['0', '0', '0', '1', '0'],
['0', '0', '0', '0', '1'],
['0', '0', '0', '0', '1'],
['0', '0', '1', '1', '0']], dtype='<U1')
and test_array_3 evaluates to
array(['11001', '10001', '10000', '10000', '10000', '00010', '00001',
'00001', '00110'], dtype='<U5')
test_array_3 is the result I want.
Why this is unsatisfactory:
I dislike my use of the str.join() method. Maybe it is because I'm inexperienced, but it feels like it makes the code less readable.
Also (perhaps the more important point), np.apply_along_axis is not truly vectorized: it calls the Python-level join function once per row. It would be better to vectorize the computation, right?
Question:
Is the use of str.join() a bad choice and, if so, what other methods are there?
Is there a good way to vectorize the computation?
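For context, here is one alternative I have been experimenting with. It avoids apply_along_axis by reinterpreting the contiguous array of single-character strings as one fixed-width string per row via .view(). I am not sure whether relying on memory layout like this is considered good practice, which is part of why I am asking:

```python
import numpy as np

# Small example; my real arrays are more like shape (30000, 5).
test_array = np.array(
    [[0.5, 0.2, 0.0, 0.0, 0.3],
     [0.8, 0.0, 0.0, 0.0, 0.2],
     [0.0, 0.0, 0.5, 0.5, 0.0]],
    dtype=float,
)
threshold = 0.1

chars = np.where(test_array > threshold, '1', '0')  # dtype '<U1', shape (n, 5)
n_cols = test_array.shape[1]
# Reinterpret each contiguous row of n_cols 1-char strings
# as a single n_cols-char string, then flatten to 1-D.
joined = np.ascontiguousarray(chars).view(f'<U{n_cols}').ravel()
# joined -> array(['11001', '10001', '00110'], dtype='<U5')
```

This stays entirely in NumPy, but it depends on the array being C-contiguous (hence the np.ascontiguousarray call), and the .view() reinterpretation feels fragile to me.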