
I have a two-dimensional numpy.ndarray of floats. Each row is to be converted to a string consisting of 1s and 0s, reflecting whether the elements of the row satisfy a given property or not.
In this question, I will show my approach (which works), then explain why I find it unsatisfactory, and then ask for your advice.

My approach so far:

import numpy as np

threshold = 0.1

## This array serves as an example. In my actual code, it
## is bigger: something like shape=(30000, 5).
## That is, 30000 rows and 5 columns. Both numbers will
## vary from case to case.
test_array  = np.array(
                [[0.5,0.2,0.0,0.0,0.3],
                 [0.8,0.0,0.0,0.0,0.2],
                 [0.8,0.0,0.1,0.0,0.1],
                 [1.0,0.0,0.0,0.0,0.0],
                 [0.9,0.0,0.0,0.1,0.0],
                 [0.1,0.0,0.0,0.8,0.1],
                 [0.0,0.1,0.0,0.0,0.9],
                 [0.0,0.0,0.0,0.0,1.0],
                 [0.0,0.0,0.5,0.5,0.0],
                ],
                dtype=float
            )

## Now comes the conversion in two steps.
test_array_2 = np.where(test_array > threshold, '1', '0')
test_array_3 = np.apply_along_axis(''.join, 1, test_array_2)

The intermediate result test_array_2 evaluates to

array([['1', '1', '0', '0', '1'],
       ['1', '0', '0', '0', '1'],
       ['1', '0', '0', '0', '0'],
       ['1', '0', '0', '0', '0'],
       ['1', '0', '0', '0', '0'],
       ['0', '0', '0', '1', '0'],
       ['0', '0', '0', '0', '1'],
       ['0', '0', '0', '0', '1'],
       ['0', '0', '1', '1', '0']], dtype='<U1')

and test_array_3 evaluates to

array(['11001', '10001', '10000', '10000', '10000', '00010', '00001',
       '00001', '00110'], dtype='<U5')

test_array_3 is the result I want.

Why this is unsatisfactory: I dislike my use of the str.join() method. Maybe it is because I'm inexperienced, but it feels like it makes the code less readable.
Also (maybe the more important point), np.apply_along_axis is not really vectorized; it loops over the rows in Python, so it is inefficient for many rows. It would be better to vectorize the computation, right?

Question: Is the use of str.join() a bad choice and, if so, what other methods are there?
Is there a good way to vectorize the computation?

  • You say: In my actual code, it is bigger. What are the actual dimensions? Commented Nov 20, 2021 at 19:51
  • @Armali It will be roughly 30,000 rows and about 5 columns. So the strings I want to get will have a length of about 5, as indicated in my question. The number of strings will be much larger. Commented Nov 22, 2021 at 8:05

1 Answer


I don't know whether you find this satisfactory or readable, but at least it uses neither str.join nor np.apply_along_axis:

width = test_array.shape[1]
## Pack each row of booleans into one byte (most significant bit first),
## then shift out the 8-width padding bits to get the row's integer code.
bytes = np.packbits(test_array > threshold, 1) >> (8 - width)
## binary_repr turns each integer code back into a fixed-width '1'/'0' string.
array = np.frompyfunc(np.binary_repr, 2, 1)(bytes.flatten(), width)
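
For the example array this should reproduce the strings from the question, just as an object-dtype array (np.frompyfunc returns dtype=object). A rough sanity check, reusing the names defined above:

## Compare against the original two-step approach from the question.
expected = np.apply_along_axis(
    ''.join, 1, np.where(test_array > threshold, '1', '0'))
print(all(a == b for a, b in zip(array, expected)))   ## should print True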

2 Comments

Thanks for the idea of using np.packbits. Where can I find documentation on the >>? I haven't found any so far.
The documentation of numpy.right_shift says: The >> operator can be used as a shorthand for np.right_shift on ndarrays.
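For instance, with a single five-element row it works like this (a minimal sketch, reusing the width of 5 from the question):

row = np.array([True, True, False, False, True])
byte = np.packbits(row)              ## array([200], dtype=uint8), i.e. 0b11001000
code = np.right_shift(byte, 8 - 5)   ## array([25], dtype=uint8), i.e. 0b11001; same as byte >> (8 - 5)
print(np.binary_repr(code[0], 5))    ## prints '11001'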
