11

I have an array as follows:

In [1]: x = array(['1.2', '2.3', '1.2.3'])

I want to test whether each element in the array can be converted into a numerical value. That is, a function is_numeric(x) should return a True/False array as follows:

In [2]: is_numeric(x)
Out[2]: array([True, True, False])

How can I do this?

3
  • Possible duplicate of How do I check if a string is a number (float) in Python? Commented Jun 23, 2016 at 16:01
  • 4
    @farhan3: Not a duplicate. The appropriate methods for working with a NumPy array are almost always quite different from the appropriate methods for working with individual ordinary Python objects. Commented Jun 23, 2016 at 16:07
  • 1
    Actually that other question is quite useful. No one has come up with a way of bypassing the iterative application of a single string test. Commented Jun 23, 2016 at 20:39

5 Answers

6
import numpy as np

def is_float(val):
    try:
        float(val)
    except ValueError:
        return False
    else:
        return True

a = np.array(['1.2', '2.3', '1.2.3'])

# list(...) is needed on Python 3, where map returns a lazy iterator
is_numeric_1 = lambda x: list(map(is_float, x))            # return python list
is_numeric_2 = lambda x: np.array(list(map(is_float, x)))  # return numpy array
is_numeric_3 = np.vectorize(is_float, otypes = [bool]) # return numpy array

Depending on the size of the array and the required type of the returned values, these functions run at different speeds.

In [26]: %timeit is_numeric_1(a)
100000 loops, best of 3: 2.34 µs per loop

In [27]: %timeit is_numeric_2(a)
100000 loops, best of 3: 3.13 µs per loop

In [28]: %timeit is_numeric_3(a)
100000 loops, best of 3: 6.7 µs per loop

In [29]: a = np.array(['1.2', '2.3', '1.2.3']*1000)

In [30]: %timeit is_numeric_1(a)
1000 loops, best of 3: 1.53 ms per loop

In [31]: %timeit is_numeric_2(a)
1000 loops, best of 3: 1.6 ms per loop

In [32]: %timeit is_numeric_3(a)
1000 loops, best of 3: 1.58 ms per loop

If a list is acceptable, use is_numeric_1.

If you want a NumPy array and the size of a is small, use is_numeric_2.

Otherwise, use is_numeric_3.


2 Comments

Thanks! The is_numeric_3 function speeds up the computation by ~10% in my test. I am wondering whether the is_float function could be written in C and called through Cython to speed up the computation further.
stackoverflow.com/a/25299619/901925 claims to have a fast isfloat function.
2
In [23]: x = np.array(['1.2', '2.3', '1.2.3', '1.2', 'foo'])

Trying to convert the whole array to float raises an error if one or more strings can't be converted:

In [24]: x.astype(float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-a68fda2cafea> in <module>()
----> 1 x.astype(float)

ValueError: could not convert string to float: '1.2.3'

In [25]: x[:2].astype(float)
Out[25]: array([ 1.2,  2.3])

But to find which ones can be converted, and which can't, we probably have to apply a test to each element. That requires some sort of iteration, and some sort of test.

Most of these answers have wrapped float in a try/except block. But look at How do I check if a string is a number (float) in Python? for alternatives. One answer found that the float wrap was fast for valid inputs, but a regex test was faster for invalid ones (https://stackoverflow.com/a/25299619/901925).
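As a rough illustration of the regex alternative mentioned above, here is a minimal sketch. The pattern and the name `looks_like_float` are my own assumptions, not taken from the linked answer, and the pattern only approximates what `float` accepts: it rejects strings such as 'nan', 'inf', or values with surrounding whitespace, all of which `float` would parse.

```python
import re
import numpy as np

# Hypothetical pattern: optional sign, decimal digits with an optional
# fractional part (or a bare fraction like ".5"), optional exponent.
_FLOAT_RE = re.compile(r'[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?')

def looks_like_float(s):
    # fullmatch requires the entire string to match the pattern
    return _FLOAT_RE.fullmatch(s) is not None

x = np.array(['1.2', '2.3', '1.2.3', '1.2', 'foo'])
mask = np.array([looks_like_float(s) for s in x])
```

This still iterates over the elements one by one; it only swaps the per-element test.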

In [30]: def isnumeric(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

In [31]: [isnumeric(s) for s in x]
Out[31]: [True, True, False, True, False]

In [32]: np.array([isnumeric(s) for s in x])  # for array
Out[32]: array([ True,  True, False,  True, False], dtype=bool)

I like list comprehension because it is common and clear (and preferred in Py3). For speed I have found that frompyfunc has a modest advantage over other iterators (and handles multidimensional arrays):

In [34]: np.frompyfunc(isnumeric, 1,1)(x)
Out[34]: array([True, True, False, True, False], dtype=object)

In [35]: np.frompyfunc(isnumeric, 1,1)(x).astype(bool)
Out[35]: array([ True,  True, False,  True, False], dtype=bool)

It requires a bit more boilerplate than vectorize, but is usually faster. But if the array or list is small, list comprehension is usually faster (avoiding numpy overhead).
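To illustrate the multidimensional case, here is a small sketch that applies the same try/except test through frompyfunc to a 2-D array. The array `grid` and the name `is_numeric_ufunc` are made up for the example.

```python
import numpy as np

def isnumeric(s):
    # Same per-element test as above: does float() accept the string?
    try:
        float(s)
        return True
    except ValueError:
        return False

# frompyfunc builds a ufunc with 1 input and 1 output; it returns an
# object array, so cast to bool afterwards.
is_numeric_ufunc = np.frompyfunc(isnumeric, 1, 1)

grid = np.array([['1.2', 'foo'], ['1.2.3', '4']])
mask = is_numeric_ufunc(grid).astype(bool)
```

The result keeps the shape of the input, so no flattening or reshaping is needed.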

======================

(edited) np.char has a set of functions that apply string methods to the elements of an array. But the closest function is np.char.isnumeric which just tests for numeric characters, not a full float conversion.
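A short sketch of that difference, using sample strings chosen for illustration: every string below is accepted by float, but np.char.isnumeric rejects any string containing a character that is not itself numeric, such as '.', '-' or 'e'.

```python
import numpy as np

samples = np.array(['12', '1.2', '-3', '1e3'])

# Element-wise str.isnumeric: True only if every character is numeric
char_test = np.char.isnumeric(samples)

def can_float(s):
    # Full conversion test for comparison
    try:
        float(s)
        return True
    except ValueError:
        return False

float_test = np.array([can_float(s) for s in samples])
```

Here char_test is True only for '12', while float_test is True for all four strings.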

1 Comment

isnumeric does something completely different from testing whether a string can be interpreted as a number.
1

I find the following works well for my purpose.

First, save the isNumeric function from https://rosettacode.org/wiki/Determine_if_a_string_is_numeric#C in a file called ctest.h, then create a .pyx file as follows:

from numpy cimport ndarray, uint8_t
import numpy as np
cimport numpy as np

cdef extern from "ctest.h":
     int isNumeric(const char * s)

def is_numeric_elementwise(ndarray x):
    cdef Py_ssize_t i
    cdef ndarray[uint8_t, mode='c', cast=True] y = np.empty_like(x, dtype=np.uint8)

    for i in range(x.size):
        y[i] = isNumeric(x[i])

    return y > 0

The above Cython function runs quite fast.

In [4]: is_numeric_elementwise(array(['1.2', '2.3', '1.2.3']))
Out[4]: array([ True,  True, False], dtype=bool)

In [5]: %timeit is_numeric_elementwise(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 695 ms per loop

Compared with the is_numeric_3 method in https://stackoverflow.com/a/37997673/4909242, it is ~5 times faster:

In [6]: %timeit is_numeric_3(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 3.45 s per loop

There may still be some room for improvement, I guess.

Comments

0
import numpy

# method to check whether a string is a float
def is_numeric(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

# method to return a list of booleans that indicate whether each string can be parsed into a number
def is_numeric_array(arr):
    return_array = []
    # ndenumerate yields (index, value) pairs; only the value is tested
    for _, val in numpy.ndenumerate(arr):
        return_array.append(is_numeric(val))
    return return_array

Comments

0

This also relies on the try-except method of getting the per-element result, but using fromiter with count preallocates the boolean result array:

from numpy import array, fromiter

def is_numeric(x):

    def try_float(xx):
        try:
            float(xx)
        except ValueError:
            return False
        else:
            return True

    # count lets fromiter allocate the output array up front
    return fromiter((try_float(xx) for xx in x.flat),
                    dtype=bool, count=x.size)

x = array(['1.2', '2.3', '1.2.3'])
print(is_numeric(x))

Gives:

[ True  True False]

2 Comments

You could skip the loop and just vectorize your function. vec_try_float = np.vectorize(try_float)
But vectorize doesn't usually speed up this sort of loop.
