How can one read/write pandas DataFrames (Numpy arrays) of strings in Cython?

It works just fine when I work with integers or floats:

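# Cython file numpy_.pyx
cimport numpy as np
from cython cimport boundscheck, wraparound

@boundscheck(False)
@wraparound(False)
cpdef fill(np.int64_t[:, ::1] arr):
    # Overwrite the top-left 2x2 corner in place
    arr[0, 0] = 10
    arr[0, 1] = 11
    arr[1, 0] = 13
    arr[1, 1] = 14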
# Python code
import numpy as np
from numpy_ import fill
a = np.array([[0,1,2],[3,4,5]], dtype=np.int64)
print(a)
fill(a)
print(a)

gives

>>> a = np.array([[0,1,2],[3,4,5]], dtype=np.int64)
>>> print(a)
[[0 1 2]
 [3 4 5]]
>>> fill(a)
>>> print(a)
[[10 11  2]
 [13 14  5]]

Also, the following code

# Python code
import numpy as np, pandas as pd
from numpy_ import fill
a = np.array([[0,1,2],[3,4,5]], dtype=np.int64)
df = pd.DataFrame(a)
print(df)
fill(df.values)
print(df)

gives

>>> a = np.array([[0,1,2],[3,4,5]], dtype=np.int64)
>>> df = pd.DataFrame(a)
>>> print(df)
   0  1  2
0  0  1  2
1  3  4  5
>>> fill(df.values)
>>> print(df)
    0   1  2
0  10  11  2
1  13  14  5

However, I am having a hard time figuring out how to do the same thing when the input is an array of strings. For example, how can I read or modify a NumPy array or a pandas DataFrame such as:

a2 = np.array([['000','111','222'],['333','444','555']], dtype='U3')
df2 = pd.DataFrame(a2)

and, let us say, the goal is to change through Cython

'000' -> 'AAA'; '111' -> 'BBB'; '222' -> 'CCC'; '333' -> 'DDD'

I read the relevant NumPy documentation page and the relevant Cython documentation page, but I still cannot figure out what to do.

Thank you very much for your help!

  • pandas does not use the numpy string dtypes. It makes those series object dtype. Look at df2.dtypes. Commented Aug 5, 2019 at 17:36
  • @hpaulj So, the declaration of a corresponding function should be cpdef fill_str(np.object_t[:,::1] arr)? Why does type(df2.at[0,0]) then give <class 'str'> (i.e. not 'object')? Commented Aug 5, 2019 at 17:42
  • str is an object. A dataframe designed to hold object can hold any subclass of object including str Commented Aug 5, 2019 at 17:53
  • @DavidW Thank you! If you know what I should read to understand what I need to do to solve my problem, please, let me know! Commented Aug 5, 2019 at 18:03
  • Here's a couple of (maybe) useful links for Numpy arrays of strings stackoverflow.com/questions/42543485/… stackoverflow.com/questions/28774096/…. This doesn't necessarily help you with Pandas too much, except that you can force Pandas to have a fixed length string datatype by specifying it in dtype. It also doesn't help with Unicode. I don't really have much advice beyond what's in this comment... Commented Aug 5, 2019 at 19:59
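
A minimal sketch of the two cases the comments distinguish, written in plain Python with NumPy so it can be run without compiling anything. It is not a Cython solution. The names `fill_str` and `MAPPING` are hypothetical, and `a2.astype(object)` stands in for the object-dtype storage the first comment describes. The second half assumes a C-contiguous `U3` array in native byte order, where each character occupies one 32-bit code point.

```python
import numpy as np

# Hypothetical mapping, taken from the replacements listed in the question.
MAPPING = {"000": "AAA", "111": "BBB", "222": "CCC", "333": "DDD"}


def fill_str(arr, mapping):
    """Replace mapped strings in place in a 2-D object array.

    Pure-Python stand-in for a Cython function; the name is hypothetical.
    """
    rows, cols = arr.shape
    for i in range(rows):
        for j in range(cols):
            # Leave values without a mapping entry unchanged
            arr[i, j] = mapping.get(arr[i, j], arr[i, j])


a2 = np.array([["000", "111", "222"], ["333", "444", "555"]], dtype="U3")

# Case 1: object dtype. Every element is an ordinary Python str.
obj = a2.astype(object)
assert isinstance(obj[0, 0], str)
fill_str(obj, MAPPING)
print(obj)

# Case 2: fixed-width unicode. View the same buffer as 32-bit code points,
# giving one extra axis of length 3 (the characters of each string).
codes = a2.view(np.uint32).reshape(a2.shape + (3,))
codes[0, 0, :] = ord("A")  # writes through to a2, so a2[0, 0] becomes "AAA"
print(a2)
```

With pandas, an object array like `obj` could be obtained with `df2.to_numpy(dtype=object)`, which may return a copy, so the result would need to be written back into a DataFrame. In Cython, the loop in `fill_str` might accept an `object[:, ::1]` memoryview and the code-point view an `np.uint32_t[:, :, ::1]` memoryview; both signatures are assumptions and were not compiled here.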
