2

Curious how this works under the hood. Did numpy just build functionality to handle pandas objects or is there something else going on here?

import numpy
import pandas
data = pandas.Series([1, 2, 3, 4])
numpy.sqrt(data)  # returns a pandas.Series

2 Answers

2

In addition to overriding __array__ as the other answer mentions:

pd.Series: it implements __array_ufunc__, which overrides the ufunc behaviour, including what the output should look like.

pd.DataFrame: it doesn't implement that method, but it does implement __array_wrap__, which gives it control over what the output should look like.

See here for the output type determination. The pandas docs also mention the Series case.
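
As a rough illustration of the __array_ufunc__ route, here is a minimal sketch with a made-up container. The class name Labeled and its label attribute are hypothetical and are not how pandas implements Series; the sketch only handles plain single-output calls such as np.sqrt(x) and declines everything else.

```python
import numpy as np


class Labeled:
    """Hypothetical container (not a pandas class, not an ndarray subclass)."""

    def __init__(self, values, label):
        self.values = np.asarray(values, dtype=float)
        self.label = label

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls like np.sqrt(x) are handled in this sketch.
        if method != "__call__" or kwargs.get("out") is not None:
            return NotImplemented
        # Unwrap any Labeled inputs to their underlying ndarrays.
        raw = [x.values if isinstance(x, Labeled) else x for x in inputs]
        result = ufunc(*raw, **kwargs)
        # Re-wrap, so the caller gets a Labeled back instead of an ndarray.
        return Labeled(result, self.label)


data = Labeled([1, 4, 9, 16], label="squares")
out = np.sqrt(data)
print(type(out).__name__, out.values, out.label)
```

Because NumPy finds __array_ufunc__ on the input, it hands the whole call over to the object, and the object decides the return type.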

1

A DataFrame (like a Series) has an __array__ method:

In [138]: df
Out[138]: 
   Account1  Account2  m_solution
0       150        18 -117.857143
1       130      1200  104.586466
2       150        18 -117.857143
3       106      1200   88.793262
4       150        18 -117.857143
5       170      1200  127.810219
6       150       138   -6.250000
7      1056      1200   67.404255
In [139]: df.__array__()
Out[139]: 
array([[ 150.        ,   18.        , -117.85714286],
       [ 130.        , 1200.        ,  104.58646617],
       [ 150.        ,   18.        , -117.85714286],
       [ 106.        , 1200.        ,   88.79326187],
       [ 150.        ,   18.        , -117.85714286],
       [ 170.        , 1200.        ,  127.81021898],
       [ 150.        ,  138.        ,   -6.25      ],
       [1056.        , 1200.        ,   67.40425532]])

Equivalently you can get the array with:

In [140]: df.values
Out[140]: 
array([[ 150.        ,   18.        , -117.85714286],
       [ 130.        , 1200.        ,  104.58646617],
       [ 150.        ,   18.        , -117.85714286],
       [ 106.        , 1200.        ,   88.79326187],
       [ 150.        ,   18.        , -117.85714286],
       [ 170.        , 1200.        ,  127.81021898],
       [ 150.        ,  138.        ,   -6.25      ],
       [1056.        , 1200.        ,   67.40425532]])
In [141]: df.to_numpy()
Out[141]: 
array([[ 150.        ,   18.        , -117.85714286],
       [ 130.        , 1200.        ,  104.58646617],
       [ 150.        ,   18.        , -117.85714286],
       [ 106.        , 1200.        ,   88.79326187],
       [ 150.        ,   18.        , -117.85714286],
       [ 170.        , 1200.        ,  127.81021898],
       [ 150.        ,  138.        ,   -6.25      ],
       [1056.        , 1200.        ,   67.40425532]])

I think the pandas docs encourage using to_numpy.

The frame's data is stored in one or more arrays (depending on the dtypes). Whether the array you get from any of these is that underlying array itself, a view of it, or a copy may vary.
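
A small sketch of why that matters, assuming a frame with one integer and one float column (the column names a and b are made up for illustration). With mixed dtypes the values have to be combined into a single array with a common dtype, and passing copy=True to to_numpy is the way to be sure the result is independent of the frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two different column dtypes.
mixed = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]})

arr = mixed.to_numpy()
print(arr.dtype)  # float64: the int column is upcast to fit one array

# copy=True guarantees the result does not share memory with the frame.
independent = mixed.to_numpy(copy=True)
independent[0, 0] = 99.0
print(mixed.loc[0, "a"])  # 1: the frame is unchanged
```

Without copy=True, don't rely on writes to the returned array either reaching the frame or not reaching it.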

Code for __array__

Signature: df.__array__(dtype=None) -> 'np.ndarray'
Docstring: <no docstring>
Source:   
    def __array__(self, dtype=None) -> np.ndarray:
        return np.asarray(self._values, dtype=dtype)
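
To see what __array__ buys you on its own, here is a toy class (Box is a hypothetical name, nothing from pandas). NumPy can consume it, but with only __array__ defined the result comes back as a plain ndarray:

```python
import numpy as np


class Box:
    """Hypothetical container that only knows how to become an ndarray."""

    def __init__(self, values):
        self._values = list(values)

    # copy=None is accepted so newer NumPy releases can pass it as well.
    def __array__(self, dtype=None, copy=None):
        return np.array(self._values, dtype=dtype)


box = Box([1, 4, 9])
arr = np.asarray(box)  # NumPy calls box.__array__() to do the conversion
root = np.sqrt(box)    # same conversion, then the ufunc runs on the ndarray
print(type(arr).__name__, type(root).__name__)  # ndarray ndarray
```

So __array__ explains how NumPy gets at the data, but something more is needed to get the original type back.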

See also Series.__array__. It's a bit different.

and Series.__array_wrap__:

S.__array_wrap__(
    result: 'np.ndarray',
    context: 'Optional[Tuple[Callable, Tuple[Any, ...], int]]' = None,
)
Docstring:
Gets called after a ufunc and other functions.

Parameters
----------
result: np.ndarray
    The result of the ufunc or other function called on the NumPy array
    returned by __array__
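
A hedged sketch of how __array__ and __array_wrap__ can work together, again with a made-up class (Tagged and its tag attribute are hypothetical, not pandas internals). The return_scalar parameter is an extra argument that newer NumPy releases pass; giving it a default lets the same signature work when NumPy passes only result and context:

```python
import numpy as np


class Tagged:
    """Hypothetical wrapper (not an ndarray subclass) that survives ufuncs."""

    def __init__(self, values, tag):
        self._values = np.asarray(values, dtype=float)
        self.tag = tag

    def __array__(self, dtype=None, copy=None):
        # Step 1: NumPy converts the input to an ndarray through this hook.
        return np.asarray(self._values, dtype=dtype)

    def __array_wrap__(self, result, context=None, return_scalar=False):
        # Step 2: after the ufunc has run, NumPy hands the ndarray result
        # back here, and whatever this returns is what the caller sees.
        return Tagged(result, self.tag)


out = np.sqrt(Tagged([1, 4, 9], tag="demo"))
print(type(out).__name__, out.tag, np.asarray(out))
```

This is the mechanism behind "preserving the return type" when no __array_ufunc__ is defined: the ufunc itself only ever sees the ndarray.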

2 Comments

Thanks, seems like numpy functions operate on the __array__ method, but how does it preserve the return type? E.g. returning pandas.Series instead of numpy.array?
Looking at the methods of a Series, I see __array_wrap__. A Series isn't a subclass of ndarray, but it appears to have many of the methods that make it behave as one.
