Curious how this works under the hood. Did numpy just build functionality to handle pandas objects or is there something else going on here?
data = pandas.Series([1,2,3,4])
numpy.sqrt(data) # returns pandas.Series
In addition to overriding __array__ as the other answer mentions:
pd.Series: it implements __array_ufunc__, which overrides the ufunc behaviour, including what the output should look like.
pd.DataFrame: it doesn't implement that method, but it implements __array_wrap__, which gives it control over what the output should look like.
See here for the output type determination. The pandas docs also mention the Series case.
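To make the __array_ufunc__ mechanism concrete, here is a minimal sketch with a toy container. The `Labeled` class and its attributes are hypothetical and only for illustration; this is not how pandas implements it, just the same protocol in miniature.

```python
import numpy as np


class Labeled:
    """Hypothetical toy container (values plus labels); not a pandas class."""

    def __init__(self, values, labels):
        self.values = np.asarray(values)
        self.labels = list(labels)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only handle plain calls such as np.sqrt(x); decline everything else.
        if method != "__call__":
            return NotImplemented
        # Unwrap any Labeled inputs to raw ndarrays before calling the ufunc.
        raw = [x.values if isinstance(x, Labeled) else x for x in inputs]
        result = ufunc(*raw, **kwargs)
        # Re-wrap so the caller gets a Labeled back with its labels intact.
        return Labeled(result, self.labels)


data = Labeled([1, 4, 9, 16], ["a", "b", "c", "d"])
out = np.sqrt(data)
print(type(out).__name__, out.values, out.labels)
```

NumPy sees that the argument defines __array_ufunc__ and hands control to it, so the object itself decides the return type. That is the same hook that lets numpy.sqrt on a Series come back as a Series.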
The dataframe (and Series) has an __array__ method:
In [138]: df
Out[138]:
   Account1  Account2  m_solution
0       150        18 -117.857143
1       130      1200  104.586466
2       150        18 -117.857143
3       106      1200   88.793262
4       150        18 -117.857143
5       170      1200  127.810219
6       150       138   -6.250000
7      1056      1200   67.404255
In [139]: df.__array__()
Out[139]:
array([[ 150.        ,   18.        , -117.85714286],
       [ 130.        , 1200.        ,  104.58646617],
       [ 150.        ,   18.        , -117.85714286],
       [ 106.        , 1200.        ,   88.79326187],
       [ 150.        ,   18.        , -117.85714286],
       [ 170.        , 1200.        ,  127.81021898],
       [ 150.        ,  138.        ,   -6.25      ],
       [1056.        , 1200.        ,   67.40425532]])
Equivalently you can get the array with:
In [140]: df.values
Out[140]:
array([[ 150.        ,   18.        , -117.85714286],
       [ 130.        , 1200.        ,  104.58646617],
       [ 150.        ,   18.        , -117.85714286],
       [ 106.        , 1200.        ,   88.79326187],
       [ 150.        ,   18.        , -117.85714286],
       [ 170.        , 1200.        ,  127.81021898],
       [ 150.        ,  138.        ,   -6.25      ],
       [1056.        , 1200.        ,   67.40425532]])
In [141]: df.to_numpy()
Out[141]:
array([[ 150.        ,   18.        , -117.85714286],
       [ 130.        , 1200.        ,  104.58646617],
       [ 150.        ,   18.        , -117.85714286],
       [ 106.        , 1200.        ,   88.79326187],
       [ 150.        ,   18.        , -117.85714286],
       [ 170.        , 1200.        ,  127.81021898],
       [ 150.        ,  138.        ,   -6.25      ],
       [1056.        , 1200.        ,   67.40425532]])
I think the pandas docs encourage the use of to_numpy over values.
The data of the frame is stored in one or more arrays (depending on dtypes). Whether what you get back from these calls is that underlying array itself, a view of it, or a copy may vary.
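A small sketch of what that means in practice, using a hypothetical two-column frame (the column names are made up for illustration). With mixed dtypes the frame cannot hand back a single stored array, and passing copy=True to to_numpy is one way to stop caring whether you got a view.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# A single 2-D array needs one dtype, so the int column is upcast to float.
arr = df.to_numpy()
print(arr.dtype)

# copy=True asks for an array that does not share memory with the frame,
# so writing to it leaves the frame untouched.
safe = df.to_numpy(copy=True)
safe[0, 0] = -1.0
print(df.loc[0, "a"])
```

This also matches the session above, where the integer Account columns show up as floats in the array.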
Code for __array__
Signature: df.__array__(dtype=None) -> 'np.ndarray'
Docstring: <no docstring>
Source:
    def __array__(self, dtype=None) -> np.ndarray:
        return np.asarray(self._values, dtype=dtype)
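You rarely call __array__ directly; np.asarray and most NumPy functions look for it when given a non-ndarray object. A minimal sketch, assuming a small hypothetical frame that reuses two column names from above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only the first two rows of the example data.
df = pd.DataFrame({"Account1": [150, 130], "Account2": [18, 1200]})

# np.asarray converts the frame through its __array__ method.
arr = np.asarray(df)
print(type(arr).__name__, arr.shape)

# The dtype argument is forwarded as well.
as_float = np.asarray(df, dtype=float)
print(as_float.dtype)

# Same values as the explicit conversion.
print(np.array_equal(arr, df.to_numpy()))
```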
See also Series.__array__. It's a bit different.
and Series.__array_wrap__:
S.__array_wrap__(
    result: 'np.ndarray',
    context: 'Optional[Tuple[Callable, Tuple[Any, ...], int]]' = None,
)
Docstring:
Gets called after a ufunc and other functions.

Parameters
----------
result: np.ndarray
    The result of the ufunc or other function called on the NumPy array
    returned by __array__
Looking at Series, I see __array_wrap__. A Series isn't a subclass of ndarray, but it appears to have many of the methods that make it behave like one.
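A quick way to check this yourself; a sketch using a made-up Series with string labels so the preserved index is easy to spot:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9, 16], index=["a", "b", "c", "d"])
root = np.sqrt(s)

# Series is a separate class that implements NumPy's protocols,
# rather than inheriting from ndarray.
is_subclass = issubclass(pd.Series, np.ndarray)
has_ufunc_hook = hasattr(pd.Series, "__array_ufunc__")

print(is_subclass, has_ufunc_hook)
print(type(root).__name__, list(root.index))
```

The result is still a Series and the index carries over, which is what a plain ndarray round trip would lose.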