I recently decided to be more adventurous and explore Dask DataFrames. I am trying to apply a function to one column of a DataFrame, using the following syntax:
import pandas as pd
import dask.dataframe as dd
import dask.array as da
df_data = pd.DataFrame({'Column 1': [300,300,450,500,500,750,600,300, 150],'Column 2': [100,130,230,200,300,350,600,550,530], 'Column 3': [250, 300, 400, 500, 700,350, 750, 550, 600]})
def TestFunc(x):
    y = x*2 + abs(x/2 - x*3)
    return y
dd_data = dd.from_pandas(df_data, npartitions = 1)
data_test = dd.map_partitions(TestFunc,dd_data['Column 1'])
data_test.compute()
Naturally, this is a simplified example that I made up to show what I have been doing. This code works well; the problem is in the real situation I am facing. There I have a more complex dataframe, and I want to apply the following function to one of its columns:
import numpy as np
def GetID(phase):
    nDataPoints = len(phase)
    # Bin centres every 6 degrees, in radians
    myRanges = np.deg2rad(np.arange(0,360,6))
    # Angles above 357 degrees wrap around to the first bin
    phase[phase>np.deg2rad(354+3)] = 0
    ID = np.array([])
    for i in np.arange(0,nDataPoints):
        val = abs(myRanges-phase[i])
        iID = np.argmin(val)
        ID = np.append(ID, iID+1)
    return ID
I can apply the function to the column with .map_partitions, but when I then call .compute() to see the numerical results I get the error KeyError: 0. I don't understand why the simpler example above works without problems while this one fails.
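One possible cause, which is only a guess because the real dataframe is not shown: each partition reaches the function as a pandas Series, and phase[i] on a Series with an integer index looks up the label i, not the i-th position. If a partition's index does not contain the label 0, that lookup raises KeyError: 0. TestFunc never hits this because it only does element-wise arithmetic. The sketch below reproduces the lookup failure on a stand-in partition and shows a positional variant; get_id_positional, bin_centres and the sample values are names and data made up for illustration.

```python
import numpy as np
import pandas as pd


def get_id_positional(phase):
    """Variant of GetID that never looks values up by index label."""
    # Work on a NumPy copy: access is positional and the input is not modified.
    values = np.array(phase, dtype=float)
    bin_centres = np.deg2rad(np.arange(0, 360, 6))
    # Same wrap-around rule as GetID: angles above 357 degrees map to 0.
    values[values > np.deg2rad(354 + 3)] = 0
    # Distance from every value to every bin centre, then the nearest bin.
    nearest = np.abs(values[:, None] - bin_centres[None, :]).argmin(axis=1)
    ids = nearest + 1  # integer IDs; GetID accumulates them as floats
    # Keep the partition's own index when the input is a Series.
    if isinstance(phase, pd.Series):
        return pd.Series(ids, index=phase.index, name=phase.name)
    return ids


# Stand-in for a partition whose index does not contain the label 0.
partition = pd.Series(np.deg2rad([0.0, 7.0, 358.0]), index=[10, 11, 12])

try:
    partition[0]  # label lookup, which is what phase[i] does in the loop
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

result = get_id_positional(partition)
print(label_lookup_failed)
print(result)
```

With Dask, a function like this could be passed as column.map_partitions(get_id_positional, meta=(name, 'int64')), where column and name stand for the real column and its name. If the KeyError still appears, the index assumption above is probably not the cause.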
I hope I managed to be succinct and precise. I would really appreciate your help with this one! Suggestions of what to look up are also welcome.