I recently decided to be more adventurous and explore Dask dataframes. I am trying to apply a function to one column of a dataframe, using the following syntax:

import pandas as pd
import dask.dataframe as dd
import dask.array as da

df_data = pd.DataFrame({'Column 1': [300,300,450,500,500,750,600,300, 150],'Column 2': [100,130,230,200,300,350,600,550,530], 'Column 3': [250, 300, 400, 500, 700,350, 750, 550, 600]})

def TestFunc(x):
    y = x*2 + abs(x/2 - x*3)
    return y

dd_data = dd.from_pandas(df_data, npartitions = 1)
data_test = dd.map_partitions(TestFunc,dd_data['Column 1'])
data_test.compute()

Naturally, this is a simple example that I made up to show what I have been doing. This code works well; the problem is with the real situation I am facing. I have a more complex dataframe where I want to apply a function to one column. I am applying the following function:

import numpy as np

def GetID(phase):
    nDataPoints = len(phase)
    myRanges = np.deg2rad(np.arange(0,360,6))
    phase[phase>np.deg2rad(354+3)] = 0
    ID = np.array([])
    for i in np.arange(0,nDataPoints):
        val = abs(myRanges-phase[i])
        iID = np.argmin(val)
        ID = np.append(ID, iID+1)
    return ID

I am able to apply the function to the column with .map_partitions. The problem is that when I then call .compute() to see the numerical results, I receive the error KeyError: 0. I don't understand why my simpler example works without problems while the real case fails.

I hope I managed to be succinct and precise. I would really appreciate your help on this one! Suggestions of what to look up are also welcome.

1 Answer

I recommend first trying your function on a normal pandas dataframe to verify that it works correctly:

GetID(df.compute())

If that works, I would next try the single-threaded scheduler, along with the pdb module, to investigate the traceback:

df.map_partitions(GetID).compute(scheduler='single-threaded')

This is easy to do if you are in IPython with the %debug magic.
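
One thing worth checking while debugging, though the question does not show enough to confirm it: inside GetID, phase[i] on a pandas Series with an integer index looks up the label i, not the i-th position. If the data reaching the function has no row labelled 0 (for example after filtering, or in a later partition), that lookup raises KeyError: 0. The sketch below reproduces the error with plain pandas and shows a position-based variant. The name get_id_positional and the sample values are made up for illustration, and unlike GetID it returns integer IDs in a Series and leaves its input unmodified.

```python
import numpy as np
import pandas as pd


def get_id_positional(phase):
    """Hypothetical variant of GetID that uses positions, not index labels."""
    # Copy into a plain NumPy array so the caller's data is left untouched
    values = np.asarray(phase, dtype=float).copy()
    # Same wrap-around rule as GetID: anything above 357 degrees counts as 0
    values[values > np.deg2rad(354 + 3)] = 0
    bin_starts = np.deg2rad(np.arange(0, 360, 6))
    # Distance from every phase value to every bin start, shape (n, 60)
    distances = np.abs(values[:, None] - bin_starts[None, :])
    ids = distances.argmin(axis=1) + 1
    # Keep the original index if the input had one
    return pd.Series(ids, index=getattr(phase, "index", None))


# Illustrative data: the index starts at 10, so there is no label 0
phase = pd.Series(np.deg2rad([0.0, 7.0, 180.0, 359.0]), index=[10, 11, 12, 13])

label_lookup_failed = False
try:
    phase[0]  # label-based lookup, as in GetID's loop
except KeyError as err:
    label_lookup_failed = True
    print("label lookup failed:", repr(err))

result = get_id_positional(phase)
print(result.tolist())
```

If this turns out to be the cause, a smaller change to the original function would be to index with phase.iloc[i] or to convert the column with phase.to_numpy() before the loop. The vectorised version builds an n-by-60 array per call, which is usually fine per partition but is a trade-off to keep in mind for very large partitions.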
