I am new to Dask. I want to create a Dask DataFrame from a Python list of tuples. In pandas, you can use DataFrame.from_records to convert a list of tuples to a DataFrame. Which function gives me the same functionality in Dask? My data looks like this:

[(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', ''), (21262, 'booking', 'NULL')]

I am currently using the following code to do this. Is this the correct way to do it?

import pandas as pd
import dask
import dask.dataframe as dd

names = ['id', 'status', 'reg_entry']
dfs = dask.delayed(pd.DataFrame.from_records)(cursor.fetchall(), columns=names)

df = dd.from_delayed(dfs)
  • Welcome to SO. Please read How to ask a good question. Can you provide code samples of what you have already tried? Commented Oct 16, 2018 at 7:38
  • @Florian Sorry for not being clearer the first time. I am new to this forum and still learning. Thanks for correcting me. Commented Oct 16, 2018 at 7:45

1 Answer

You can try creating a dask dataframe from an existing pandas dataframe (to be able to use all pandas constructors):

df = pd.DataFrame([(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', ''), (21262, 'booking', 'NULL')])
ddf = dd.from_pandas(df, npartitions=2)

2 Comments

Thank you, @Tina. So there is no direct function in Dask to ingest such data, and one has to go through pandas?
Apparently, yes.
