
I have a dictionary where each key maps to a single value. For example:

import pandas as pd
dd = {'Alice': 40,
      'Bob': 50,
      'Charlie': 35}

Now I want to convert this dictionary to a pd.DataFrame with two columns: the first containing the dictionary's keys, the second its values, with the columns given names (say "Name" and "Age"). I would expect a function call like:

pd.DataFrame(dd, columns=['Name', 'Age'])

which does not give the desired output, since the result has 0 rows.
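The reason is that, for dict input, the `columns` argument selects which dictionary keys to use as columns; it does not name new ones. A minimal sketch illustrating this, assuming pandas 1.x or 2.x (the names `empty` and `wide` are only illustrative):

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# For dict input, `columns` picks keys out of the dict. Neither 'Name' nor 'Age'
# is a key, so both columns come out empty and the frame has no rows.
empty = pd.DataFrame(dd, columns=['Name', 'Age'])
print(empty.shape)  # (0, 2)

# Selecting keys that do exist keeps them, but each key is still a column, not a row.
# Because every value is a scalar, an explicit index is needed for the single row.
wide = pd.DataFrame(dd, columns=['Alice', 'Bob'], index=[0])
print(wide)
```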

Currently I have two "solutions":

# Rename the index and reset it:
pd.DataFrame.from_dict(dd, orient='index', columns=['Age']).rename_axis('Name').reset_index()
# Convert the items to a list of (key, value) pairs:
pd.DataFrame(list(dd.items()), columns=['Name', 'Age'])

# Both result in the desired output:
    Name    Age
0   Alice   40
1   Bob     50
2   Charlie 35

However, both appear a bit hacky and thus inefficient and error-prone to me. Is there a more pythonic way to achieve this?

  • There's nothing wrong/hacky in using pd.DataFrame(dd.items(), columns=['Name', 'Age']) to get the needed result in your case. Commented Jan 31, 2020 at 15:13
  • @RomanPerekhrest, I didn't realize that list() can be removed. Without it, this seems fine to me. Do you want to post it as an answer so I can accept it? Commented Jan 31, 2020 at 15:31
  • Honestly, it's too simple to be a significant answer. Commented Jan 31, 2020 at 15:32

1 Answer


The advantage of your from_dict call is that the method name makes the conversion a little more obvious (though the index manipulation that follows makes it less so). Instead of calling rename_axis(), pass the names parameter to reset_index() (available since pandas 1.5).
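To make the two steps concrete, here is a small sketch, assuming pandas >= 1.5 for `reset_index(names=...)` (the variable names are illustrative): `orient='index'` turns each key into a row label, and `reset_index` then moves those labels into an ordinary column.

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# orient='index': each key becomes a row label, each value that row's 'Age'.
by_name = pd.DataFrame.from_dict(dd, orient='index', columns=['Age'])
print(by_name.index.tolist())  # ['Alice', 'Bob', 'Charlie']

# reset_index() moves the row labels into a regular column; `names` labels it
# directly, so no separate rename_axis() call is needed.
df = by_name.reset_index(names='Name')
print(df.columns.tolist())  # ['Name', 'Age']
```

On pandas older than 1.5, by_name.rename_axis('Name').reset_index() produces the same frame.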

Your dd.items() call is probably the best approach in terms of simplicity; just drop the call to list().
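A quick check, assuming a reasonably recent pandas (1.x or 2.x), that the list() wrapper changes nothing: the DataFrame constructor consumes the dict_items view on its own.

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# The constructor iterates the dict_items view itself, so wrapping it
# in list() first is unnecessary.
with_list = pd.DataFrame(list(dd.items()), columns=['Name', 'Age'])
without_list = pd.DataFrame(dd.items(), columns=['Name', 'Age'])

print(with_list.equals(without_list))  # True
```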

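I show two other options: the third (method_c) makes it even more obvious what's going on by passing in the keys and the values as separate columns (this relies on keys() and values() of an unmodified dict iterating in the same order), and the fourth (method_d) is a repaired variant of the pd.DataFrame(dd, columns=...) call you expected to work.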

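import typing

import pandas as pd

# Requires Python >= 3.9 (built-in dict[...] generics) and pandas >= 1.5 (reset_index(names=...)).


def method_a(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Keys become the index, values the single data column; then move the index into a named column.
    return pd.DataFrame.from_dict(
        data=dd, orient='index', columns=columns[1:],
    ).reset_index(names=columns[0])


def method_b(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Each (key, value) pair from dd.items() becomes one row.
    return pd.DataFrame(data=dd.items(), columns=columns)


def method_c(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Build each column separately; keys() and values() iterate in the same order.
    kcol, vcol = columns
    return pd.DataFrame({kcol: dd.keys(), vcol: dd.values()})


def method_d(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # One wide row labelled by the value column's name, transposed into one row per key.
    df = pd.DataFrame(dd, index=columns[1:])
    return df.T.reset_index(names=columns[0])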


def test() -> None:
    dd = {'Alice': 40,
          'Bob': 50,
          'Charlie': 35}
    ref = method_a(dd=dd, columns=('Name', 'Age'))
    for method in (method_b, method_c, method_d):
        result = method(dd=dd, columns=('Name', 'Age'))
        assert ref.equals(result)


if __name__ == '__main__':
    test()
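For method_d, it may help to see the intermediate frames. A short sketch mirroring its steps with the column names written out, again assuming pandas >= 1.5 (the variable names are illustrative):

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# With all-scalar values, pandas needs an index to know how many rows to build:
# here a single row labelled 'Age', with one column per dictionary key.
wide = pd.DataFrame(dd, index=['Age'])
print(wide.shape)  # (1, 3)

# Transposing turns the keys into row labels and 'Age' into the only column.
tall = wide.T
print(tall.index.tolist())  # ['Alice', 'Bob', 'Charlie']

# Finally, move the row labels into a 'Name' column.
df = tall.reset_index(names='Name')
print(df)
```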
