
I am trying to convert survey data on marital status, which looks as follows:

df['d11104'].value_counts()

[1] Married        1    250507
[2] Single         2     99131
[4] Divorced       4     32817
[3] Widowed        3     24839
[5] Separated      5      8098
[-1] keine Angabe         2571
Name: d11104, dtype: int64

So far, I did df['marstat'] = df['d11104'].cat.codes.astype('category'), which yields the following (note that "keine Angabe", German for "no answer", now has code 0):

df['marstat'].value_counts()
1    250507
2     99131
4     32817
3     24839
5      8098
0      2571
Name: marstat, dtype: int64
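The code 0 for the "keine Angabe" group follows from how cat.codes works: each code is the position of the row's label in the category list, and string categories are sorted lexically by default, so "[-1] keine Angabe" sorts before "[1] Married". A minimal sketch, assuming the raw labels are strings exactly as printed by value_counts above (the rows themselves are made up):

```python
import pandas as pd

# Hypothetical rows using the label strings from the value_counts output
raw = pd.Series(
    ["[1] Married", "[2] Single", "[-1] keine Angabe", "[4] Divorced",
     "[3] Widowed", "[5] Separated", "[1] Married"],
    dtype="category",
)

# Categories created from strings are sorted lexically,
# so "[-1] keine Angabe" comes first and therefore gets code 0
print(list(raw.cat.categories))
# ['[-1] keine Angabe', '[1] Married', '[2] Single', '[3] Widowed', '[4] Divorced', '[5] Separated']

# cat.codes gives each row's position in that category list
codes = raw.cat.codes
print(codes.tolist())  # [1, 2, 0, 4, 3, 5, 1]
```

If the real data was imported with an explicit category order (for example from a Stata file), the codes follow that order instead, so it is worth checking df['d11104'].cat.categories first.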

Now I'd like to add labels to the column marstat such that the numerical values are kept, i.e. I'd like to identify people by the condition df['marstat'] == 1 while at the same time having the labels ['Married', 'Single', 'Divorced', 'Widowed'] attached to this variable. How can this be done?
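One way to get something close to Stata-style value labels in a single column is a categorical whose integer codes are the numbers and whose categories are the labels; pd.Categorical.from_codes builds exactly that. A sketch, assuming the codes are 0 to 5 as in marstat above; the label list (including "No answer" for code 0) is an assumption ordered to match those codes:

```python
import pandas as pd

# Hypothetical integer codes, e.g. what df['d11104'].cat.codes produces
df = pd.DataFrame({"marstat_code": [1, 2, 0, 4, 3, 5, 1]})

# Assumed labels: position i in this list is the label shown for code i
labels = ["No answer", "Married", "Single", "Widowed", "Divorced", "Separated"]

# One column that stores the integer codes and displays the labels
df["marstat"] = pd.Categorical.from_codes(df["marstat_code"], categories=labels)

# Both conditions pick out the same (married) rows
married_by_code = df["marstat"].cat.codes == 1
married_by_label = df["marstat"] == "Married"
print((married_by_code == married_by_label).all())  # True
```

The trade-off is that from_codes requires the codes to run from 0 to len(labels) - 1 without gaps, which holds here because they came from cat.codes; the numeric test then goes through .cat.codes rather than comparing the column itself to 1.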

EDIT: Thanks to jpp's answer, I simply created a new variable and defined the labels by hand:

df['marstat_lb'] = df['marstat'].map({1: 'Married', 2: 'Single', 3: 'Widowed', 4: 'Divorced', 5: 'Separated'})
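A self-contained sketch of the mapping approach above, using made-up rows in place of the survey data. Note that code 0 ("keine Angabe") is not a key in the dictionary, so map leaves those rows missing:

```python
import pandas as pd

# Hypothetical codes standing in for df['marstat'] (categorical, as in the question)
df = pd.DataFrame({"marstat": pd.Series([1, 2, 0, 4, 3, 5, 1], dtype="category")})

label_map = {1: "Married", 2: "Single", 3: "Widowed", 4: "Divorced", 5: "Separated"}
df["marstat_lb"] = df["marstat"].map(label_map)

# Code 0 has no entry in label_map, so its label is missing (NaN)
print(df["marstat_lb"].isna().sum())  # 1

# The numeric column still works for filtering; the label column rides along
print(df.loc[df["marstat"] == 1, "marstat_lb"].tolist())  # ['Married', 'Married']
```

Adding 0: 'No answer' to the dictionary would give those rows a label as well.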

1 Answer

You can convert your result to a dataframe and include both the category code and name in the output.

A dictionary mapping each code to its category can be built by enumerating the categories. Minimal example below.

import pandas as pd

df = pd.DataFrame({'A': ['M', 'M', 'S', 'D', 'W', 'M', 'M', 'S',
                         'S', 'S', 'M', 'W']}, dtype='category')

print(df.A.cat.categories)

# Index(['D', 'M', 'S', 'W'], dtype='object')

res = df.A.cat.codes.value_counts().to_frame('count')

cat_map = dict(enumerate(df.A.cat.categories))

res['A'] = res.index.map(cat_map.get)

print(res)

#    count  A
# 1      5  M
# 2      4  S
# 3      2  W
# 0      1  D

For example, in df you can select the "M" rows either with df['A'] == 'M' or with df['A'].cat.codes == 1; in res, the "M" row is the one where res.index == 1.
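On the original df (as opposed to the count table res), a test on the numeric code goes through cat.codes, not through the row index. A sketch rebuilding the same df:

```python
import pandas as pd

df = pd.DataFrame({'A': ['M', 'M', 'S', 'D', 'W', 'M', 'M', 'S',
                         'S', 'S', 'M', 'W']}, dtype='category')

# Select by label
by_label = df['A'] == 'M'
# Select by code: 'M' is category 1 of ['D', 'M', 'S', 'W']
by_code = df['A'].cat.codes == 1

print(by_label.equals(by_code))  # True
print(by_label.sum())  # 5
```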


A more straightforward solution is to apply value_counts directly and then add an extra column for the codes:

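# rename_axis pins the column name: reset_index would otherwise call it
# 'index' in pandas < 2.0 but 'A' in pandas >= 2.0
res = df.A.value_counts().rename_axis('A').reset_index(name='count')

# the 'A' column keeps its categorical dtype, so .cat.codes is available
res['code'] = res['A'].cat.codes

print(res)

#    A  count  code
# 0  M      5     1
# 1  S      4     2
# 2  W      2     3
# 3  D      1     0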

2 Comments

Thanks. Maybe I wasn't specific enough. I don't want to address items in the frequency table but in the individual data. Slightly altering your approach, I tried df['marstat'] = df['d11104'].cat.codes and labels = dict(enumerate(df['d11104'].cat.categories)). However, df['marstat_lb'] = df['marstat'].index.map(labels.get) gives me None for every value of df['marstat']. Is there no way to attach a set of labels (a "map" in Python terminology?) 'along' the categorical data? Coming from Stata, I find this pretty common.
Wouldn't you need df['marstat_lb'] = df['marstat'].map(labels) in your example? .index.map looks up the row index rather than the codes, which is why you get None; with .map you shouldn't.
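A small sketch of the difference between index.map and map, assuming the survey rows carry an index that is not simply 0 to 5 (here made-up respondent IDs), which would explain why every lookup came back as None:

```python
import pandas as pd

# Hypothetical stand-in for df['d11104']; the index mimics respondent IDs
d11104 = pd.Series(["[1] Married", "[2] Single", "[-1] keine Angabe"],
                   index=[101, 102, 103], dtype="category")

marstat = d11104.cat.codes                       # codes 1, 2, 0
labels = dict(enumerate(d11104.cat.categories))  # code -> category label

# index.map looks up the row labels (101, 102, 103), which are not keys of labels
wrong = marstat.index.map(labels.get)
print(pd.isna(wrong).all())  # True

# Series.map looks up each value, i.e. each code
right = marstat.map(labels)
print(right.tolist())  # ['[1] Married', '[2] Single', '[-1] keine Angabe']
```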
