0

Is there a way to create a new column in one dataframe by looking up values in another dataframe, where the column to read from depends on a value in the first dataframe?

My data sets are like this:

df1 = pd.DataFrame(
    [['USA', 1992],
    ['China', 1993],
    ['Japan', 1994]],
    columns = ['Country', 'year'])
scores = pd.DataFrame(
    [['USA', 20, 30, 40],
    ['China', 5, 15, 30],
    ['Japan', 30, 50, 40],
    ['Korea', 10, 15, 20],
    ['France', 10, 12, 15]],
    columns = ['Country', 1992, 1993, 1994])

And my desired dataset would be:

df = pd.DataFrame(
    [['USA', 1992, 20],
    ['China', 1993, 15],
    ['Japan', 1994, 40]],
    columns = ['Country', 'year', 'score'])

I have tried using apply with a lambda function but it gives me a

KeyError: ('Country', u'occurred at index Country')

the line that I have tried is:

df1['score'] = df.apply(lambda x: scores[scores['Country'] == x['Country']][x['year']][1])

Thank you in advance!

3
  • What are the conditions? How do you want to select them? We cannot figure that out from your code since your code isn't working. Commented Dec 4, 2016 at 23:57
  • @ayhan I think the previous edit was missing the scores dataframe hence causing the confusion. I was trying to add a new score column to the df1 based on the year columns of df1. Thanks Commented Dec 5, 2016 at 0:08
  • Ah yes, sorry about that. Commented Dec 5, 2016 at 0:13

2 Answers

1

You can melt the scores DataFrame and merge it with the original:

scores = pd.melt(scores, id_vars='Country', value_name='score', var_name='year')
df1.merge(scores)
Out: 
  Country  year  score
0     USA  1992     20
1   China  1993     15
2   Japan  1994     40

By default, merge joins on the columns the two DataFrames have in common. To name the join columns explicitly, use the on parameter, e.g. df1.merge(scores, on=['Country', 'year']).
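For reference, here is a self-contained sketch of the melt-and-merge approach using the data from the question. The name long_scores, the astype(int) cast and how='left' are my own choices rather than part of the answer above: the cast keeps the key dtype in line with df1['year'], and a left join keeps every row of df1 even if no score is found for it.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['USA', 1992], ['China', 1993], ['Japan', 1994]],
    columns=['Country', 'year'])
scores = pd.DataFrame(
    [['USA', 20, 30, 40],
     ['China', 5, 15, 30],
     ['Japan', 30, 50, 40],
     ['Korea', 10, 15, 20],
     ['France', 10, 12, 15]],
    columns=['Country', 1992, 1993, 1994])

# Wide -> long: one row per (Country, year) pair.
# long_scores is a hypothetical name; the original answer overwrites scores.
long_scores = scores.melt(id_vars='Country', var_name='year', value_name='score')

# Assumption: make the key dtype match df1['year'] before joining.
long_scores['year'] = long_scores['year'].astype(int)

# Join on both keys; how='left' keeps all rows of df1 in their original order.
result = df1.merge(long_scores, on=['Country', 'year'], how='left')
print(result)
```

Keeping the melted frame under a separate name leaves the original wide scores table available for other lookups.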

1 Comment

I didn't know about the melt function before, thanks a lot!
0

You can use Country as the index of the scores DataFrame:

scores = scores.set_index(['Country'])

Then you will be able to apply the function get_score, creating and filling the score column with the desired value:

def get_score(row):
    row['score'] = scores.loc[row['Country'], row['year']]
    return row

df = df1.apply(get_score, axis=1)

Which gives you this output:

  Country  year  score
0     USA  1992     20
1   China  1993     15
2   Japan  1994     40
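A self-contained variant of the same idea, as a sketch using the question's data: the function returns only the looked-up value instead of modifying the row, and the result is assigned to a new column. The name score_table is hypothetical. Note the axis=1 argument: without it, apply passes each column rather than each row to the function, which is consistent with the KeyError on 'Country' reported in the question.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['USA', 1992], ['China', 1993], ['Japan', 1994]],
    columns=['Country', 'year'])
scores = pd.DataFrame(
    [['USA', 20, 30, 40],
     ['China', 5, 15, 30],
     ['Japan', 30, 50, 40],
     ['Korea', 10, 15, 20],
     ['France', 10, 12, 15]],
    columns=['Country', 1992, 1993, 1994])

# Index by Country so that .loc[country, year] addresses a single cell.
# score_table is a hypothetical name; the answer above reuses scores.
score_table = scores.set_index('Country')

# axis=1 hands each row of df1 to the lambda as a Series.
df1['score'] = df1.apply(
    lambda row: score_table.loc[row['Country'], row['year']],
    axis=1)
print(df1)
```

Unlike a left merge, this lookup raises a KeyError if df1 contains a country or year that is missing from the scores table.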
