
I've got a pandas DataFrame structured as follows:

ID  Col1  Col2
1   50    12:23:01
1   34    12:25:11
1   65    12:32:25
1   98    12:45:08
2   23    11:09:10
2   12    11:12:43
2   56    11:13:12
2   34    11:14:26
2   77    11:16:02
3   64    14:01:11
3   34    14:01:13
3   48    14:02:32

What I need is to search within each repeating ID group for a condition on Col1, say Col1 == 34, and then create a new column, Col3, that takes on the matching row's Col2 value for every row in that group. The end result I need is shown below.

ID  Col1  Col2      Col3
1   50    12:23:01  12:25:11
1   34    12:25:11  12:25:11
1   65    12:32:25  12:25:11
1   98    12:45:08  12:25:11
2   23    11:09:10  11:14:26
2   12    11:12:43  11:14:26
2   56    11:13:12  11:14:26
2   34    11:14:26  11:14:26
2   77    11:16:02  11:14:26
3   64    14:01:11  14:01:13
3   34    14:01:13  14:01:13
3   48    14:02:32  14:01:13
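
For reference, here's the sample frame as a reproducible snippet (Col2 kept as plain strings, which is all the lookup needs):

import pandas as pd

df = pd.DataFrame({
    'ID':   [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    'Col1': [50, 34, 65, 98, 23, 12, 56, 34, 77, 64, 34, 48],
    'Col2': ['12:23:01', '12:25:11', '12:32:25', '12:45:08',
             '11:09:10', '11:12:43', '11:13:12', '11:14:26',
             '11:16:02', '14:01:11', '14:01:13', '14:02:32'],
})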

I've tried the following, but it's not pulling the distinct Col2 value; it's just duplicating Col2:

df['Col3'] = np.where(df.Col1.isin(df[df.Col2==34].Col1), df['Col2'], 0)

I realize that assigning df['Col2'] (else 0) from the where condition is most likely my logic issue, and that there is probably some easy, concise way of doing this (or that my time might be better spent in SQL), but I'm not sure how to set this up. Thanks in advance.

3 Answers


Using query + map:

df['Col3'] = df.ID.map(df.query('Col1 == 34').set_index('ID').Col2)

print(df)

    ID  Col1      Col2      Col3
0    1    50  12:23:01  12:25:11
1    1    34  12:25:11  12:25:11
2    1    65  12:32:25  12:25:11
3    1    98  12:45:08  12:25:11
4    2    23  11:09:10  11:14:26
5    2    12  11:12:43  11:14:26
6    2    56  11:13:12  11:14:26
7    2    34  11:14:26  11:14:26
8    2    77  11:16:02  11:14:26
9    3    64  14:01:11  14:01:13
10   3    34  14:01:13  14:01:13
11   3    48  14:02:32  14:01:13
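
To see what map is consuming here, the intermediate lookup built by query + set_index is a Series keyed by ID, with one Col2 value per group; df.ID.map then translates each row's ID through it:

print(df.query('Col1 == 34').set_index('ID').Col2)

ID
1    12:25:11
2    11:14:26
3    14:01:13
Name: Col2, dtype: object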

Dealing with duplicates:

# keep first instance
df.ID.map(df.query('Col1 == 34') \
    .drop_duplicates(subset=['ID']).set_index('ID').Col2)

Or

# keep last instance
df.ID.map(df.query('Col1 == 34') \
    .drop_duplicates(subset=['ID'], keep='last').set_index('ID').Col2)
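
To see why the raw map fails, here's a sketch with a hypothetical extra duplicate row appended: with two Col1 == 34 rows for the same ID, the lookup index is no longer unique and map raises "Reindexing only valid with uniquely valued Index objects". Deduplicating the lookup first fixes it:

import pandas as pd

# hypothetical: append a second Col1 == 34 row for ID 1 to force duplicates
dup = pd.concat([df, df.query('ID == 1 and Col1 == 34')], ignore_index=True)

# dup.ID.map(dup.query('Col1 == 34').set_index('ID').Col2)  # would raise: non-unique index

# deduplicated lookup keeps the first match per ID
dup['Col3'] = dup.ID.map(
    dup.query('Col1 == 34')
       .drop_duplicates(subset=['ID'])
       .set_index('ID')
       .Col2
)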

5 Comments

Appreciate it! Worked perfectly.
Actually, this works on a test dataset, but in my actual dataset it appears that I've got duplicated values based on the error: Reindexing only valid with uniquely valued Index objects. I am assuming I'll have to drop duplicates first?
That will work. But do you want to take the first observation?
Any of the observations would work, so taking the first would be fine. In my real situation it's as if (sometimes) row 1 (Col1==34 and Col2==12:25:11) is duplicated some number of times, or may not be duplicated at all. I didn't realize I had these duplications until now (I've got a rather large dataset).
Thanks for the edit on dealing with duplicates - was extremely helpful.

Take advantage of pandas' automatic index alignment by making ID the index, then assign a new column from a boolean selection. This answer assumes the Col1 == 34 match occurs exactly once per ID.

df.set_index('ID', inplace=True)
df['Col3'] = df.loc[df.Col1 == 34, 'Col2']
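
To make the mechanics visible, here's a sketch of what the assignment sees: the boolean selection is a Series with one value per ID label, and assigning it back broadcasts across every row that shares the label. A reset_index afterwards restores ID as a column:

rhs = df.loc[df.Col1 == 34, 'Col2']   # Series indexed by ID, one value per label
print(rhs)
# ID
# 1    12:25:11
# 2    11:14:26
# 3    14:01:13
# Name: Col2, dtype: object

df['Col3'] = rhs               # aligns on the ID index, broadcasting to duplicate labels
df.reset_index(inplace=True)   # optional: bring ID back as a regular column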



Here's a NumPy-based vectorized solution:

df['Col3'] = df.Col2.values[df.Col1.values == 34][df.ID.factorize()[0]]
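
Unpacking the indexing, assuming exactly one Col1 == 34 row per ID and IDs stored as contiguous groups (so the k-th matched value belongs to the k-th factorized ID):

mask  = df.Col1.values == 34      # True exactly once per ID
vals  = df.Col2.values[mask]      # ['12:25:11', '11:14:26', '14:01:13']
codes = df.ID.factorize()[0]      # [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
df['Col3'] = vals[codes]          # fan each ID's matched value back out to its rows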

