Select rows (with multiple strings) in pandas dataframe that contain only a given string

Question

I'm manipulating the 2017 developer survey results. I want to isolate those rows which contain only the string Python in the HaveWorkedLanguage column.

This is what the df['HaveWorkedLanguage'] column looks like:

0                                                 Swift
1                         JavaScript; Python; Ruby; SQL
2                                     Java; PHP; Python
3                                        Python; R; SQL
4                                                   NaN
5                                 JavaScript; PHP; Rust
6                                        Matlab; Python
7        CoffeeScript; Clojure; Elixir; Erlang; Haskell
8                                        C#; JavaScript
9                                    Objective-C; Swift
10                                               R; SQL
11                                                  NaN
12                                         C; C++; Java
13                          Java; JavaScript; Ruby; SQL
14                                     Assembly; C; C++
15                                   JavaScript; VB.NET
16                                           JavaScript
17                     Python; Matlab; Rust; SQL; Swift
18                                               Python
19                                         Perl; Python
20                                                  NaN
21                                  C#; JavaScript; SQL
22                                                 Java
23                                          Python; SQL
24                                                  NaN
25                                          Java; Scala
26         Java; JavaScript; Objective-C; Python; Swift
27                                                  NaN
28                                               Python
29                                                  NaN
...

I tried using pandas.Series.str.match, which should "determine if each string matches a regular expression", as shown here:

import pandas as pd
df = pd.read_csv("survey_results_public.csv")
rows_w_Python = df[df['HaveWorkedLanguage'].str.match("Python", na=False)]['HaveWorkedLanguage']

The problem is that this selects the rows where Python is the first entry, not the rows containing only Python, which results in:

3                                        Python; R; SQL
17                     Python; Matlab; Rust; SQL; Swift
18                                               Python
23                                          Python; SQL
28                                               Python
...
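
The result above is consistent with str.match anchoring the pattern at the start of each string only, in the same way as re.match. A minimal sketch of that behaviour, assuming a small made-up Series (the name `s` and its values are illustrative, not taken from the survey file):

```python
import pandas as pd

# Hypothetical sample mirroring a few rows of the column above.
s = pd.Series(["JavaScript; Python; Ruby; SQL", "Python; R; SQL", None, "Python"])

# str.match anchors at the start of the string only,
# so "Python; R; SQL" matches as well as "Python".
starts_with = s.str.match("Python", na=False)

# str.contains searches anywhere in the string.
anywhere = s.str.contains("Python", na=False)

print(starts_with.tolist())  # [False, True, False, True]
print(anywhere.tolist())     # [True, True, False, True]
```

Neither call restricts the match to the whole string, which is why multi-language rows still come through.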

How can I keep the rows that contain only Python?

You mean using r"Python" instead of Python in the above command? Tried it, no change.
– Gabriel, Commented Jun 18, 2017 at 22:05
Does "only Python" require regex? df['HaveWorkedLanguage'] == 'Python' should create a boolean filter for that.
– user2285236, Commented Jun 18, 2017 at 22:07
@ayhan I think you nailed it, it was simpler than I thought. Could you turn your comment into an answer so I can accept it?
– Gabriel, Commented Jun 18, 2017 at 22:11

user2285236 · Accepted Answer · 2017-06-18 22:16:38Z

For exact matching, the == operator suffices. It doesn't require regex.

df['HaveWorkedLanguage'] == 'Python' returns a boolean mask that is True only where the value is exactly 'Python'. NaN entries compare as False, so no na argument is needed.

Passing this filter to the DataFrame yields:

df[df['HaveWorkedLanguage'] == 'Python']
Out: 
   HaveWorkedLanguage
18             Python
28             Python
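
As a reproducible sketch of the same idea, assuming a small made-up DataFrame in place of survey_results_public.csv (the sample values and the variable names `exact`, `only_python` and `regex_exact` are illustrative only). It also shows Series.str.fullmatch, available in pandas 1.1 and later, as one possible regex-based equivalent in case a pattern is needed after all:

```python
import pandas as pd

# Hypothetical sample standing in for the survey column.
df = pd.DataFrame({
    "HaveWorkedLanguage": [
        "Swift", "Python; R; SQL", None, "Python", "Matlab; Python", "Python",
    ]
})

# Exact comparison: True only where the whole value equals "Python".
# Missing values compare as False.
exact = df["HaveWorkedLanguage"] == "Python"
only_python = df[exact]

# Regex alternative (pandas >= 1.1): the pattern must match the entire string.
regex_exact = df["HaveWorkedLanguage"].str.fullmatch("Python", na=False)

print(only_python.index.tolist())                # [3, 5]
print(regex_exact.tolist() == exact.tolist())    # True
```

Both masks assume the values carry no stray whitespace. If they might, applying .str.strip() before the comparison is one way to guard against that.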
