
I'm manipulating the 2017 developer survey results. I want to isolate those rows which contain only the string Python in the HaveWorkedLanguage column.

This is what the df['HaveWorkedLanguage'] column looks like:

0                                                 Swift
1                         JavaScript; Python; Ruby; SQL
2                                     Java; PHP; Python
3                                        Python; R; SQL
4                                                   NaN
5                                 JavaScript; PHP; Rust
6                                        Matlab; Python
7        CoffeeScript; Clojure; Elixir; Erlang; Haskell
8                                        C#; JavaScript
9                                    Objective-C; Swift
10                                               R; SQL
11                                                  NaN
12                                         C; C++; Java
13                          Java; JavaScript; Ruby; SQL
14                                     Assembly; C; C++
15                                   JavaScript; VB.NET
16                                           JavaScript
17                     Python; Matlab; Rust; SQL; Swift
18                                               Python
19                                         Perl; Python
20                                                  NaN
21                                  C#; JavaScript; SQL
22                                                 Java
23                                          Python; SQL
24                                                  NaN
25                                          Java; Scala
26         Java; JavaScript; Objective-C; Python; Swift
27                                                  NaN
28                                               Python
29                                                  NaN
...

I tried using pandas.Series.str.match, which according to its documentation should:

Determine if each string matches a regular expression.

Here is what I ran:

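import pandas as pd

# Load the 2017 survey results
df = pd.read_csv("survey_results_public.csv")
# Filter with str.match; na=False makes missing (NaN) entries count as non-matches
rows_w_Python = df[df['HaveWorkedLanguage'].str.match("Python", na=False)]['HaveWorkedLanguage']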

The problem is that this selects the rows that have Python as the first entry, not the rows that contain only Python, which results in:

3                                        Python; R; SQL
17                     Python; Matlab; Rust; SQL; Swift
18                                               Python
23                                          Python; SQL
28                                               Python
...
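A minimal reproduction of this behaviour, assuming a small hand-made Series (a few values copied from the listing above) in place of the CSV. Series.str.match anchors the regex only at the start of each string, so any cell that begins with Python is kept, regardless of what follows:

```python
import pandas as pd

# Hypothetical sample standing in for df['HaveWorkedLanguage']
langs = pd.Series([
    "Swift",
    "JavaScript; Python; Ruby; SQL",
    "Python; R; SQL",
    None,
    "Python",
    "Perl; Python",
])

# str.match anchors the pattern at the START of each string only,
# so "Python; R; SQL" matches just as "Python" does.
starts_with_python = langs.str.match("Python", na=False)
print(langs[starts_with_python])
```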

How can I keep the rows that contain only Python?

  • maybe r'Python' as a regex instead of string 'Python' Commented Jun 18, 2017 at 22:02
  • You mean using r"Python" instead of Python in the above command? Tried it, no change. Commented Jun 18, 2017 at 22:05
  • Does "only Python" require regex? df['HaveWorkedLanguage'] == 'Python' should create a boolean filter for that. Commented Jun 18, 2017 at 22:07
  • @ayhan I think you nailed it; it was simpler than I thought. Could you turn your comment into an answer so I can accept it? Commented Jun 18, 2017 at 22:11
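A short sketch of why the r prefix made no difference, again on a small hypothetical sample Series: the raw-string prefix only changes how backslashes in the literal are parsed, and "Python" contains none, so the regex is identical. What would restrict str.match to whole cells is anchoring the end of the pattern, or using Series.str.fullmatch (available since pandas 1.1), which anchors both ends:

```python
import pandas as pd

# Hypothetical sample values taken from the question
langs = pd.Series(["Python; R; SQL", "Python", "Perl; Python", None])

# The r prefix only affects backslash handling; these are the same string.
assert r"Python" == "Python"

# Anchoring the end with $ makes str.match reject "Python; R; SQL".
# (Note: $ would also accept a single trailing newline after "Python".)
anchored = langs.str.match(r"Python$", na=False)
print(anchored.tolist())  # [False, True, False, False]

# str.fullmatch requires the whole string to match the pattern.
full = langs.str.fullmatch("Python", na=False)
print(full.tolist())  # [False, True, False, False]
```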

1 Answer


For exact matching, the == operator should suffice; it doesn't require a regex.

df['HaveWorkedLanguage'] == 'Python' returns a boolean Series that is True only where the value is exactly 'Python'.

Passing this filter to the DataFrame yields:

df[df['HaveWorkedLanguage'] == 'Python']
Out: 
   HaveWorkedLanguage
18             Python
28             Python
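A self-contained version of this filter, assuming a small hypothetical DataFrame built from values in the question rather than the real CSV. Because == compares whole cell values, partial lists such as Python; R; SQL are excluded, and missing values compare as False, so no na= argument is needed:

```python
import pandas as pd

# Hypothetical sample DataFrame with a few values from the question
df = pd.DataFrame({"HaveWorkedLanguage": [
    "JavaScript; Python; Ruby; SQL",
    "Python; R; SQL",
    None,
    "Python",
    "Perl; Python",
    "Python",
]})

# == compares the whole string, so only cells that are exactly "Python" are True.
# Missing values compare as False, so they are dropped automatically.
only_python = df["HaveWorkedLanguage"] == "Python"
print(df[only_python])
```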