I'm manipulating the 2017 developer survey results. I want to isolate those rows which contain only the string Python in the HaveWorkedLanguage column.
This is what the df['HaveWorkedLanguage'] column looks like:
0 Swift
1 JavaScript; Python; Ruby; SQL
2 Java; PHP; Python
3 Python; R; SQL
4 NaN
5 JavaScript; PHP; Rust
6 Matlab; Python
7 CoffeeScript; Clojure; Elixir; Erlang; Haskell
8 C#; JavaScript
9 Objective-C; Swift
10 R; SQL
11 NaN
12 C; C++; Java
13 Java; JavaScript; Ruby; SQL
14 Assembly; C; C++
15 JavaScript; VB.NET
16 JavaScript
17 Python; Matlab; Rust; SQL; Swift
18 Python
19 Perl; Python
20 NaN
21 C#; JavaScript; SQL
22 Java
23 Python; SQL
24 NaN
25 Java; Scala
26 Java; JavaScript; Objective-C; Python; Swift
27 NaN
28 Python
29 NaN
...
I tried using pandas.Series.str.match, which according to its documentation should:
Determine if each string matches a regular expression.
as shown here:
import pandas as pd
df = pd.read_csv("survey_results_public.csv")
rows_w_Python = df[df['HaveWorkedLanguage'].str.match("Python", na=False)]['HaveWorkedLanguage']
The problem is that this selects the rows that have Python as their first entry, not the rows that contain only Python, which results in:
3 Python; R; SQL
17 Python; Matlab; Rust; SQL; Swift
18 Python
23 Python; SQL
28 Python
...
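To make the behaviour reproducible without the full survey file, here is a minimal sketch on a handful of values copied from the column above. The Series name `s` is only a stand-in for df['HaveWorkedLanguage']. It shows that str.match anchors at the start of each string only, so every value beginning with Python passes:

```python
import pandas as pd

# Small sample copied from the column shown above (not the full survey file).
s = pd.Series(["Swift", "Java; PHP; Python", "Python; R; SQL", None, "Python"])

# str.match only anchors the pattern at the start of each string,
# so any value that begins with "Python" is selected.
starts_with_python = s.str.match("Python", na=False)
print(s[starts_with_python].tolist())  # ['Python; R; SQL', 'Python']
```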
How can I keep the rows that contain only Python?
r'Python' as a regex instead of string 'Python'
r"Python" instead of Python in the above command? Tried it, no change.
df['HaveWorkedLanguage'] == 'Python' should create a boolean filter for that.
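A minimal sketch of the equality filter suggested above, run on a few values copied from the question rather than on the real CSV. The Series `s` is a stand-in for df['HaveWorkedLanguage'], and the sketch assumes the survey values carry no stray leading or trailing whitespace. A regex variant anchored with `$` is included for comparison with the original str.match attempt:

```python
import pandas as pd

# Sample values copied from the question; `s` stands in for
# df['HaveWorkedLanguage'].
s = pd.Series([
    "Swift",
    "Java; PHP; Python",
    "Python; R; SQL",
    None,
    "Python",
    "Perl; Python",
    "Python",
])

# Exact equality: missing values compare as False, so no na= handling is needed.
only_python = s[s == "Python"]

# Regex alternative: str.match anchors the start, "$" anchors the end.
only_python_regex = s[s.str.match(r"Python$", na=False)]

print(only_python.index.tolist())  # [4, 6]
```

On the real frame the same mask would presumably be written as df[df['HaveWorkedLanguage'] == 'Python']. Newer pandas versions (1.1 and later) also offer Series.str.fullmatch, which matches the whole string without needing the `$`.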