34

I created a dataframe using the following:

df = pd.DataFrame(np.random.rand(10, 3), columns=['alp1', 'alp2', 'bet1'])

I'd like to get a dataframe containing every columns from df that have alp in their names. This is only a light version of my problem, so my real dataframe will have more columns.

4 Answers 4

59

alternative methods:

In [13]: df.loc[:, df.columns.str.startswith('alp')]
Out[13]:
       alp1      alp2
0  0.357564  0.108907
1  0.341087  0.198098
2  0.416215  0.644166
3  0.814056  0.121044
4  0.382681  0.110829
5  0.130343  0.219829
6  0.110049  0.681618
7  0.949599  0.089632
8  0.047945  0.855116
9  0.561441  0.291182

In [14]: df.loc[:, df.columns.str.contains('alp')]
Out[14]:
       alp1      alp2
0  0.357564  0.108907
1  0.341087  0.198098
2  0.416215  0.644166
3  0.814056  0.121044
4  0.382681  0.110829
5  0.130343  0.219829
6  0.110049  0.681618
7  0.949599  0.089632
8  0.047945  0.855116
9  0.561441  0.291182
Sign up to request clarification or add additional context in comments.

Comments

16

In case @Pedro answer doesn't work here is official way of doing it for pandas 0.25

Sample dataframe:

>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
         one two three
mouse     1   2   3
rabbit    4   5   6

Select columns by name

df.filter(items=['one', 'three'])
         one  three
mouse     1      3
rabbit    4      6

Select columns by regular expression

df.filter(regex='e$', axis=1) #ending with *e*, for checking containing just use it without *$* in the end
         one  three
mouse     1      3
rabbit    4      6

Select rows containing 'bbi'

df.filter(like='bbi', axis=0)
         one  two  three
rabbit    4    5      6

Comments

7

You've several options, here's a couple:

1 - filter with like:

df.filter(like='alp')

2 - filter with regex:

df.filter(regex='alp')

Comments

3

option 1
Full numpy + pd.DataFrame

m = np.core.defchararray.find(df.columns.values.astype(str), 'alp') >= 0
pd.DataFrame(df.values[:, m], df.index, df.columns[m])

       alp1      alp2
0  0.819189  0.356867
1  0.900406  0.968947
2  0.201382  0.658768
3  0.700727  0.946509
4  0.176423  0.290426
5  0.132773  0.378251
6  0.749374  0.983251
7  0.768689  0.415869
8  0.292140  0.457596
9  0.214937  0.976780

option 2
numpy + loc

m = np.core.defchararray.find(df.columns.values.astype(str), 'alp') >= 0
df.loc[:, m]

       alp1      alp2
0  0.819189  0.356867
1  0.900406  0.968947
2  0.201382  0.658768
3  0.700727  0.946509
4  0.176423  0.290426
5  0.132773  0.378251
6  0.749374  0.983251
7  0.768689  0.415869
8  0.292140  0.457596
9  0.214937  0.976780

timing
numpy is faster

enter image description here

1 Comment

That np.core.defchararray needs more participation in pandas.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.