added 19 characters in body
Divakar
  • 222.1k
  • 19
  • 273
  • 374

To look up the positions of several columns at once, a vectorized solution based on np.searchsorted can be used. With df as the dataframe and query_cols as the column names to search for, an implementation would be -

import numpy as np

def column_index(df, query_cols):
    cols = df.columns.values
    sidx = np.argsort(cols)  # positions that would sort the column names
    # Binary-search each query in sorted order, then map back to original positions
    return sidx[np.searchsorted(cols, query_cols, sorter=sidx)]

Sample run -

In [162]: df
Out[162]: 
   apple  banana  pear  orange  peach
0      8       3     4       4      2
1      4       4     3       0      1
2      1       2     6       8      1

In [163]: column_index(df, ['peach', 'banana', 'apple'])
Out[163]: array([4, 1, 0])
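
To make the two steps easier to follow, here is a small sketch that prints the intermediate arrays for the sample column names. It uses a plain NumPy string array as a stand-in for df.columns.values (an assumption made so the snippet runs without pandas); the variable names pos and ids are only illustrative.

```python
import numpy as np

# Stand-in for df.columns.values from the sample run (assumption: plain str array)
cols = np.array(["apple", "banana", "pear", "orange", "peach"])
query_cols = ["peach", "banana", "apple"]

# Step 1: positions that would sort the names alphabetically
sidx = np.argsort(cols)

# Step 2: rank of each query name within the sorted order
pos = np.searchsorted(cols, query_cols, sorter=sidx)

# Step 3: translate sorted ranks back to positions in the original order
ids = sidx[pos]

print(sidx)  # [0 1 3 4 2]
print(pos)   # [3 1 0]
print(ids)   # [4 1 0]
```

The sorter argument lets searchsorted work on the unsorted names directly, so no sorted copy of cols is needed.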
deleted 230 characters in body

To look up the positions of several columns at once, a vectorized solution based on np.searchsorted can be used. With df as the dataframe and query_cols as the column names to search for, an implementation would be -

cols = df.columns.values
sidx = np.argsort(cols)  # positions that would sort the column names
IDs = sidx[np.searchsorted(cols, query_cols, sorter=sidx)]

Sample run -

In [234]: df
Out[234]: 
   apple  banana  pear  orange  peach
0      8       3     4       4      2
1      4       4     3       0      1
2      1       2     6       8      1

In [235]: query_cols
Out[235]: ['peach', 'banana', 'apple']

In [236]: IDs
Out[236]: array([4, 1, 0])
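
One caveat worth keeping in mind: np.searchsorted returns insertion points, so a name that is not among the columns still yields an index (or an out-of-range one) rather than an error. The sketch below shows one possible guard. The function name column_index_checked is hypothetical, and it takes the column names directly (for example df.columns) and assumes they are non-empty strings.

```python
import numpy as np

def column_index_checked(columns, query_cols):
    """Hypothetical variant: positions of query_cols in columns, raising on unknown names."""
    cols = np.asarray(columns, dtype=str)
    query = np.asarray(query_cols, dtype=str)
    sidx = np.argsort(cols)
    pos = np.searchsorted(cols, query, sorter=sidx)
    # An insertion point can equal len(cols) for names that sort after
    # every column, so clip before using it as an index.
    ids = sidx[np.clip(pos, 0, len(cols) - 1)]
    # A query is present only if the name found at its position matches it.
    missing = query[cols[ids] != query]
    if missing.size:
        raise KeyError(f"columns not found: {missing.tolist()}")
    return ids

columns = ["apple", "banana", "pear", "orange", "peach"]
print(column_index_checked(columns, ["peach", "banana", "apple"]))  # [4 1 0]
```

Passing a name such as "kiwi" raises KeyError here instead of silently returning the position of a neighbouring column.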

To look up the positions of several columns at once, a vectorized solution based on np.searchsorted can be used. With df as the dataframe and query_cols as the column names to search for, an implementation would be -

cols = df.columns.values.astype(str)
sidx = cols.argsort()  # positions that would sort the column names
IDs = sidx[np.searchsorted(cols, query_cols, sorter=sidx)]

Sample run -

In [207]: df
Out[207]: 
   apple  banana  pear  orange  peach
0      4       7     2       0      7
1      5       4     8       5      5
2      3       6     1       8      5

In [208]: query_cols
Out[208]: ['peach', 'banana', 'apple']

In [209]: # Proposed solution
     ...: cols = df.columns.values.astype(str)
     ...: sidx = cols.argsort()
     ...: IDs = sidx[np.searchsorted(cols,query_cols,sorter=sidx)]
     ...: 

In [210]: IDs
Out[210]: array([4, 1, 0])