5

I want to take a column indexed 'length' and make it my second column. It currently exists as the 5th column. I have tried:

colnames = big_df.columns.tolist()

# make index "length" the second column in the big_df
colnames = colnames[0] + colnames[4] + colnames[:-1] 

big_df = big_df[colnames]

I see the following error:

TypeError: must be str, not list

I'm not sure how to interpret this error because it actually should be a list, right?

Also, is there a general method to move any column by label to a specified position? My columns only have one level, i.e. no MultiIndex involved.

2
  • More generally, what is best practice for moving columns to organize a dataframe? Commented Oct 2, 2018 at 21:44
  • I would strongly recommend you use df.columns = colnames instead of df[[colnames]] as __getitem__ triggers a copy operation. But this is the trivial bit. I haven't seen any good NumPy-based solution for the reordering aspect. Commented Oct 2, 2018 at 22:42

1 Answer 1

9

Correcting your error

I'm not sure how to interpret this error because it actually should be a list, right?

No: colnames[0] and colnames[4] are scalars, not lists. You can't concatenate a scalar with a list. To make them lists, use square brackets:

colnames = [colnames[0]] + [colnames[4]] + colnames[:-1]

You can either use df[[colnames]] or df.reindex(columns=colnames): both necessarily trigger a copy operation as this transformation cannot be processed in place.

Generic solution

But converting arrays to lists and then concatenating lists manually is not only expensive, but prone to error. A related answer has many list-based solutions, but a NumPy-based solution is worthwhile since pd.Index objects are stored as NumPy arrays.

The key here is to modify the NumPy array via slicing rather than concatenation. There are only 2 cases to handle: when the desired position exists after the current position, and vice versa.

import pandas as pd, numpy as np
from string import ascii_uppercase

df = pd.DataFrame(columns=list(ascii_uppercase))

def shifter(df, col_to_shift, pos_to_move):
    arr = df.columns.values
    idx = df.columns.get_loc(col_to_shift)
    if idx == pos_to_move:
        pass
    elif idx > pos_to_move:
        arr[pos_to_move+1: idx+1] = arr[pos_to_move: idx]
    else:
        arr[idx: pos_to_move] = arr[idx+1: pos_to_move+1]
    arr[pos_to_move] = col_to_shift
    df = df.reindex(columns=arr)
    return df
    
df = df.pipe(shifter, 'J', 1)

print(df.columns)

Index(['A', 'J', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N',
       'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'],
      dtype='object')

Performance benchmarking

Using NumPy slicing is more efficient with a large number of columns versus a list-based method:

n = 10000
df = pd.DataFrame(columns=list(range(n)))

def shifter2(df, col_to_shift, pos_to_move):
    cols = df.columns.tolist()
    cols.insert(pos_to_move, cols.pop(df.columns.get_loc(col_to_shift)))
    df = df.reindex(columns=cols)
    return df

%timeit df.pipe(shifter, 590, 5)   # 381 µs
%timeit df.pipe(shifter2, 590, 5)  # 1.92 ms
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.