Pandas for loop over dataframe gives too many values to unpack

Question

I don't see why this code isn't working? I am trying to iterate over a data frame, which in this case only has one row in a for loop? There are only two columns and I have two for loop variables to take them? what am I missing please?

  print("process_list =  ",process_list)

  for row in process_list.itertuples():
       print("row = ", row)


  df_to_date = pd.DataFrame()

  try:
        print("process_list = {}  and it's type {}  process_list.itertuples() {} ".format(process_list, type(process_list),process_list.itertuples() ) )

        for   file_date , file_name  in process_list.itertuples(): # a whole batch of days 
               file_to_process = dev_env + file_name
               print("PROCESSING BATCH: ",file_to_process)
               df  = pd.read_csv(file_to_process, header=None,skiprows=22, sep=',', comment='*', converters = {"Days" : just_number,"Percentile" : just_number,"Date" : just_number} ,names = column_names )
               df.insert(0,'File_date',file_date)
               df_to_date = df_to_date.append(df)

  except Exception as e: 
           print ("nothing to process exception = ",e)
           sys.exit(0)

when I run it I get

process_list =       File_date          File_name
94   20180507  mcmhv20180507.csv
row =  Pandas(Index=94, File_date=20180507, File_name='mcmhv20180507.csv')
process_list =     File_date          File_name
94   20180507  mcmhv20180507.csv  and it's type <class 'pandas.core.frame.DataFrame'>  process_list.itertuples() <map object at 0x7f6339371e48> 
nothing to process exception =  too many values to unpack (expected 2)

jpp · Accepted Answer · 2018-05-09 08:03:12Z

pd.DataFrame.itertuples returns an iterable of namedtuples including the index by default.

There are two options to account for this.

Option 1

Unpack 3 items instead of 2, the first of which you do not use.

Here is a minimal example:

df = pd.DataFrame([[10, 20], [30, 40], [50, 60]],
                  columns=['A', 'B'])

for idx, a, b in df.itertuples():
    print(idx, a, b)

0 10 20
1 30 40
2 50 60

In your case, a good convention to use would be to indicate an unused variable by _:

for _, file_date, file_name in process_list[['date', 'name']].itertuples():
    # do something

Option 2

Use index=False argument and unpack 2 elements:

for file_date, file_name in process_list[['date', 'name']].itertuples(index=False):
    # do something

The behaviour is indicated in the documentation:

DataFrame.itertuples(index=True, name='Pandas')

Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.

Collectives™ on Stack Overflow

Pandas for loop over dataframe gives too many values to unpack

1 Answer 1

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Linked

Related