
I don't see why this code isn't working. I am trying to iterate over a DataFrame in a for loop; in this case it has only one row. There are only two columns, and I have two loop variables to receive them. What am I missing?

  print("process_list =  ",process_list)

  for row in process_list.itertuples():
       print("row = ", row)


  df_to_date = pd.DataFrame()

  try:
        print("process_list = {}  and it's type {}  process_list.itertuples() {} ".format(process_list, type(process_list),process_list.itertuples() ) )

        for   file_date , file_name  in process_list.itertuples(): # a whole batch of days 
               file_to_process = dev_env + file_name
               print("PROCESSING BATCH: ",file_to_process)
               df  = pd.read_csv(file_to_process, header=None,skiprows=22, sep=',', comment='*', converters = {"Days" : just_number,"Percentile" : just_number,"Date" : just_number} ,names = column_names )
               df.insert(0,'File_date',file_date)
               df_to_date = df_to_date.append(df)

  except Exception as e: 
           print ("nothing to process exception = ",e)
           sys.exit(0)

When I run it, I get:

process_list =       File_date          File_name
94   20180507  mcmhv20180507.csv
row =  Pandas(Index=94, File_date=20180507, File_name='mcmhv20180507.csv')
process_list =     File_date          File_name
94   20180507  mcmhv20180507.csv  and it's type <class 'pandas.core.frame.DataFrame'>  process_list.itertuples() <map object at 0x7f6339371e48> 
nothing to process exception =  too many values to unpack (expected 2)

1 Answer


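pd.DataFrame.itertuples returns an iterable of namedtuples that, by default, include the index as the first element. Your DataFrame has two columns, so each tuple holds three values (Index, File_date, File_name), and unpacking it into two variables raises "too many values to unpack".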
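To see where the third value comes from, here is a minimal sketch that inspects a single row. The one-row frame is an assumption rebuilt from the output printed in the question, not the asker's real data.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the output printed in the question.
process_list = pd.DataFrame(
    {"File_date": [20180507], "File_name": ["mcmhv20180507.csv"]},
    index=[94],
)

# Take the first (and only) row as a namedtuple.
row = next(process_list.itertuples())
print(row._fields)  # field names, with the index first
print(len(row))     # number of values in each tuple

# Unpacking into two names fails because the tuple also carries the index.
try:
    file_date, file_name = row
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The field names should list Index ahead of the two columns, which is the same shape as the "row = Pandas(Index=94, ...)" line in the question's output.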

There are two options to account for this.

Option 1

Unpack 3 items instead of 2, the first of which you do not use.

Here is a minimal example:

df = pd.DataFrame([[10, 20], [30, 40], [50, 60]],
                  columns=['A', 'B'])

for idx, a, b in df.itertuples():
    print(idx, a, b)

0 10 20
1 30 40
2 50 60

In your case, a good convention is to mark the unused variable with _ (note that the columns in your DataFrame are named File_date and File_name):

for _, file_date, file_name in process_list[['File_date', 'File_name']].itertuples():
    ...  # do something

Option 2

Pass index=False and unpack two elements:

for file_date, file_name in process_list[['File_date', 'File_name']].itertuples(index=False):
    ...  # do something

The behaviour is indicated in the documentation:

DataFrame.itertuples(index=True, name='Pandas')

Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.
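Because the rows are namedtuples, you can also skip positional unpacking and read the columns by name. The sketch below assumes the same hypothetical one-row frame rebuilt from the question's output; the seen list exists only to collect the values for illustration.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the output printed in the question.
process_list = pd.DataFrame(
    {"File_date": [20180507], "File_name": ["mcmhv20180507.csv"]},
    index=[94],
)

seen = []
for row in process_list.itertuples(index=False):
    # Each column is available as an attribute of the namedtuple.
    seen.append((row.File_date, row.File_name))
    print(row.File_date, row.File_name)
```

Attribute access keeps working if more columns are added later, whereas positional unpacking would have to be updated to match the new column count.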
