0

I am trying to loop through an entire pandas DataFrame, but the loop does not seem to reach every row. It works for shorter DataFrames but not this one. I am working in a Jupyter Notebook.

I have added some print statements to try and debug.

def dropNotIn(df):

    print(df.shape)

    removedlist = []
    droplist = []

    for i, x in df.iterrows():
        rownum = i

    print(rownum)
    print(len(df))

Results for dropNotIn(df):

(59610, 9)
3449 --> Expected to be 59610
59610

Here is my df.head():

    date        attendance  venue_city       venue_state  venue_name                             away_team             home_team        away_points  home_points
9   2015-12-13  1740.0      Chicago          IL           McGrath-Phillips Arena                 Arkansas-Little Rock  DePaul           66           44
13  2015-11-22  0.0         St. Thomas       NaN          Virgin Islands Sport & Fitness Center  Tulsa                 Indiana State    67           59
14  2014-12-04  3469.0      St. Bonaventure  NY           Reilly Center                          Buffalo               St. Bonaventure  63           72
21  2015-11-20  1522.0      St. Thomas       NaN          Virgin Islands Sport & Fitness Center  Hofstra               Florida State    82           77
24  2014-11-23  NaN         St. Thomas       NaN          Virgin Islands Sport & Fitness Center  Gardner-Webb          Seton Hall       67           85
  • You'll have to share a representative data sample. Commented Feb 9, 2019 at 2:29
  • @pyeR_biz, added it. Commented Feb 9, 2019 at 2:32
  • Expect 3469; I did not see the other values that you mention in the desired output in your DataFrame. What conditions need to be matched for your desired output? Commented Feb 9, 2019 at 2:37
  • @hemanta, I've edited my question again to exclude the conditions. I guess my question does not really need to include the if statement. Commented Feb 9, 2019 at 2:43
  • I am trying to loop through the whole df. Each time through the loop, I am setting rownum = i, yet at the end, when I print rownum, it does not match with the size/len of the df. Commented Feb 9, 2019 at 2:43

2 Answers

1

In pandas, DataFrame.iterrows() yields (index label, row) pairs. The index is something you control, and your sample data shows that it is not a densely packed range of integers (the labels run 9, 13, 14, 21, 24), so the last label the loop sees is not a row count.

Try this code instead:

def dropNotIn(df):

    print(df.shape)

    removedlist = []
    droplist = []

    num_rows = 0
    for i, x in df.iterrows():
        num_rows += 1

    print(num_rows)
    print(len(df))

This counts the rows explicitly, instead of trying to use the index. If you really want to count rows during your operations, I'd suggest using the builtin function enumerate for this:

for num, (index, row) in enumerate(df.iterrows()):
    pass
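
To see the difference between the last index label and the row count, here is a minimal sketch with a made-up three-row frame. The column name and values are illustrative only, not taken from the question's data; the index labels are deliberately sparse, as they would be after filtering a larger frame.

```python
import pandas as pd

# Hypothetical frame: index labels are sparse, like a filtered DataFrame.
df = pd.DataFrame({"home_points": [44, 59, 72]}, index=[9, 13, 3449])

last_label = None
num_rows = 0
for num, (index, row) in enumerate(df.iterrows()):
    last_label = index  # index label of the current row
    num_rows = num + 1  # running count of rows seen so far

print(last_label)  # the last index label, not a count
print(num_rows)    # matches len(df)
```

Here last_label ends up as 3449 while num_rows is 3, which is the same kind of mismatch the question reports.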

However, I suspect you probably don't want to do that, because when you're doing things with a dataframe you want to vectorize them as much as possible.
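
As a rough sketch of what vectorizing could look like: the question does not show the condition used inside the loop, so this assumes the goal of dropNotIn is to keep only rows whose home_team appears in some allowed list. The allowed_teams list and the choice of column are assumptions made for illustration.

```python
import pandas as pd

# Small stand-in for the question's DataFrame (values are illustrative).
df = pd.DataFrame(
    {
        "home_team": ["DePaul", "Indiana State", "St. Bonaventure"],
        "home_points": [44, 59, 72],
    },
    index=[9, 13, 14],
)

allowed_teams = ["DePaul", "St. Bonaventure"]  # hypothetical filter list

# Boolean mask computed for all rows at once, with no Python-level loop.
mask = df["home_team"].isin(allowed_teams)
kept = df[mask]      # rows that satisfy the condition
dropped = df[~mask]  # rows that would have gone into a "drop list"

print(len(df), len(kept), len(dropped))
print(list(dropped.index))
```

The mask keeps the original index labels, so dropped.index plays the role of the droplist the loop was building.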


0

iterrows() iterates over the index labels, which are not the same as row numbers. Some index labels may also appear on more than one row.

Try unpacking x, y = df.shape (shape is an attribute, not a method, so there are no parentheses) and iterating over range(x).
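
A minimal sketch of that suggestion, using a made-up frame. Because the index labels are not 0..n-1, each row is fetched by position with iloc; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical frame with non-contiguous index labels.
df = pd.DataFrame({"away_points": [66, 67, 63]}, index=[9, 13, 14])

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple

labels = []
for pos in range(n_rows):
    row = df.iloc[pos]            # positional access, independent of labels
    labels.append(int(row.name))  # row.name holds the row's index label

print(n_rows)
print(labels)
```

The loop runs exactly n_rows times regardless of what the index labels are.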
