I need help converting the following code into a more efficient version that does not use iterrows().

for index, row in df.iterrows():
    alist = row['index_vec'].strip("[] ").split(",")
    blist = [int(i) for i in alist]
    for col in blist:
        df.loc[index, str(col)] = df.loc[index, str(col)] + 1

The code reads the string in the 'index_vec' column, parses it into integers, and then increments the column named after each integer by one. An example of the output is shown below:

[image: example output]

Take the 0th row as an example. Its string value is "[370, 370, -1]", so the code increments column "370" by 2 and column "-1" by 1. The output display is truncated so that only columns "-10" to "17" are shown.
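
To make the expected result for that one string concrete, here is a minimal sketch of the counting step on its own. It uses collections.Counter purely for illustration (it is not part of the loop above), and the variable names are made up:

```python
from collections import Counter

s = "[370, 370, -1]"

# Same parsing as in the loop: strip the brackets, split on commas, convert to int.
# int() tolerates the leading space left over after splitting on ",".
counts = Counter(int(tok) for tok in s.strip("[] ").split(","))

print(counts)
```

Each key is a column label and each value is the amount that column should be incremented by for this row.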

iterrows() is very slow on a large DataFrame, and I'd like some help speeding this up. Thank you.

2 Answers

You can also use apply with axis=1 to go row-wise, and pass a custom function to apply:

Example starting df:

      index_vec  1201  370  -1
0  [370, -1, -1]     0    0   1
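1   [1201, 1201]     0    1   1

import pandas as pd

df = pd.DataFrame({'index_vec': ["[370, -1, -1]", "[1201, 1201]"], '1201': [0, 0], '370': [0, 1], '-1': [1, 1]})

def add_counts(x):
    # Count how often each column label appears in this row's 'index_vec' string
    counts = pd.Series(x['index_vec'].strip("[]").split(", ")).value_counts()
    # Add the counts to the matching columns (aligned by label)
    x[counts.index] = x[counts.index] + counts
    return x

# apply returns a new DataFrame, so assign the result back
df = df.apply(add_counts, axis=1)

print(df)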

Outputs:

      index_vec  1201  370  -1
0  [370, -1, -1]     0    1   3
1   [1201, 1201]     2    1   1
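
As a quick sanity check, the sketch below compares the apply version with the original iterrows loop on the example frame. The helper names make_df, with_iterrows and with_apply are hypothetical, and add_counts copies the row first as a precaution that is not in the snippet above:

```python
import pandas as pd

def make_df():
    # Same example frame as above
    return pd.DataFrame({
        'index_vec': ["[370, -1, -1]", "[1201, 1201]"],
        '1201': [0, 0], '370': [0, 1], '-1': [1, 1],
    })

def with_iterrows(df):
    # The loop from the question, run on a copy
    df = df.copy()
    for index, row in df.iterrows():
        for col in row['index_vec'].strip("[] ").split(","):
            df.loc[index, str(int(col))] += 1
    return df

def add_counts(x):
    x = x.copy()  # precaution: do not mutate the row object handed in by apply
    counts = pd.Series(x['index_vec'].strip("[]").split(", ")).value_counts()
    x[counts.index] = x[counts.index] + counts
    return x

def with_apply(df):
    # apply returns a new DataFrame; the input frame is left untouched
    return df.apply(add_counts, axis=1)

cols = ['1201', '370', '-1']
loop_result = with_iterrows(make_df())
apply_result = with_apply(make_df())

# Compare only the counter columns, normalising dtypes first
same = (loop_result[cols].astype(int).values
        == apply_result[cols].astype(int).values).all()
print(same)
```

Note that apply with axis=1 still calls a Python function once per row, so it mainly tidies the code; how much faster it is than iterrows will depend on the data.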

4 Comments

This works. Thanks, Anna. The 'index_vec' does not have all the values I need. So I first manually created these columns. Then I added your code next. The manual column creation code is:
for i in range(neg_index, pos_index):
    df[str(i)] = 0
    df[str(i)] = df[str(i)].astype(np.int16)
@David293836 hmm, I think I have an idea of how you wouldn't need to do that manually and could make it much faster. If you want to post that as a new question with the full code, I can take a look at it.
Great. A new question has been posted here: stackoverflow.com/questions/61994503/… Thank you again, Anna.

Let us do

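a = df['index_vec'].str.strip("[] ").str.split(",").explode().str.strip()  # one label per element
cols = df.columns.drop('index_vec')  # counter columns only
s = pd.crosstab(a.index, a).reindex_like(df[cols]).fillna(0).astype(int)  # per-row counts
df[cols] = df[cols].add(s)  # add the counts (s), not the exploded labels (a)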

5 Comments

The first line generated the following error message: AttributeError: 'Series' object has no attribute 'split'
@David293836 add str before split
Thanks. The 2nd line has a problem with the fill_value argument in reindex_like(). The error message is: TypeError: reindex_like() got an unexpected keyword argument 'fill_value'
Note that 'index_vec' does not have all the numbers. So I had to manually create all the columns from the lowest to the highest. (e.g., -10 to the upper limit).
Is the 2nd line supposed to be s = pd.crosstab(a.index,a).reindex_like(df).fillna(0) or a = pd.crosstab(a.index,a).reindex_like(df).fillna(0)?
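
The manual column creation mentioned in the comments can probably be avoided by reindexing the crosstab against the full range of labels. This is only a sketch under assumptions: the counters start at zero, the sample strings are invented, and neg_index and pos_index (names borrowed from the comment above, values made up here) bound the range:

```python
import numpy as np
import pandas as pd

# Invented sample data; only the 'index_vec' column exists up front.
df = pd.DataFrame({'index_vec': ["[370, 370, -1]", "[1201, 1201]", "[-1]"]})

# Hypothetical bounds; range(neg_index, pos_index) mirrors the loop in the comment.
neg_index, pos_index = -10, 1202
labels = [str(i) for i in range(neg_index, pos_index)]

# One parsed label per element, keeping the original row index.
a = df['index_vec'].str.strip("[] ").str.split(",").explode().str.strip()

# Count labels per row, then add all-zero columns for labels that never occur.
counts = (
    pd.crosstab(a.index, a)
    .reindex(index=df.index, columns=labels, fill_value=0)
    .astype(np.int16)
)

result = df.join(counts)
print(result.loc[0, '370'], result.loc[0, '-1'], result.loc[1, '1201'])
```

Here reindex accepts fill_value directly, so the columns that never appear in 'index_vec' are created as zeros in one step instead of in a Python loop.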
