2

I want to add a new column to this pandas DataFrame that assigns a StoreID, which changes each time the index restarts:

It currently looks like this:

   Unnamed: 12  Store  
0          NaN      1  
1          NaN      1  
2          NaN      1  

0          NaN      1  
1          NaN      1  
2          NaN      1  

0          NaN      1  
1          NaN      1  
2          NaN      1  

0          NaN      1  
1          NaN      1  
2          NaN      1  

I want it to look like this:

   Unnamed: 12  Store  StoreID
0          NaN      1  1
1          NaN      1  1
2          NaN      1  1
0          NaN      1  2
1          NaN      1  2
2          NaN      1  2
0          NaN      1  5
1          NaN      1  5
2          NaN      1  5
0          NaN      1  11
1          NaN      1  11
2          NaN      1  11

The StoreID changes each time the index resets to 0. The report will have a variable number of items per store, with most stores having hundreds of thousands of records.

I can create a new column easily, but I can't work out how to do this. Any help is much appreciated; I'm just starting out with Python.

  • Why doesn't your output have index values of 3? Commented Jul 31, 2018 at 21:54
  • Just an inconsistency on my part. They should be the same. Commented Jul 31, 2018 at 22:08
  • Any reason why the StoreID jumps from 2 to 5 then to 11? Commented Aug 1, 2018 at 12:52
  • It's just a list of store references that follows no logic. I could map the 0,1,2 sequence to the customer sequence (0=1, 1=2, 2=5, 3=11), but is there a simpler way that doesn't require another operation? Commented Aug 1, 2018 at 18:39
  • Okay, then I think one of the three solutions below answers your question. Commented Aug 1, 2018 at 18:46

4 Answers

1

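You can also take the cumulative sum of the places where the diff of the index goes negative, i.e. where the index drops back to the start of a new block: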

df['g'] = (df.index.to_series().diff() < 0).cumsum()

0    0
1    0
2    0
0    1
1    1
2    1
0    2
1    2
2    2
0    3
1    3
2    3
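
As a minimal sketch of how this behaves when the stores have different numbers of rows (the sample index below is made up for illustration, not taken from the question's data):

```python
import pandas as pd

# Hypothetical sample: three stores with 3, 2 and 4 rows,
# each restarting its index at 0.
idx = [0, 1, 2, 0, 1, 0, 1, 2, 3]
df = pd.DataFrame({"Store": [1] * len(idx)}, index=idx)

# diff() is negative wherever the index drops back, i.e. on the
# first row of a new block. The first diff is NaN, and NaN < 0 is
# False, so the first block is numbered 0.
df["g"] = (df.index.to_series().diff() < 0).cumsum()

print(df["g"].tolist())  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

One caveat, assuming a store could ever have a single row: two consecutive rows with index 0 give a diff of 0, which is not negative, so that boundary would be missed by this test.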

1

Using np.ndarray.cumsum:

df['g'] = (df.index == 0).cumsum() - 1

print(df)

   col  Store  g
0  NaN      1  0
1  NaN      1  0
2  NaN      1  0
0  NaN      1  1
1  NaN      1  1
2  NaN      1  1
0  NaN      1  2
1  NaN      1  2
2  NaN      1  2
0  NaN      1  3
1  NaN      1  3
2  NaN      1  3

3 Comments

I like the idea of getting the result directly from the index.
These are good suggestions but ideally I want the new column to roll through a custom number or text sequence (i.e. 1, 2, 5, 11) as opposed to (0, 1, 2, 3...). Any thoughts on how I could achieve this?
@user10011212, So, to be clear, you have an additional input specifying the "custom sequence", e.g. we can use L = [1, 2, 5, 11] as an input? Can you update your question accordingly?
1

IIUC, try cumcount:

df.groupby(df.index).cumcount()
Out[11]: 
0    0
1    0
2    0
0    1
1    1
2    1
0    2
1    2
2    2
0    3
1    3
2    3
dtype: int64
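
A caution worth checking against your own data: this seems to rely on every store having the same number of rows. The sketch below uses a made-up index with stores of 4, 2 and 4 rows (an assumption for illustration) and compares cumcount with counting the rows where the index equals 0:

```python
import pandas as pd

# Hypothetical sample: stores with 4, 2 and 4 rows.
idx = [0, 1, 2, 3, 0, 1, 0, 1, 2, 3]
df = pd.DataFrame({"Store": [1] * len(idx)}, index=idx)

# cumcount numbers the k-th occurrence of each index label, so
# labels 2 and 3 (absent from the short store) fall out of step.
by_cumcount = df.groupby(df.index).cumcount().tolist()

# Counting the zeros in the index numbers the blocks directly.
by_zero = ((df.index == 0).cumsum() - 1).tolist()

print(by_cumcount)  # [0, 0, 0, 0, 1, 1, 2, 2, 1, 1]
print(by_zero)      # [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
```

With equal-length blocks, as in the question's example, both give the same result.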


0

Thanks for everyone's replies. I ended up solving the problem with:

table['STORE_ID'] = (table.index == 0).cumsum() - 1

then adding some logic to look up the store ID based on the sequence:

table.loc[table['STORE_ID'] == 3, 'STORE_ID'] = 11
table.loc[table['STORE_ID'] == 2, 'STORE_ID'] = 5
table.loc[table['STORE_ID'] == 1, 'STORE_ID'] = 2
table.loc[table['STORE_ID'] == 0, 'STORE_ID'] = 1

I imagine there's a simpler way to get to the STORE_ID sequence, but this gets the job done for now.
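
One possible shortcut, sketched here under the assumption that the store references are available as an ordered list (1, 2, 5, 11 from the question), is to use the block number as a position into a NumPy array instead of rewriting the values one at a time. The sample frame and the names `store_ids` and `block` are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: four stores with three rows each.
idx = [0, 1, 2] * 4
table = pd.DataFrame({"Store": [1] * len(idx)}, index=idx)

# Store references in the order the blocks appear in the report.
store_ids = np.array([1, 2, 5, 11])

# Block number per row (0, 0, 0, 1, 1, 1, ...), then used as an
# integer position into store_ids.
block = (table.index == 0).cumsum() - 1
table["STORE_ID"] = store_ids[block]

print(table["STORE_ID"].tolist())
# [1, 1, 1, 2, 2, 2, 5, 5, 5, 11, 11, 11]
```

This avoids having to order the replacements so that they do not overwrite each other, and it should work the same way if the array holds text labels. It does assume the list has one entry for every block in the table.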
