
I have a large dataset and am trying to group certain rows by a specific condition (in this case, by everything except the last two letters of a word, i.e. some_string[:-2]).

I first select the rows and store them in a dictionary with the keys as the first part of the word, and the values as a list of tuples of rows that fulfil that condition.

(I don't know if this is the best method; please feel free to suggest alternatives!)

def group_by_name(data, name_column):
    # Simple grouping of bookings by everything except the last two letters of the name
    buckets = {}
    for index, booking in data.iterrows():
        # [:-2] drops the last two characters of the name
        buckets.setdefault(str(booking[name_column])[:-2], []).append((index, booking))
    return buckets

This returns a list of (index, row) tuples per key. How can I recast these objects into a DataFrame so that I can read and manipulate them more easily?

2 Answers


I think you need groupby:

import pandas as pd

data = pd.DataFrame({'D': [1, 3, 5, 7, 1],
                     'E': [5, 3, 6, 9, 2],
                     'F': ['asd', 'tty', 'tty', 'tty', 'asd']})

print(data)
   D  E    F
0  1  5  asd
1  3  3  tty
2  5  6  tty
3  7  9  tty
4  1  2  asd

# Group by every character of F except the last two
for i, g in data.groupby(data['F'].str[:-2]):
    print(i)
    print(g)

a
   D  E    F
0  1  5  asd
4  1  2  asd
t
   D  E    F
1  3  3  tty
2  5  6  tty
3  7  9  tty
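
To answer the literal question of getting a DataFrame per key, the groups can be collected into a dictionary. This is only a sketch on the sample data above; the names prefix, buckets, old_style and rebuilt are made up for illustration. The second half assumes the tuple-based dictionary from the question already exists and rebuilds each list into a frame (the dtypes may come back as object, because iterrows yields mixed-type rows).

```python
import pandas as pd

data = pd.DataFrame({'D': [1, 3, 5, 7, 1],
                     'E': [5, 3, 6, 9, 2],
                     'F': ['asd', 'tty', 'tty', 'tty', 'asd']})

# Group key: every character of F except the last two.
prefix = data['F'].astype(str).str[:-2]

# One DataFrame per prefix instead of a list of (index, row) tuples.
buckets = {key: group for key, group in data.groupby(prefix)}

# A single group can also be fetched directly from the groupby object.
t_rows = data.groupby(prefix).get_group('t')

# Assumption: a dict shaped like the one in the question already exists.
old_style = {}
for idx, row in data.iterrows():
    old_style.setdefault(str(row['F'])[:-2], []).append((idx, row))

# Each list of (index, row) pairs becomes a frame; the row labels are kept.
rebuilt = {key: pd.DataFrame([row for _, row in rows])
           for key, rows in old_style.items()}
```

Letting groupby do the splitting avoids the Python-level loop over iterrows, which tends to be slow on large data.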

1 Comment

Yes, groupby was what I was looking for! Using .apply also worked better than the syntax above. Thanks.
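
The comment does not show how .apply was used, so the following is only a guess at the general shape. The summarise function and its output columns are hypothetical; any function that takes one group's DataFrame would fit in its place.

```python
import pandas as pd

data = pd.DataFrame({'D': [1, 3, 5, 7, 1],
                     'E': [5, 3, 6, 9, 2],
                     'F': ['asd', 'tty', 'tty', 'tty', 'asd']})

# Group key: every character of F except the last two.
prefix = data['F'].str[:-2]


def summarise(group):
    # Hypothetical per-group summary; receives one group's rows as a DataFrame.
    return pd.Series({'rows': len(group),
                      'D_total': group['D'].sum(),
                      'E_max': group['E'].max()})


# Selecting the numeric columns first keeps the function's input explicit.
summary = data.groupby(prefix)[['D', 'E']].apply(summarise)
print(summary)
```

Because summarise returns a Series with the same labels for every group, the result is a single DataFrame with one row per prefix.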

Boolean indexing might be of some help:

df[df['A'] > 0]
                A         B         C         D     E   0
2000-01-01  0.469112 -0.282863 -1.509059 -1.135632 NaN NaN
2000-01-02  1.212112 -0.173215  0.119209 -1.044236 NaN NaN
2000-01-04  7.000000 -0.706771 -1.039575  0.271860 NaN NaN
2000-01-07  0.404705  0.577046 -1.715002 -1.039268 NaN NaN

Check out the pandas documentation; it might help you refine the above logic for what you want.
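
Applied to the question, the same boolean-mask idea could select the rows for one prefix at a time. This is a sketch on made-up sample data, not the asker's dataset; the column name F and the prefix 't' are assumptions.

```python
import pandas as pd

data = pd.DataFrame({'D': [1, 3, 5, 7, 1],
                     'E': [5, 3, 6, 9, 2],
                     'F': ['asd', 'tty', 'tty', 'tty', 'asd']})

# True where the name, minus its last two characters, equals 't'.
mask = data['F'].str[:-2] == 't'

# Indexing with the mask keeps only the matching rows, as a DataFrame.
t_rows = data[mask]
print(t_rows)
```

This suits pulling out a single known prefix; for every prefix at once, groupby is likely the better fit.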
