I have a large dataset and am trying to group certain rows by a specific condition (in this case, by everything except the last two letters of a word, i.e. some_string[:-2]).
I first select the rows and store them in a dictionary, with the first part of the word as the key and a list of (index, row) tuples that fulfil the condition as the value.
(I don't know if this is the best method; please feel free to suggest alternatives!)
def group_by_name(data, name_column):
    # Simple grouping of bookings by everything except the last two letters of the name
    buckets = {}
    for index, booking in data.iterrows():
        # Key: the name as a string, minus its last two characters
        key = str(booking[name_column])[:-2]
        buckets.setdefault(key, []).append((index, booking))
    return buckets
This returns a list of (index, row) tuples per key. How can I recast these into a DataFrame so that I can read and manipulate them more easily?
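To make the goal concrete, here is a minimal sketch of the kind of result I'm after. It assumes pandas and uses made-up data: the column names `name` and `nights` and all values are hypothetical. It skips the dictionary of tuples and groups directly on the truncated string with `groupby`, which may or may not be the right approach for the real dataset.

```python
import pandas as pd

# Made-up bookings; the column names and values are hypothetical.
data = pd.DataFrame({
    "name": ["smith01", "smith02", "jones01", "jones02", "jones03"],
    "nights": [2, 3, 1, 4, 2],
})

# Group key: the name as a string, minus its last two characters.
key = data["name"].astype(str).str[:-2]

# One DataFrame per key, each keeping the original index labels.
frames = {prefix: group for prefix, group in data.groupby(key)}

print(frames["jones"])
```

Each value in `frames` is an ordinary DataFrame, so it can be filtered, sorted, or aggregated like the original data.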