I have a dataframe with columns code and images.
The images column holds a string of URLs joined by commas: <URL>,<URL2>,...
The code column is NOT unique. I need to make it unique while storing all images (from all variants) in a new column, images_all.
For example:
code something images
1 x url1,url2,url3
1 x url1,url4
Result is:
code something images_all
1 x url1,url2,url3,url4
I tried:
grouped = csv.groupby('code')
csv = csv.drop_duplicates(subset=['code'], keep='last')
csv['images_all'] = csv.apply(lambda r: list(set(
[image for image in grouped.get_group(r['code'])['images']]
)))
which raises:
KeyError: 'code'
But even if it didn't raise this, the problem is that images_all wouldn't be [url1,url2,url3,url4]. Instead, it would be ["url1,url2,url3","url1,url4"], because each row's images value is still a single comma-joined string.
Do you know how to fix it?
EDIT
I also want to keep the other columns. They are the same for all rows with the same code, which is why I just drop_duplicates and keep the last row.
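
For reference, here is a minimal sketch of one direction that might work. It is not a confirmed fix. It assumes every images value is a non-null string, and the names df and merge_images are made up for this illustration. The idea is to split each string on the comma, de-duplicate per code while keeping first-seen order, and join the merged string back onto the de-duplicated rows:

```python
import pandas as pd

# Sample data mirroring the example above (df is a hypothetical name).
df = pd.DataFrame({
    "code": [1, 1],
    "something": ["x", "x"],
    "images": ["url1,url2,url3", "url1,url4"],
})

def merge_images(series):
    """Split each comma-joined string and drop duplicate URLs,
    keeping first-seen order. Assumes no missing values."""
    seen = {}
    for joined in series:
        for url in joined.split(","):
            seen.setdefault(url, None)
    return ",".join(seen)

# One merged string per code; the resulting Series is indexed by code.
images_all = df.groupby("code")["images"].agg(merge_images).rename("images_all")

# Keep the last row per code for the other columns, then attach images_all.
result = (
    df.drop_duplicates(subset=["code"], keep="last")
      .drop(columns="images")
      .join(images_all, on="code")
      .reset_index(drop=True)
)
print(result)
```

If a real list is wanted instead of a comma-joined string, merge_images could return list(seen) instead.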