Python: Script to compare a header and a value of a .csv, update values

Question

I have a .csv file of the following form:

I need to parse through the whole csv file and, whenever a colour appears in the "Palette" column, replace the 0 in that colour's column with 1.

For example, the first row has two values in the "Palette" column, "Black" and "Blue", so I need to set the "Black" and "Blue" columns of that same row to 1.

Can you provide a sample of the csv file in text form, if possible? – Will Derriman, Commented Dec 14, 2021 at 17:22

butterflyknife · Accepted Answer · 2021-12-14 17:12:27Z

I have something, but I'm not sure how it'll scale.

Test dataframe:

import pandas as pd

df = pd.DataFrame({
    "image" : ['photo1', 'photo2', 'photo3', 'photo4'],
    "palette" : ['["Black", "Blue"]', 'Yellow', 'Black', '["Yellow", "Blue"]']
})


First step: convert the strings to actual lists.

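import ast

def wrap_eval(x):
    # Parse list-like strings such as '["Black", "Blue"]' into real lists.
    # ast.literal_eval only accepts Python literals, so unlike eval it cannot run code.
    # A bare colour name such as 'Yellow' is not a valid literal, so wrap it in a list.
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError):
        return [x]

df["palette"] = df["palette"].apply(wrap_eval)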

Printed, the dataframe looks very similar, but if you check, for example, df.loc[0, "palette"], you'll see that it now holds a list of strings rather than a string that happens to look like a list.
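A minimal sketch to see the difference for yourself: it rebuilds the test dataframe, parses it with `ast.literal_eval` (swapped in for `eval` as a safer choice, since it only evaluates Python literals), and prints the type of the first cell before and after the conversion.

```python
import ast

import pandas as pd

def wrap_eval(x):
    # Real lists for list-like strings; a bare colour name gets wrapped in a list.
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError):
        return [x]

df = pd.DataFrame({
    "image": ["photo1", "photo2", "photo3", "photo4"],
    "palette": ['["Black", "Blue"]', "Yellow", "Black", '["Yellow", "Blue"]'],
})

print(type(df.loc[0, "palette"]))  # before: a plain str
df["palette"] = df["palette"].apply(wrap_eval)
print(type(df.loc[0, "palette"]))  # after: a list
print(df.loc[0, "palette"])        # ['Black', 'Blue']
```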

Now we iterate down the rows and, for each colour in that row's "palette" list, (1) test whether a column for that colour exists, (2) if it doesn't, add the column with zeros all the way down, and (3) since the column exists by now, set its value in this row to 1.

for i, row in df.iterrows():
    for colour in row["palette"]:
        if colour not in df.columns:   # (1) in the steps above
            df[colour] = 0             # (2) new column, zeros all the way down
        df.loc[i, colour] = 1          # (3) mark this row's colour
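Since the question starts from a .csv file, here is a sketch of the full round trip: read, add the colour columns, write back. The CSV text below is hypothetical (the original sample wasn't posted), `io.StringIO` stands in for real file paths, and the colours are parsed into a separate Series so the original "palette" text is written back unchanged.

```python
import ast
import io

import pandas as pd

# Hypothetical file contents; list-like cells are quoted CSV fields.
CSV_TEXT = '''image,palette
photo1,"[""Black"", ""Blue""]"
photo2,Yellow
photo3,Black
photo4,"[""Yellow"", ""Blue""]"
'''

def wrap_eval(x):
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError):
        return [x]

df = pd.read_csv(io.StringIO(CSV_TEXT))  # in practice: pd.read_csv("your_file.csv")
parsed = df["palette"].apply(wrap_eval)  # keep the raw "palette" text intact

for i, colours in parsed.items():
    for colour in colours:
        if colour not in df.columns:
            df[colour] = 0
        df.loc[i, colour] = 1

out = io.StringIO()
df.to_csv(out, index=False)  # in practice: df.to_csv("updated.csv", index=False)
print(out.getvalue())
```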

If you try this, please do let me know how many rows your dataframe has and how long it takes!
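On scaling: looping with `iterrows` and assigning through `.loc` one cell at a time is typically slow on large frames. A commonly used vectorised alternative (a different technique from the loop above, swapped in here) is to `explode` the lists into one row per colour, one-hot encode them with `pd.get_dummies`, and collapse back to one row per image. This sketch assumes "palette" already holds real lists and a pandas version with `Series.explode` (0.25 or later).

```python
import pandas as pd

# Assumes "palette" already holds lists (i.e. after the wrap_eval step).
df = pd.DataFrame({
    "image": ["photo1", "photo2", "photo3", "photo4"],
    "palette": [["Black", "Blue"], ["Yellow"], ["Black"], ["Yellow", "Blue"]],
})

# One row per (image, colour) pair; the original index label is repeated.
exploded = df["palette"].explode()

# One 0/1 column per colour; max per original index merges the repeated rows.
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

result = df.join(dummies)
print(result)
```

Like the loop, this only creates columns for colours that actually appear somewhere in "palette".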
Thank you very much for your answer. It works wonders! Funny thing: I created the original .csv myself and filled in all the zeros, so I'll fix that too. Your approach of adding the columns later is very clever. The .csv isn't very big yet (200 rows / 15 columns), so execution is instant. Thanks again!
The only problem that may occur is when a colour never appears in the Palette column: its column will then never be created. I don't need to be that strict, though :P
You're right, it won't. But if you know the list of colours beforehand, then you can pre-populate the columns with zeros all the way down (as you say you have done), and the code will still work the same, I'm pretty sure.
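A minimal sketch of that suggestion, assuming the full colour list is known in advance. `ALL_COLOURS` and the extra colour "Red" are hypothetical, chosen to show that a colour which never appears in "palette" still ends up with a column of zeros.

```python
import pandas as pd

# Hypothetical full list of colours, known before processing.
ALL_COLOURS = ["Black", "Blue", "Yellow", "Red"]

df = pd.DataFrame({
    "image": ["photo1", "photo2", "photo3", "photo4"],
    "palette": [["Black", "Blue"], ["Yellow"], ["Black"], ["Yellow", "Blue"]],
})

# Pre-populate every known colour column with zeros.
for colour in ALL_COLOURS:
    df[colour] = 0

# Same loop as the answer: known colours already have a column,
# but any unexpected colour still gets one added on the fly.
for i, row in df.iterrows():
    for colour in row["palette"]:
        if colour not in df.columns:
            df[colour] = 0
        df.loc[i, colour] = 1

print(df)
```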
