I want to assign a random float (from 0 to 1) to each row of a pandas DataFrame, where one column contains unique values.

Below is a DataFrame with unique values of "region"; I want to create a new column holding a unique, randomly generated float (between 0 and 1) for each region.

I used the random module to generate a random number, but I couldn't figure out how to assign a random number to each region and store the results as a new column.

The random number assigned to each region also must not change on a re-run, so I set a seed.

import pandas as pd
import numpy as np
import random

list_reg = ['region1', 'region2', 'region3', 'region4', 'region5', 'region6']

df_test = pd.DataFrame({
    'region': list_reg,
    'product1': [100, 250, 350, 555, 999999, 200000],
    'product2': [41, 111, 12.14, 16.18, np.nan, 200003],
    'product3': [7.04, 2.09, 11.14, 2000320, 22.17, np.nan],
    'product4': [236, 249, 400, 0.56, 359, 122],
    'product5': [None, 1.33, 2.54, 1, 0.9, 3.2]})

# in case of a re-run, make sure the randomly generated number doesn't change
random.seed(123)
random_genator = random.uniform(0.0001, 1.0000)

The desired result would look something like this:

[image of the desired output table, not reproduced here]

  • "random_generator": np.random.random(len(list_reg)) as the final column of df_test Commented Jan 30 at 20:19
  • Please don't post pictures of text. Instead, copy the text itself, edit it into your post, and use the formatting tools like code formatting (for print(df)) or table formatting (from print(df.to_markdown())). Commented Feb 4 at 22:22
  • "corresponds to each region" - What do you mean by that? Do you mean your real data has repeated regions so you need all occurrences of the same region to have the same random_genator? Please edit to clarify. Commented Feb 4 at 22:49
  • Minor thing: why are you using 0.0001 instead of 0? Commented Feb 4 at 22:50
  • @wjandrea, I don't have billions of rows in my real dataset :) Commented Feb 9 at 14:35

1 Answer

To add the column to an existing DataFrame, you can generate a list of the correct size using a list comprehension:

df_test['random_genator'] = [random.uniform(0.0001, 1.0000) for _ in range(len(list_reg))]

which gives (for example):

    region  product1   product2    product3  product4  product5  random_genator
0  region1       100      41.00        7.04    236.00       NaN        0.052458
1  region2       250     111.00        2.09    249.00      1.33        0.087278
2  region3       350      12.14       11.14    400.00      2.54        0.407301
3  region4       555      16.18  2000320.00      0.56      1.00        0.107789
4  region5    999999        NaN       22.17    359.00      0.90        0.901209
5  region6    200000  200003.00         NaN    122.00      3.20        0.038250
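
One of the comments asks whether the real data might contain repeated regions. If it does, a per-region mapping may fit better than a per-row list. The following is only a sketch under that assumption: it swaps in NumPy's seeded `default_rng` generator instead of the `random` module, and the small frame `df` and the name `region_to_rand` are hypothetical, not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with repeated regions (not the asker's data)
df = pd.DataFrame({'region': ['region1', 'region2', 'region1', 'region3', 'region2']})

# Seeded generator, so the same values are produced on every run
rng = np.random.default_rng(123)

# Draw one float in [0, 1) per unique region
unique_regions = df['region'].unique()
region_to_rand = dict(zip(unique_regions, rng.random(len(unique_regions))))

# Every occurrence of a region receives that region's value
df['random_genator'] = df['region'].map(region_to_rand)
print(df)
```

When every region appears exactly once, as in df_test, this produces one value per row just like the list comprehension does.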
2 Comments

Thanks a lot. Just to confirm: putting random.seed(123) before this code still ensures the random numbers stay the same when the code or process is re-run, right?
Yes, that's right. The seed ensures the same list is generated each time the code runs.
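
A quick way to check this yourself is sketched below. The helper `draw` is a hypothetical name introduced only for this illustration; it re-seeds and then repeats the same `random.uniform` call used in the answer.

```python
import random

def draw(n, seed=123):
    # Re-seed, then draw n floats with the same call the answer uses
    random.seed(seed)
    return [random.uniform(0.0001, 1.0000) for _ in range(n)]

first = draw(6)
second = draw(6)

# The same seed followed by the same sequence of calls gives the same list
print(first == second)
```

Note that the guarantee depends on the order of calls: if other code draws from the `random` module between `random.seed(123)` and the comprehension, the values in the column will shift.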