How to create a column with randomly generated values in a pandas dataframe [duplicate]

Question

I want to assign a random float (between 0 and 1) to each unique value in a column of a pandas DataFrame.

Below is a dataframe in which every value of "region" is unique; I want to create a new column holding a distinct, randomly generated float (between 0 and 1) for each region.

I used the random module to generate a random number, but I couldn't figure out how to generate one number per region and assign them as a new column.

The random number assigned to each region must also stay the same when the code is re-run, so I set a seed.

import pandas as pd
import numpy as np
import random

list_reg = ['region1', 'region2', 'region3', 'region4', 'region5', 'region6']

df_test = pd.DataFrame({
    'region': list_reg,
    'product1': [100, 250, 350, 555, 999999, 200000],
    'product2': [41, 111, 12.14, 16.18, np.nan, 200003],
    'product3': [7.04, 2.09, 11.14, 2000320, 22.17, np.nan],
    'product4': [236, 249, 400, 0.56, 359, 122],
    'product5': [None, 1.33, 2.54, 1, 0.9, 3.2]})

# in case of a re-run, make sure the randomly generated number doesn't change
random.seed(123)
# note: this produces a single float, not one value per region
random_genator = random.uniform(0.0001, 1.0000)

The desired goal would be something like below

"random_generator": np.random.random(len(list_reg)) as the final column of df_test
– JonSG, Commented Jan 30 at 20:19
Please don't post pictures of text. Instead, copy the text itself, edit it into your post, and use the formatting tools like code formatting (for print(df)) or table formatting (from print(df.to_markdown())).
– wjandrea, Commented Feb 4 at 22:22
"corresponds to each region" - What do you mean by that? Do you mean your real data has repeated regions, so you need all occurrences of the same region to have the same random_genator? Please edit to clarify.
– wjandrea, Commented Feb 4 at 22:49
@wjandrea, I don't have billions of rows in my real dataset :)
– user032020, Commented Feb 9 at 14:35
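
For the case raised in the comments, where a region can appear in more than one row and every occurrence should share the same number, one possible approach is to draw one value per distinct region and map it back onto the rows. This is only a sketch: the dataframe below is hypothetical (it is not the asker's df_test), and the use of NumPy's seeded Generator is an assumption rather than something from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which regions repeat (not the asker's df_test).
df = pd.DataFrame({
    'region': ['region1', 'region2', 'region1', 'region3', 'region2'],
    'product1': [100, 250, 350, 555, 200000],
})

# Seeded generator so the same values come back on every run.
rng = np.random.default_rng(123)

# Draw one value per distinct region, then look that value up for every row.
regions = df['region'].unique()
value_by_region = dict(zip(regions, rng.uniform(0.0001, 1.0, size=len(regions))))
df['random_genator'] = df['region'].map(value_by_region)

print(df)
```

When every region is already unique, as in the question, this reduces to one draw per row.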

user19077881 · Accepted Answer · 2025-01-30 22:00:46Z

1

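To add the column to an existing DataFrame, you can generate a list with one value per row using a list comprehension (with random.seed(123) called first, as in the question, so the values are repeatable):

df_test['random_genator'] = [random.uniform(0.0001, 1.0000) for _ in range(len(df_test))]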

which gives (for example):

    region  product1   product2    product3  product4  product5  random_genator
0  region1       100      41.00        7.04    236.00       NaN        0.052458
1  region2       250     111.00        2.09    249.00      1.33        0.087278
2  region3       350      12.14       11.14    400.00      2.54        0.407301
3  region4       555      16.18  2000320.00      0.56      1.00        0.107789
4  region5    999999        NaN       22.17    359.00      0.90        0.901209
5  region6    200000  200003.00         NaN    122.00      3.20        0.038250
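
As an alternative to the list comprehension, the whole column can be drawn in one call with NumPy. The sketch below assumes a seeded `np.random.default_rng` generator is acceptable in place of the standard-library `random` module. It uses a cut-down stand-in for df_test, and because the generator is different, the numbers will not match the table above.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for df_test from the question (two columns kept for brevity).
df_test = pd.DataFrame({
    'region': ['region1', 'region2', 'region3', 'region4', 'region5', 'region6'],
    'product1': [100, 250, 350, 555, 999999, 200000],
})

# A seeded Generator keeps the draws reproducible without touching global state.
rng = np.random.default_rng(123)

# One draw per row; uniform(low, high) samples from the interval [low, high).
df_test['random_genator'] = rng.uniform(0.0001, 1.0, size=len(df_test))

print(df_test)
```

Sizing the draw with len(df_test) rather than a separate list keeps the column length tied to the dataframe itself.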



2 Comments

user032020 Jan 30 at 22:04

Thanks a lot. Just to confirm: with random.seed(123) placed above this code, the random numbers will stay the same whenever the code is re-run, right?

user19077881 Jan 30 at 22:10

Yes, that's right. The seed ensures the same list is generated each time the code runs.
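
The seeding behaviour discussed in these comments can be checked with a short sketch. The helper name `draw_values` is hypothetical and exists only for illustration. The point it shows is that the seed must be set before the values are drawn: a second batch drawn in the same session without re-seeding continues the sequence and therefore differs.

```python
import random

def draw_values(n, seed=123):
    """Re-seed the global generator, then return n floats from random.uniform."""
    random.seed(seed)
    return [random.uniform(0.0001, 1.0) for _ in range(n)]

# Two separate "runs" with the same seed give identical lists.
first_run = draw_values(6)
second_run = draw_values(6)
print(first_run == second_run)  # True

# Without re-seeding, the generator carries on from where it stopped,
# so a second batch in the same session is different from the first.
random.seed(123)
batch_a = [random.uniform(0.0001, 1.0) for _ in range(6)]
batch_b = [random.uniform(0.0001, 1.0) for _ in range(6)]
print(batch_a == batch_b)  # False
```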
