I have written a small Python script that generates test sets for my project.
It produces two datasets with the same dimensions n*m: one contains binary values (0/1), the other contains floats.
The script runs fine and produces the output I need, but when I scale up to many dimensions, the for-loop in pick_random() becomes the bottleneck.
How can I get rid of it? Perhaps with some vectorized numpy operation?
What throws my reasoning off is the if-statement, because the sampling has to happen with a given probability.
import random

import numpy as np
import pandas as pd

n, m = 100, 100  # example dimensions; the real script can use any size

# Helpers for the float amounts (assumption: each draws a uniform float
# from the range suggested by its name).
def get_10_20():
    return random.uniform(10, 20)

def get_20_30():
    return random.uniform(20, 30)

# Probabilities must sum to 1
AMOUNT1 = {0.6: get_10_20,
           0.4: get_20_30}
AMOUNT2 = {0.4: get_10_20,
           0.6: get_20_30}
OUTCOMES = [AMOUNT1, AMOUNT2]
def pick_random(prob_dict):
    '''
    Given a dict mapping probabilities to callables, pick one of the
    callables with the corresponding probability and return its result.
    '''
    r, s = random.random(), 0
    for num in prob_dict:
        s += num
        if s >= r:
            return prob_dict[num]()
def compute_trade_amount(action):
    '''
    Select an amount with a probability that depends on the action (0 or 1).
    '''
    return pick_random(OUTCOMES[action])
# Build the binary dataset (actions) and map each cell to a float amount.
ACTIONS = pd.DataFrame(np.random.randint(2, size=(n, m)))
AMOUNTS = ACTIONS.applymap(compute_trade_amount)
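For reference, this is roughly the vectorized version I am aiming for (just a rough sketch, assuming get_10_20/get_20_30 simply draw uniform floats from those ranges; I am not sure whether this is the idiomatic way to do it):

import numpy as np
import pandas as pd

rng = np.random.default_rng()

n, m = 100, 100
actions = rng.integers(2, size=(n, m))  # binary dataset

# Probability of drawing from the 10-20 range: 0.6 for action 0, 0.4 for action 1.
p_low = np.where(actions == 0, 0.6, 0.4)

# Decide per cell which range to sample from, then sample both ranges
# and select elementwise -- no Python-level loop or if-statement.
use_low = rng.random((n, m)) < p_low
low = rng.uniform(10, 20, size=(n, m))
high = rng.uniform(20, 30, size=(n, m))
amounts = pd.DataFrame(np.where(use_low, low, high))

Is something along these lines the right way to replace pick_random(), or is there a cleaner numpy idiom for sampling from a discrete distribution per cell?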