I am running np.random.choice as shown below.

record = np.random.choice(data, size=6, p=prob)
maxv = max(record)
minv = min(record)
val = record

From this I am finding the min and the max. I want to put these, together with the sampled values, into a pandas DataFrame. Below is my desired output:

Min,Max,value
1,5,2
1,5,3
1,5,3
1,5,5
1,5,1
1,5,3

This is an example of the output I would like from one simulation. I am running this simulation many times, so I would like to keep appending to the same DataFrame. Each simulation will have its own min and max, and I want to keep both in the output (which is why 1 and 5 appear in the example above).

  • What's your question? Commented Jul 22, 2015 at 1:21
  • How to create the desired output above from the example code in a pandas DataFrame. Commented Jul 22, 2015 at 1:51
  • Sorry, I should have made that clearer. Commented Jul 22, 2015 at 1:52
  • Basically, how to create a DataFrame with the constant min and max in the first two columns and the sampled values in the third column. Commented Jul 22, 2015 at 1:53

3 Answers

I'd create the df with the initial data column 'Val' and then just add the new columns in a one-liner:

In [242]:
df = pd.DataFrame({'Val':np.random.randint(1,6,6)})
df['Min'], df['Max'] = df['Val'].min(), df['Val'].max()
df

Out[242]:
   Val  Min  Max
0    4    2    5
1    5    2    5
2    5    2    5
3    4    2    5
4    5    2    5
5    2    2    5

This is how I would solve it:

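import numpy as np
import pandas as pd

record = np.random.choice(data, size=6, p=prob)
# Repeat the scalar min and max so each column has one entry per sampled value
maxv = [max(record)] * len(record)
minv = [min(record)] * len(record)

# zip returns an iterator in Python 3; list() materializes it into rows
new_data = list(zip(minv, maxv, record))

df = pd.DataFrame(new_data, columns=['Min', 'Max', 'val'])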

3 Comments

Sorry for the late response, but if I have the np.random.choice call inside a loop that produces many outputs, how can I append them all to one DataFrame?
If you get a chance, please look at how I can append this from a loop.
I don't quite get your problem here. But if you produce multiple np.random.choice results, you can use np.concatenate to concatenate them first. However, in that case, I think EdCum's version will be much better.
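
For the loop case asked about in these comments, here is a minimal sketch of one possible approach. Note that it uses pd.concat on per-simulation frames rather than the np.concatenate route mentioned above. The values of data and prob, the helper name simulate_once, and the seeded np.random.default_rng generator are all assumptions made for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs standing in for the question's `data` and `prob`.
data = [1, 2, 3, 4, 5]
prob = [0.1, 0.2, 0.4, 0.2, 0.1]


def simulate_once(rng, size=6):
    """Run one simulation and return it as a frame with constant Min/Max columns."""
    record = rng.choice(data, size=size, p=prob)
    # The scalar min and max are broadcast to the length of `record`.
    return pd.DataFrame({"Min": record.min(), "Max": record.max(), "val": record})


rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible

# Build one small frame per simulation, then combine them in a single step.
frames = [simulate_once(rng) for _ in range(3)]
df = pd.concat(frames, ignore_index=True)
```

Each block of six rows in df then carries the min and max of its own simulation, and ignore_index=True gives the combined frame a fresh 0..n-1 index.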

Simply iterate through the simulation and append the values to the DataFrame:

# CREATE DATA FRAME STRUCTURE
df = pd.DataFrame(columns=['Min', 'Max', 'val'])

# RUN SIMULATION IN LOOP ITERATION
record = np.random.choice(data, size=6, p=prob)

# MIN AND MAX ARE CONSTANT FOR THE WHOLE RECORD, SO COMPUTE THEM ONCE
maxv = np.max(record)
minv = np.min(record)

for val in record:
    # APPEND ROW IN COLUMN ORDER: Min, Max, val
    df.loc[len(df)] = [minv, maxv, val]

2 Comments

I believe that is an inefficient approach, though a common one. DataFrames, like arrays, occupy contiguous memory and it is very expensive to append to them. It's always better to append to a list (which is designed for that) and convert to a dataframe at the end. Also, you don't need the 0 in range, and you should use vectorized np.max and np.min on the whole record instead of individually on the rows. Just my two cents.
Excellent points @cxrodgers! Indeed, DataFrames are intended to be loaded at once, not appended to. Only recently did pandas allow df.loc[i] as a row append. And this SO post shows the popularity of the row append. Plus, the OP mentioned running simulations many times. Feel free to downvote, but you'll get the upvote.
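
The list-then-convert approach suggested in the comments above could look roughly like the sketch below. The values of data, prob and n_simulations are placeholders, and the seeded np.random.default_rng generator is an assumption used only to make the example self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs standing in for the question's `data` and `prob`.
data = [1, 2, 3, 4, 5]
prob = [0.1, 0.2, 0.4, 0.2, 0.1]
n_simulations = 4  # assumed number of repetitions

rng = np.random.default_rng(42)  # seeded only for reproducibility

rows = []  # plain Python list: cheap to grow inside the loop
for _ in range(n_simulations):
    record = rng.choice(data, size=6, p=prob)
    # Compute min and max once per simulation on the whole array.
    minv, maxv = record.min(), record.max()
    rows.extend((minv, maxv, val) for val in record)

# Convert to a DataFrame once, after all simulations have finished.
df = pd.DataFrame(rows, columns=["Min", "Max", "val"])
```

The DataFrame is constructed a single time at the end, so the loop never has to grow a frame row by row.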
