5

I'm trying to simulate coin tosses and the resulting profits, and plot the graph in matplotlib:

from random import choice
import matplotlib.pyplot as plt
import time

start_time = time.time()
num_of_graphs = 2000
tries = 2000
coins = [150, -100]
last_loss = 0


for a in range(num_of_graphs):
    profit = 0
    line = []
    for i in range(tries):
        profit = profit + choice(coins)
        if (profit < 0 and last_loss < i):
            last_loss = i
        line.append(profit)
    plt.plot(line)
plt.show()

print("--- %s seconds ---" % (time.time() - start_time))
print("No losses after " + str(last_loss) + " iterations")

The end result is

--- 9.30498194695 seconds ---
No losses after 310 iterations

Why is it taking so long to run this script? If I change num_of_graphs to 10000, the script never finishes.

How would you optimize this?

Screenshot of the script running in Jupyter

1
  • 1
    There are probably better answers, but since you know how big line is going to be, the first thing I would do is use numpy and pre-allocate your array: line = np.zeros((2000,)) outside of either loop, followed by line[i] = profit inside the second loop. Allocate once and then keep rewriting. Commented Jul 28, 2018 at 3:38
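
A minimal sketch of what this comment describes, assuming NumPy is available. The helper name simulate_run and the fixed seed are hypothetical, added only for illustration. The array is allocated once and overwritten in place; the Python-level loop is still there, so this on its own is unlikely to be the main speed-up.

```python
import numpy as np


def simulate_run(tries, coins, rng, out):
    """Fill the preallocated array `out` with the running profit of one run."""
    profit = 0
    for i in range(tries):
        profit += rng.choice(coins)  # one coin toss: +150 or -100
        out[i] = profit              # overwrite in place, no list growth
    return out


rng = np.random.default_rng(0)  # seed chosen arbitrarily, for repeatability
tries = 2000
coins = [150, -100]

line = np.zeros(tries, dtype=np.int64)  # allocated once, outside any loop
simulate_run(tries, coins, rng, line)
```

The same line buffer can be passed to simulate_run again for the next run, which is the "allocate once and then keep rewriting" part.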

3 Answers

4

Your measure of execution time is too coarse: it lumps the simulation and the plotting together. The following, which uses numpy, measures the time needed for the simulation separately from the time needed for plotting:

import matplotlib.pyplot as plt
import numpy as np
import time


def run_sims(num_sims, num_flips):
    start = time.time()
    sims = [np.random.choice(coins, num_flips).cumsum() for _ in range(num_sims)]
    end = time.time()
    print(f"sim time = {end-start}")
    return sims


def plot_sims(sims):
    start = time.time()
    for line in sims:
        plt.plot(line)
    end = time.time()
    print(f"plotting time = {end-start}")
    plt.show()


if __name__ == '__main__':

    num_sims = 2000
    num_flips = 2000
    coins = np.array([150, -100])

    plot_sims(run_sims(num_sims, num_flips))

result:

sim time = 0.13962197303771973
plotting time = 6.621474981307983

As you can see, the sim time is greatly reduced (with your original loop it was on the order of 7 seconds on my 2011 laptop). The plotting time depends on matplotlib.
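
If you want to time several stages without repeating the start/end bookkeeping, one option is a small context manager. This is only a sketch: the names timed and timings are hypothetical, and the loop inside the with block is a stand-in for the real simulation. It uses time.perf_counter, which is intended for measuring short durations.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label} time = {elapsed:.3f} s")


timings = {}
with timed("sim", timings):
    # Stand-in workload; replace with the simulation call.
    total = sum(i * i for i in range(100_000))
```

A second with timed("plotting", timings): block around the plotting loop would give the same split as the two print statements above.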


4

matplotlib is getting slower as the script progresses because it is redrawing all of the lines that you have previously plotted - even the ones that have scrolled off the screen.

This is the answer from a previous post answered by Simon Gibbons.

matplotlib is optimized for the quality of its graphics rather than for speed. Here are links to a few libraries that were developed for speed:

You can refer to the matplotlib cookbook for more about performance.
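
If you stay with matplotlib, one thing that may be worth trying is to hand it all runs as a single LineCollection instead of thousands of separate line objects. The sketch below assumes NumPy and matplotlib are installed; the reduced num_of_graphs, the seed and the styling values are arbitrary choices for illustration, and I have not measured how much time it saves on the setup from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

rng = np.random.default_rng(0)   # arbitrary seed, for repeatability
num_of_graphs, tries = 200, 2000  # smaller than in the question, for a quick try
coins = np.array([150, -100])

# One row per run, cumulative profit along each row.
profits = rng.choice(coins, size=(num_of_graphs, tries)).cumsum(axis=1)

# Build an array of shape (num_of_graphs, tries, 2): one (x, y) polyline per run.
x = np.arange(tries)
segments = np.stack([np.broadcast_to(x, profits.shape), profits], axis=-1)

fig, ax = plt.subplots()
collection = LineCollection(segments, linewidths=0.5, alpha=0.3)
ax.add_collection(collection)
ax.autoscale()  # make sure the view limits cover the collection
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Note that a collection uses a single colour unless you pass a list of colours.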


1

In order to better optimize your code, I would always try to replace loops by vectorization using numpy or, depending on my specific needs, other libraries that use numpy under the hood.

In this case, you could calculate and plot your profits this way:

import matplotlib.pyplot as plt
import time
import numpy as np

start_time = time.time()
num_of_graphs = 2000
tries = 2000
coins = [150, -100]

# Create a 2-D array with random choices
# rows for tries, columns for individual runs (graphs).
coin_tosses = np.random.choice(coins, (tries, num_of_graphs))

# Calculate 2-D array of profits by summing
# cumulatively over rows (trials).
profits = coin_tosses.cumsum(axis=0)

# Plot everything in one shot.
plt.plot(profits)
plt.show()

print("--- %s seconds ---" % (time.time() - start_time))

In my configuration, this code took approx. 6.3 seconds to run (6.2 of which were plotting), while your code took almost 15 seconds.
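
The vectorized code above no longer tracks last_loss from the question. A possible way to recover it from the profits array, as a sketch: the seed is arbitrary, and the logic assumes the same meaning as in the question, namely the largest try index at which any run had a negative profit (0 if that never happens).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability
num_of_graphs = 2000
tries = 2000
coins = [150, -100]

# Rows are tries, columns are individual runs, as in the code above.
coin_tosses = rng.choice(coins, size=(tries, num_of_graphs))
profits = coin_tosses.cumsum(axis=0)

# For each try, check whether any run is in the red at that point.
any_loss_at_try = (profits < 0).any(axis=1)
tries_with_loss = np.flatnonzero(any_loss_at_try)

# Last try index with a loss in any run; 0 mirrors the question's initial value.
last_loss = int(tries_with_loss[-1]) if tries_with_loss.size else 0
print("No losses after " + str(last_loss) + " iterations")
```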
