In the context of a Gibbs sampler, I profiled my code and my major bottleneck is the following:
I need to compute the log-likelihood of N points, assuming they have been drawn from N normal distributions (with different means but the same variance).
Here are two ways to compute it:
import numpy as np
from scipy.stats import multivariate_normal
from scipy.stats import norm
# Toy data
y = np.random.uniform(low=-1, high=1, size=100) # data points
loc = np.zeros(len(y)) # means
# Two alternatives
%timeit multivariate_normal.logpdf(y, mean=loc, cov=1)
%timeit sum(norm.logpdf(y, loc=loc, scale=1))
The first uses the recently implemented multivariate_normal of scipy, building an N-dimensional Gaussian and computing the log-probability of the N-dimensional y:
1000 loops, best of 3: 1.33 ms per loop
The second computes the individual log-likelihoods of every point in y and then sums the results:
10000 loops, best of 3: 130 µs per loop
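For reference, since the variance is shared, the summed log-likelihood also has a simple closed form that can be evaluated with plain NumPy, avoiding the scipy per-call overhead (a rough sketch of what I have in mind; normal_loglik is just a name I made up):
import numpy as np
def normal_loglik(y, loc, scale=1.0):
    # Closed-form sum of independent normal log-densities with shared variance:
    # -n/2 * log(2*pi*scale^2) - sum((y - loc)^2) / (2*scale^2)
    n = y.size
    return -0.5 * n * np.log(2 * np.pi * scale**2) - np.sum((y - loc)**2) / (2 * scale**2)
On the toy data above this should give the same value as sum(norm.logpdf(y, loc=loc, scale=1)), up to floating-point error.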
Since this is part of a Gibbs sampler, I need to repeat this computation around 10,000 times, so it needs to be as fast as possible.
How can I speed it up? (from Python, or by calling Cython, R, or whatever)
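One direction I have been considering, short of writing Cython, is JIT-compiling that closed form with numba (a sketch under the assumption that numba is an acceptable dependency; I have not benchmarked it inside my actual sampler):
import numpy as np
from numba import njit
@njit
def normal_loglik_numba(y, loc, scale):
    # Same closed form as above, written as an explicit loop so numba can compile it.
    n = y.shape[0]
    ss = 0.0
    for i in range(n):
        d = y[i] - loc[i]
        ss += d * d
    return -0.5 * n * np.log(2.0 * np.pi * scale * scale) - ss / (2.0 * scale * scale)
The call would look like normal_loglik_numba(y, loc, 1.0).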