In the context of a Gibbs sampler, I profiled my code and my major bottleneck is the following:
I need to compute the likelihood of N points, assuming each has been drawn from its own normal distribution (different means, but the same variance).
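In other words, the quantity needed at every iteration is the total log-likelihood log L = sum_i log N(y_i | mu_i, sigma^2), where the mu_i are the per-point means and sigma is the shared standard deviation.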
Here are two ways to compute it:
import numpy as np
from scipy.stats import multivariate_normal
from scipy.stats import norm
# Toy data
y = np.random.uniform(low=-1, high=1, size=100) # data points
loc = np.zeros(len(y)) # means
# Two alternatives
%timeit multivariate_normal.logpdf(y, mean=loc, cov=1)
%timeit sum(norm.logpdf(y, loc=loc, scale=1))
The first uses the recently implemented multivariate_normal of scipy: build the equivalent N-dimensional gaussian and compute the (log)probability of y.

1000 loops, best of 3: 1.33 ms per loop

The second uses the traditional norm function of scipy: compute the (log)probability of every point y individually and then sum the results.

10000 loops, best of 3: 130 µs per loop
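For reference, both alternatives evaluate the same scalar: the sum of the per-point normal log-densities. Because the variance is shared, that sum has a simple closed form that can be written directly in NumPy, without scipy's per-call overhead. A minimal sketch (the function name normal_loglik is just illustrative):

import numpy as np

def normal_loglik(y, loc, scale=1.0):
    # Total log-likelihood of y under independent normals N(loc_i, scale^2):
    # sum_i [ -0.5*log(2*pi*scale^2) - (y_i - loc_i)^2 / (2*scale^2) ]
    n = y.size
    return (-0.5 * n * np.log(2 * np.pi * scale ** 2)
            - np.sum((y - loc) ** 2) / (2 * scale ** 2))

# Sanity check: normal_loglik(y, loc) should match sum(norm.logpdf(y, loc=loc, scale=1))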
Since this is part of a Gibbs sampler, I need to repeat this computation around 10,000 times, and therefore I need it to be as fast as possible.
How can I improve it?
(either from Python, or by calling Cython, R, or whatever)