In the context of a Gibbs sampler, I profiled my code and my major bottleneck is the following:
I need to compute the log-likelihood of N points, assuming they have been drawn from N normal distributions (with different means but the same variance).
Here are two ways to compute it:
import numpy as np
from scipy.stats import multivariate_normal
from scipy.stats import norm
# Toy data
y = np.random.uniform(low=-1, high=1, size=100) # data points
loc = np.zeros(len(y)) # means
# Two alternatives
%timeit multivariate_normal.logpdf(y, mean=loc, cov=1)
%timeit sum(norm.logpdf(y, loc=loc, scale=1))
The first uses the recently implemented multivariate_normal of scipy, building an N-dimensional Gaussian and computing the log-probability of the N-dimensional y:
1000 loops, best of 3: 1.33 ms per loop
The second computes the individual log-likelihoods of every point in y and then sums the results:
10000 loops, best of 3: 130 µs per loop
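For reference, since the variance is shared, the summed log-likelihood also has a simple closed form that can be evaluated with plain NumPy, avoiding the scipy per-call overhead (a rough sketch of what I have in mind; normal_loglik is just a name I made up):
import numpy as np
def normal_loglik(y, loc, scale=1.0):
    # Closed-form sum of independent normal log-densities with shared variance:
    # -n/2 * log(2*pi*scale^2) - sum((y - loc)^2) / (2*scale^2)
    n = y.size
    return -0.5 * n * np.log(2 * np.pi * scale**2) - np.sum((y - loc)**2) / (2 * scale**2)
On the toy data above this should give the same value as sum(norm.logpdf(y, loc=loc, scale=1)), up to floating-point error.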
Since this is part of a Gibbs sampler, I need to repeat this computation around 10,000 times, so it needs to be as fast as possible.
How can I speed it up? (from Python, or by calling Cython, R, or whatever)
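One direction I have been considering, short of writing Cython, is JIT-compiling that closed form with numba (a sketch under the assumption that numba is an acceptable dependency; I have not benchmarked it inside my actual sampler):
import numpy as np
from numba import njit
@njit
def normal_loglik_numba(y, loc, scale):
    # Same closed form as above, written as an explicit loop so numba can compile it.
    n = y.shape[0]
    ss = 0.0
    for i in range(n):
        d = y[i] - loc[i]
        ss += d * d
    return -0.5 * n * np.log(2.0 * np.pi * scale * scale) - ss / (2.0 * scale * scale)
The call would look like normal_loglik_numba(y, loc, 1.0).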