Improve performance of Gaussian kernel function evaluation

Question

I need to improve the performance of a function that calculates the integral of a two-dimensional kernel density estimate (obtained using the function stats.gaussian_kde) where the domain of integration is all points that evaluate below a given value.

I've found that a single line takes up pretty much 100% of the time used by the code:

insample = kernel(sample) < iso

(see the code below). This line evaluates the KDE at every point in sample and stores in insample a boolean array marking the points whose density is below the value iso.

I asked a slightly modified version of this question 6 months ago on Stack Overflow, and the best answer involved parallelization. I'm hoping to avoid parallelization (it causes far too much trouble, as can be seen in that question), but I'm open to any performance improvement achieved by virtually any other means (including Cython and any available package).

Here's the minimum working example (MWE):

import numpy as np
from scipy import stats

# Define KDE integration function.
def kde_integration(m1, m2):

    # Perform a kernel density estimate (KDE) on the data.
    values = np.vstack([m1, m2])
    kernel = stats.gaussian_kde(values, bw_method=None)

    # This list will be returned at the end of this function.
    out_list = []

    # Iterate through all floats in m1, m2 lists and calculate for each one the
    # integral of the KDE for the domain of points located *below* the KDE
    # value of said float evaluated in the KDE.
    for indx, m1_p in enumerate(m1):

        # Compute the point below which to integrate.
        iso = kernel((m1_p, m2[indx]))

        # Sample KDE distribution
        sample = kernel.resample(size=100)

        # THIS TAKES ALMOST 100% OF THE COMPUTATION TIME.
        # Filter the sample.
        insample = kernel(sample) < iso

        # Monte Carlo Integral.
        integral = insample.sum() / float(insample.shape[0])
        # Avoid 'nan' and/or 'infinite' values down the line.
        integral = integral if integral > 0. else 0.000001

        # Append integral value for this point to list that will return.
        out_list.append(round(integral, 2))

    return out_list

# Generate some random two-dimensional data:
def measure(n):
    "Return two coupled measurements."
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1+m2, m1-m2

# Random data.
m1, m2 = measure(100)
# Call KDE integration function.
kde_integration(m1, m2)

Any help/suggestions/ideas will be very much appreciated.

Community · Accepted Answer · 2020-06-10 13:24:26Z

6

I am never forget the day I first meet the great Lobachevsky. In one word he told me secret of success in NumPy: Vectorize!

Vectorize!
In every loop inspect your i's
Remember why the good Lord made your eyes!
So don't shade your eyes,
But vectorize, vectorize, vectorize—
Only be sure always to call it please 'refactoring'.

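import numpy as np
from scipy import stats

def kde_integration(m1, m2, sample_size=100, epsilon=0.000001):
    values = np.vstack((m1, m2))
    kernel = stats.gaussian_kde(values, bw_method=None)
    # Evaluate the KDE at every data point in one call.
    iso = kernel(values)
    # Draw sample_size points per data point in a single resample.
    sample = kernel.resample(size=sample_size * len(m1))
    # One column of sample densities per data point, compared by broadcasting.
    insample = kernel(sample).reshape(sample_size, -1) < iso.reshape(1, -1)
    return np.maximum(insample.mean(axis=0), epsilon)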
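The reshape-and-compare step can be illustrated on toy numbers. This is only a minimal sketch: the arrays densities and iso are made-up values standing in for kernel(sample) and kernel(values), so it needs nothing but NumPy.

```python
import numpy as np

# Hypothetical toy numbers standing in for kernel(sample) and kernel(values).
sample_size, n_points = 4, 3
densities = np.arange(sample_size * n_points, dtype=float)  # 0.0 .. 11.0
iso = np.array([5.0, 5.0, 20.0])  # one threshold per data point

# Column j holds the sample_size densities assigned to data point j.
grid = densities.reshape(sample_size, -1)

# Broadcasting compares every row against the (1, n_points) thresholds.
insample = grid < iso.reshape(1, -1)

# Column means are the Monte Carlo fractions, one per data point.
fractions = insample.mean(axis=0)
print(fractions)
```

Each column plays the role of the fresh 100-point sample that the original loop drew for every data point, so the loop over points becomes a single array comparison.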

Update

I find that my kde_integration (kde_integration2 below) is about ten times as fast as yours (kde_integration1 below) with sample_size=100:

>>> m1, m2 = measure(100)
>>> from timeit import timeit
>>> timeit(lambda:kde_integration1(m1, m2), number=1) # yours
0.7005664870084729
>>> timeit(lambda:kde_integration2(m1, m2), number=1) # mine (above)
0.07272820601065177

But the advantage disappears with larger sample sizes:

>>> timeit(lambda:kde_integration1(m1, m2, sample_size=1024), number=1)
1.1872510590037564
>>> timeit(lambda:kde_integration2(m1, m2, sample_size=1024), number=1)
1.2788789629994426

What this suggests is that there is a "sweet spot" for sample_size * len(m1) (perhaps related to the computer's cache size) and so you'll get the best results by processing the samples in chunks of that length.
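One way to try the chunking idea is sketched below. It is an illustration under assumptions, not a measured improvement: the function name kde_integration_chunked and the parameter chunk_points are hypothetical, and the best chunk size (if there is one) would have to be found by timing on the target machine.

```python
import numpy as np
from scipy import stats


def kde_integration_chunked(m1, m2, sample_size=100, epsilon=0.000001,
                            chunk_points=50):
    """Vectorized KDE integration, processing chunk_points data points at a time.

    chunk_points is a hypothetical tuning knob introduced for this sketch.
    """
    values = np.vstack((m1, m2))
    kernel = stats.gaussian_kde(values, bw_method=None)
    iso = kernel(values)
    out = np.empty(len(m1))
    for start in range(0, len(m1), chunk_points):
        stop = min(start + chunk_points, len(m1))
        n = stop - start
        # Draw sample_size fresh points for each data point in this chunk.
        sample = kernel.resample(size=sample_size * n)
        # Shape (sample_size, n) compared against n thresholds by broadcasting.
        insample = kernel(sample).reshape(sample_size, n) < iso[start:stop]
        out[start:stop] = insample.mean(axis=0)
    return np.maximum(out, epsilon)
```

Timing a few values of chunk_points with timeit, for the sample_size actually needed, would show whether any setting beats both the loop and the fully vectorized version.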

You don't say exactly how you're testing it, but different results are surely to be expected, since scipy.stats.gaussian_kde.resample "Randomly samples a dataset from the estimated pdf".
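To compare two implementations on identical random draws, the resampling can be made repeatable. The sketch below assumes a SciPy version in which gaussian_kde.resample accepts a seed argument; the data and the seed value 42 are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(size=(2, 50))
kernel = stats.gaussian_kde(data)

# Unseeded calls use NumPy's global random state, so successive draws differ.
a = kernel.resample(size=5)
b = kernel.resample(size=5)

# Passing the same integer seed reproduces the same draw.
c = kernel.resample(size=5, seed=42)
d = kernel.resample(size=5, seed=42)
print(np.array_equal(a, b), np.array_equal(c, d))
```

Even with a fixed seed, the loop version and the vectorized version consume the random stream differently (many small draws versus one large draw), so their outputs would still agree only statistically, not value for value.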

edited Jun 10, 2020 at 13:24

answered Feb 27, 2014 at 2:44

Gareth Rees


Thank you @Gareth! I checked and this function seems slightly faster than the original for small sample_size but quite slower for larger values (try sample_size=1000) Can you confirm this?
– Gabriel, Feb 27, 2014 at 11:59

Also this function returns slightly different values and I can't figure out why.
– Gabriel, Feb 27, 2014 at 12:10

See revised answer.
– Gareth Rees, Feb 27, 2014 at 12:36

In my case sample_size=100 and len(m1)=100 give a ~x3 speed improvement and this appears to be the maximum I can get. If I fix sample_size=1000 I could not find a len(m1) size where the new function performed significantly better. (You are right, the results are not exactly the same because of the resampling done by kernel.resample)
– Gabriel, Feb 27, 2014 at 13:05

Add: I can only get a minimal improvement with sample_size=1000 if len(m1)<50, but it's a really minor difference.
– Gabriel, Feb 27, 2014 at 13:11
