
A quick way to try to do this would be to use scipy.ndimage.map_coordinates, mapping radius to one axis of a rectangular array, and angle to the other, then summing along the radius axis.

But I think that array then has to be weighted by r, since the pixels near r = 0 get spread over many more "pixels" in the rectangular array than those at large r, and I am not sure how to implement that correction quantitatively correctly.
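Roughly, the sketch I have in mind looks like this (untested; the grid sizes n_r and n_theta, and the choice to weight by r as the polar-coordinate Jacobian, are my guesses):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def polar_sum(p, n_r=250, n_theta=360):
    """Resample p (centred at the array middle) onto an (r, theta) grid
    with map_coordinates, then sum over r with an r weight: a cell at
    radius r stands in for an area proportional to r."""
    cy, cx = (np.array(p.shape) - 1) / 2
    rr, tt = np.meshgrid(np.arange(n_r),
                         np.linspace(-np.pi, np.pi, n_theta, endpoint=False),
                         indexing='ij')
    # sample the cartesian image at each (r, theta) position
    coords = np.stack([cy + rr*np.sin(tt), cx + rr*np.cos(tt)])
    polar = map_coordinates(p, coords, order=1)   # shape (n_r, n_theta)
    return (polar * rr).sum(axis=0)               # one value per angle
```

For a uniform test image every angular bin then comes out equal, which is at least a sanity check on the weighting.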

The script below generates some fake data and implements a binning into 360 slots centered on integer degrees.

It works, and for modest data sizes, the looping over individual pixels is not too slow.

for i, value in zip(itheta.flatten(), p.flatten()):
    hist[i] += value

But I feel that there must be a more "elegant" way, one that puts the looping back into compiled code, using numpy slicing perhaps?

I've added the numpy tag; even if there may be some high level library that happens to have a feature like this, I like sticking to numpy and scipy implementations whenever possible.

Fake data:

[image: fake data]

Summed into bins:

[image: fake data summed into angle bins]

Script:

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter 
from scipy.special import erfc

np.random.seed(4241) # repeatable simulation for everyone

# establish cartesian and polar coordinate arrays
yx = np.mgrid[-250:251, -250:251]
y, x = yx
extent = [x.min(), x.max(), y.min(), y.max()]

r = np.sqrt(y**2 + x**2)
theta = np.arctan2(y, x)

# generate some fake data
p0 = np.cos(3*theta)**6 * np.sin(r**0.5)**6
cleanup = (1 - np.exp(-r**2 / 100)) * erfc((r - 200) / 8)
lumpy_1 = gaussian_filter(np.random.random(p0.shape), sigma=1)**2
lumpy_2 = gaussian_filter(np.random.random(p0.shape), sigma=60)
lumpy_2 -= lumpy_2.min()

p = p0 * cleanup * lumpy_1 * lumpy_2
p /= p.max()

plt.imshow(p, origin='lower', cmap='gray', extent=extent)
plt.show()

# assign a bin index to each pixel, CENTERED on integer degrees
# (remember that Python's .astype(int) by itself rounds both
#  +0.99 and -0.99 to zero, so we have to use floor first.
#  cf. https://scicomp.stackexchange.com/a/40779/17869)

itheta = np.floor(np.degrees(theta) + 0.5).astype(int) + 180
hist = np.zeros(361)

for i, value in zip(itheta.flatten(), p.flatten()):
    hist[i] += value

# clean up the fact that -180+/-0.5 is split into two bins
hist[-1] += hist[0]
hist = hist[1:]

fig, ax = plt.subplots(1, 1)
ax.plot(hist)
ax.set_xlabel('angle (deg)')
plt.show()
  • Maybe I should delete this and just ask on Stack Overflow for a numpy implementation instead? Commented Dec 31, 2024 at 11:32
  • In real life, do your data come in with polar or rectilinear axes? Commented Dec 31, 2024 at 15:12
  • Do you know the center for your actual data? Or are you guessing that? Ideally code posted here is the actual code you use in practice, not a MRE as on Stack Overflow. I'd like to see your actual analysis function, and a separate script that generates the fake data and runs that analysis (or loads it from a PNG file or whatever). Commented Dec 31, 2024 at 15:30
  • @CrisLuengo the actual data are 2D Fourier transform power spectra, where the center is indeed well-defined. It is unpublished data and so would be bad form to publish it here. The "actual analysis function" begins at the comment # assign a bin index to each pixel... Commented Dec 31, 2024 at 22:16

2 Answers


Numerics

np.sqrt(y**2 + x**2) should be replaced with a call to np.linalg.norm on your yx (with axis=0).

Don't min() and max() on your axes when you already know what the extents are. Since I demonstrate the use of a parametric n, just base your extent tuple on that.

Move the **6 to the outside of parentheses that surround the cos and sin terms.

As a minor suggestion, when there is an "easy" alternative to division like -r/100, write -0.01*r instead. There are subtle performance differences.
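The gap, if any, is small and hardware-dependent; here is a throwaway sketch (array size and repeat count are arbitrary) to measure it on your own machine:

```python
import timeit
import numpy as np

r = np.random.default_rng(0).random(1_000_000)

# time elementwise division against the equivalent multiplication
t_div = timeit.timeit(lambda: r / 100, number=100)
t_mul = timeit.timeit(lambda: 0.01 * r, number=100)
print(f"division:       {t_div:.4f} s")
print(f"multiplication: {t_mul:.4f} s")
```

Either spelling produces the same values up to floating-point rounding, so this is purely a question of speed.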

Don't for-loop over p. Instead use the bincount grouped sum pattern:

A possible use of bincount is to perform sums over variable-size chunks of an array, using the weights keyword.

>>> w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6])  # weights
>>> x = np.array([0, 1, 1, 2, 2, 2])
>>> np.bincount(x, weights=w)
array([0.3, 0.7, 1.1])

Stats

I don't see why you would want to + 180; I omit this in my demonstration. However, it is important that you wrap the bin index with a modulo (% 360). With that in place, I don't think that your clean up the fact that [...] step is warranted.
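To see what the modulo buys you: with np.floor(...) % 360, both edges of the ±180° seam already land in the same bin, so no post-hoc merging is needed. A small check:

```python
import numpy as np

# angles straddling the ±180° seam, plus angles around the 0° bin centre
deg = np.array([-180.0, -179.6, 179.6, 180.0, -0.4, 0.0, 0.4])
bins = np.floor(deg + 0.5).astype(int) % 360
# all angles near ±180° fall in bin 180; angles near 0° fall in bin 0
print(bins)
```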

This is not a histogram! It's a binned sum. (It's also not the related kernel density estimate.) A histogram has a random variable as independent, and the frequencies of value bins for that same random variable as dependent. If you truly do care about both a histogram and an angular dimension, then you need a different kind of graph with independents of angle and value, and a third dependent of frequency. This could be represented by a heatmap, contour map, etc.

Data vis

Unless you have an extremely good reason to use cmap='gray', don't. There's a reason that perceptually-uniform colour maps like viridis were developed. You should also add a colour bar to depict the value mapping.

Don't plt.show() twice; only call it once.

In your histogram plot, the horizontal axis (and indeed any angular axis) should have ticks spaced at some multiple of 15 degrees.

There is value in (as an alternative, or at least a supplement) showing your binned sum in a polar projection. This more naturally explains the cyclic angle axis.

Python

You need to add functions, type hints, and a __main__ guard.

As a rule of thumb, I suggest that if you only use one symbol with only one reference from a given module, don't from-style import it; just use it fully qualified. This applies to erfc for example.

Suggested

import numpy as np
import matplotlib.gridspec
import matplotlib.pyplot as plt
import matplotlib.ticker
import scipy.special
from scipy.ndimage import gaussian_filter


def make_coords(n: int = 250) -> tuple[
    np.ndarray,  # r
    np.ndarray,  # theta
    tuple[int, int, int, int],  # axis extents
]:
    # regular cartesian dimensions
    ser = slice(-n, 1+n)
    yx = np.mgrid[ser, ser]
    y, x = yx
    extent = -n, n, -n, n

    # irregular polar dimensions based on cartesian dimensions
    r = np.linalg.norm(yx, axis=0)
    theta = np.arctan2(y, x)
    return r, theta, extent


def make_fake(
    r: np.ndarray, theta: np.ndarray,
    rand: np.random.Generator | None = None,
) -> np.ndarray:
    if rand is None:
        rand = np.random.default_rng()

    # generate some fake data
    p0 = (
        np.cos(3*theta) * np.sin(np.sqrt(r))
    )**6
    cleanup = (1 - np.exp(-0.01*r**2))*scipy.special.erfc((r - 200)/8)
    lumpy_1 = gaussian_filter(rand.random(r.shape), sigma=1)**2
    lumpy_2 = gaussian_filter(rand.random(r.shape), sigma=60)
    lumpy_2 -= lumpy_2.min()

    p = p0 * cleanup * lumpy_1 * lumpy_2
    p /= p.max()
    return p


def make_hist(theta: np.ndarray, p: np.ndarray) -> np.ndarray:
    """
    Not actually a histogram! This is a binned sum.

    assign a bin index to each pixel, centred on integer degrees
    (remember that Python's .astype(int) by itself rounds both
    +0.99 and -0.99 to zero, so we have to use floor first.
    cf. https://scicomp.stackexchange.com/a/40779/17869)
    """

    # 0-359, integral, where e.g. [-0.5, 0.5) maps to 0
    itheta = np.floor(np.rad2deg(theta) + 0.5).astype(int) % 360

    # binned sum with indices in itheta and values in p;
    # minlength guarantees all 360 bins even if an angle were empty
    total = np.bincount(itheta.ravel(), weights=p.ravel(), minlength=360)
    return total


def plot_image(
    fig: plt.Figure, ax: plt.Axes,
    p: np.ndarray, extent: tuple[int, int, int, int],
) -> None:
    im = ax.imshow(p, origin='lower', extent=extent)
    bar = fig.colorbar(mappable=im)

    ax.set_title('Sample dataset')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    bar.set_label('z')


def plot_hist_rect(
    ax: plt.Axes,
    hist: np.ndarray,
) -> None:
    ax.plot(hist)
    ax.xaxis.set_major_locator(matplotlib.ticker.MultipleLocator(30))
    ax.set_title('Binned sum')
    ax.set_xlabel('bin centre angle (deg)')
    ax.set_ylabel('sum')


def plot_hist_polar(
    ax: plt.PolarAxes,
    hist: np.ndarray,
) -> None:
    ax.plot(np.deg2rad(np.arange(360)), hist)
    ax.xaxis.set_major_locator(matplotlib.ticker.FixedLocator(
        np.deg2rad(np.arange(0, 360, 30)),
    ))
    ax.set_title('Binned sum')


def main() -> None:
    # repeatable simulation
    rand = np.random.default_rng(seed=4241)

    r, theta, extent = make_coords()
    p = make_fake(r=r, theta=theta, rand=rand)
    hist = make_hist(theta=theta, p=p)

    fig = plt.figure()
    grid = matplotlib.gridspec.GridSpec(figure=fig, nrows=2, ncols=2)
    plot_image(ax=fig.add_subplot(grid[0,0]), fig=fig, p=p, extent=extent)
    plot_hist_polar(ax=fig.add_subplot(grid[0,1], projection='polar'), hist=hist)
    plot_hist_rect(ax=fig.add_subplot(grid[1,:]), hist=hist)
    fig.tight_layout()
    plt.show()


if __name__ == '__main__':
    main()

[image: plots]

Actual histogram

An actual histogram could look like

def make_hist2d(theta: np.ndarray, p: np.ndarray) -> tuple[
    np.ndarray,  # angle bin edges
    np.ndarray,  # value bin edges
    np.ndarray,  # histogram matrix
]:
    continuous_angle = np.rad2deg(theta) % 360
    angle_bin_edges = np.arange(0, 361, 5)
    value_bin_edges = np.linspace(start=0, stop=1, num=21)
    hist, angle_edges, value_edges = np.histogram2d(
        x=continuous_angle.ravel(), y=p.ravel(),
        bins=[angle_bin_edges, value_bin_edges],
    )
    return angle_edges, value_edges, hist


def plot_hist2d_polar(
    ax: plt.PolarAxes,
    angle_bin_edges: np.ndarray,
    value_bin_edges: np.ndarray,
    hist: np.ndarray,
) -> None:
    vv, aa = np.meshgrid(value_bin_edges, angle_bin_edges)
    aa = np.deg2rad(aa)
    hist = np.log(1 + hist)
    ax.pcolormesh(aa, vv, hist, edgecolors='face')
    ax.xaxis.set_major_locator(matplotlib.ticker.FixedLocator(
        np.deg2rad(np.arange(0, 360, 30)),
    ))
    ax.tick_params(axis='y', which='both', labelcolor='white')
    ax.set_title('Histogram, log scale')

[image: histogram]

  • This answer has taken me "a year" to go through :-) almost there now... Commented Jan 3 at 2:54
  • I've used np.linalg.lstsq() for a while but never thought to explore other methods like .norm(), and would have never thought to look for np.bincount(), and yes, silly of me to start with 361 bins. Also, I remember when division was much, much slower than multiplication (and shorter variable names saved memory!) but I had thought that modern FP implementations did both with the same performance. OK, I'll try to improve my "arithmetic hygiene" :-) Commented Jan 8 at 2:53

The previous answer is excellent as it addresses many major points, and then some. Here are other very minor comments.

It is great that you fixed the random seed to make it easy for others to reproduce your results:

np.random.seed(4241) # repeatable simulation for everyone

It would also be handy to make it a command-line option to suppress this and allow other seeds.
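A sketch of such an option (the flag name --seed is my invention; argparse's default of None conveniently matches the "fresh entropy" behaviour of np.random.default_rng):

```python
import argparse
import numpy as np

def parse_args(argv=None):
    """Parse command-line options; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description='angular binned-sum demo')
    parser.add_argument('--seed', type=int, default=None,
                        help='RNG seed for a repeatable run; omit for fresh entropy')
    return parser.parse_args(argv)

# the script's entry point would then do something like:
args = parse_args(['--seed', '4241'])    # explicit argv just for this demo
rand = np.random.default_rng(args.seed)  # default_rng(None) uses OS entropy
```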

UX

I second the recommendation to call this only once:

plt.show()

Get rid of the first plot command. It was a little confusing the first couple times I ran the code. I wasn't sure if one plot was covering the other; may as well make both visible at once.

Comments

The comments are helpful.

It would also be helpful to add a docstring at the top of the code to summarize its purpose and elaborate on the equations.

Naming

While humorous looking, this variable name:

lumpy_1

does not convey much meaning to me. Perhaps it is jargon that I'm unfamiliar with. Regardless, it could help to either rename it or at least document/comment on the convention, and why you have a second one (lumpy_2).

cleanup is a bit vague as well.
