
Mean

In statistics and mathematics, the mean, often specifically the arithmetic mean, is a fundamental measure of central tendency that represents the average value of a dataset, calculated by summing all the numerical values in the set and dividing by the total number of values.[1] This computation provides a single representative value that summarizes the overall level of the data, making it widely used for its simplicity and interpretability in descriptive statistics.[2] The arithmetic mean is particularly effective for symmetric distributions but is sensitive to extreme outliers, which can pull the result away from the typical value.[3]
Beyond the arithmetic mean, other types of means address specific data characteristics or applications, such as the geometric mean and harmonic mean. The geometric mean, computed as the nth root of the product of n positive numbers, is appropriate for averaging ratios, growth rates, or multiplicative processes, as it is less dominated by very large values than the arithmetic mean.[4] For instance, it is commonly applied in finance to calculate average returns over time[5] or in biology for modeling population growth.[6] The harmonic mean, defined as n divided by the sum of the reciprocals of the numbers, is suited to averaging rates or ratios where the denominator varies, such as speeds over equal distances, and it is always less than or equal to the geometric mean, which is itself less than or equal to the arithmetic mean for the same dataset of positive numbers.[7] These relationships, known as the AM-GM-HM inequality, imply that the arithmetic mean is the largest of the three whenever the positive numbers are not all equal.[8]
The concept of the mean traces its origins to ancient Greek mathematics, particularly the Pythagoreans, who developed the classical arithmetic, geometric, and harmonic means; the arithmetic mean initially described the midpoint between two numbers.[9] Its use evolved in the 16th century, when astronomers such as Tycho Brahe applied the arithmetic mean to reduce observational errors by averaging multiple measurements.[10] By the 19th century, Carl Friedrich Gauss advanced its role in statistics through the method of least squares, establishing the mean as the expected value in normal distributions and integral to inferential statistics.[11] Today, means underpin diverse fields including economics, engineering, and machine learning, where they facilitate data summarization, hypothesis testing, and model evaluation, though selection of the appropriate type depends on the data's scale and distribution properties.[12]

Classical Pythagorean Means

Arithmetic Mean

The arithmetic mean of a finite set of real numbers $ x_1, x_2, \dots, x_n $, where $ n $ is the number of observations, is defined as their sum divided by the count:
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i. $$
This measure provides a basic summary of the central value in the dataset and is applicable to any real numbers, positive or negative.[13] In statistical contexts, a distinction is drawn between the population mean, which applies to the entire dataset of size $ N $, and the sample mean, used for a subset of size $ n $. The population mean is calculated as
$$ \mu = \frac{1}{N} \sum_{i=1}^N x_i, $$
while the sample mean is
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i. $$
Under random sampling from a larger population, the sample mean serves as an unbiased estimator of the population mean.[14] The arithmetic mean exhibits key mathematical properties, including linearity: for constants $ a $ and $ b $, the mean of the transformed set $ aX + b $ equals $ a $ times the original mean plus $ b $, or $ \mathrm{AM}(aX + b) = a \cdot \mathrm{AM}(X) + b $. This affine property makes it useful for scaling and shifting data. However, the arithmetic mean is sensitive to outliers, as a single extreme value can significantly shift the result away from the typical values in the set.[15][16] The concept of the arithmetic mean traces back to ancient Greek mathematics.[17] It was later integrated into the method of least squares in the early 19th century by Carl Friedrich Gauss in his 1809 work Theoria motus corporum coelestium in sectionibus conicis solem ambientium, justifying its use for minimizing errors in astronomical observations.[18] Common applications include everyday calculations such as averaging daily temperatures or test scores. For example, the mean temperature for three days with readings of 20°C, 22°C, and 19°C is $ (20 + 22 + 19)/3 \approx 20.33 $ °C, providing a quick summary of the period's warmth. Similarly, for test scores of 85, 92, and 78, the arithmetic mean is $ (85 + 92 + 78)/3 = 85 $, representing the group's overall performance.[16][19] For continuous data, the arithmetic mean generalizes to the average value of a function $ f(x) $ over an interval $ [a, b] $, given by
$$ \frac{1}{b - a} \int_a^b f(x) \, dx. $$
This integral form extends the discrete summation, capturing the mean for continuously varying quantities like velocity over time.[20] For positive numbers, the arithmetic mean can be compared briefly to other classical means, such as the geometric or harmonic, though it remains the standard for additive averaging.[13]
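The definition, the linearity property, and the outlier sensitivity described above can be checked numerically. The following is a minimal Python sketch; the function name `arithmetic_mean` and the Celsius-to-Fahrenheit constants used to illustrate the affine property are illustrative choices, not part of the cited sources.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by their count."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


temps_c = [20, 22, 19]   # daily temperatures from the example, in Celsius
scores = [85, 92, 78]    # test scores from the example

print(arithmetic_mean(temps_c))  # about 20.33
print(arithmetic_mean(scores))   # 85.0

# Linearity: the mean of a*x + b equals a * mean(x) + b.
# Here a and b convert Celsius to Fahrenheit (an illustrative choice).
a, b = 1.8, 32.0
mean_f = arithmetic_mean([a * t + b for t in temps_c])
print(abs(mean_f - (a * arithmetic_mean(temps_c) + b)) < 1e-9)  # True

# Outlier sensitivity: a single score of 0 drags the mean from 85 to 63.75.
print(arithmetic_mean(scores + [0]))  # 63.75
```

In practice, Python's standard library offers `statistics.mean` and `statistics.fmean` for the same computation.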

Geometric Mean

The geometric mean of $ n $ positive real numbers $ x_1, x_2, \dots, x_n $ is defined as the $ n $-th root of their product, given by
$$ \text{GM} = \left( \prod_{i=1}^n x_i \right)^{1/n}. $$
[21] For two positive numbers $ x $ and $ y $, this simplifies to $ \sqrt{xy} $.[21] This measure is particularly suitable for aggregating ratios, rates of change, or positive data where multiplicative relationships dominate, such as in growth processes.[21] An equivalent formulation leverages logarithms, expressing the geometric mean as the exponential of the arithmetic mean of the natural logarithms:
$$ \text{GM} = \exp\left( \frac{1}{n} \sum_{i=1}^n \ln x_i \right). $$
[22] This property highlights its connection to logarithmic scales and makes it useful for data that span orders of magnitude, as the logarithm transforms multiplicative effects into additive ones.[22] The geometric mean finds applications in contexts involving compounded growth or proportional changes. In finance, it calculates the average annual return on investments over multiple periods, accounting for compounding effects; for example, successive returns of 10%, 20%, and -5% (corresponding to factors of 1.1, 1.2, and 0.95) have a product of 1.254 and therefore a geometric mean of approximately 1.078, or an effective 7.8% annual rate.[21] In biology, it is used to summarize bacterial concentrations in environmental samples, such as water quality assessments, where data vary widely and geometric means provide a stable central tendency for log-normally distributed counts.[23] It also applies to modeling bacterial population growth rates in exponential phases, where multiplicative factors describe cell division over time.[23] A key property of the geometric mean for positive real numbers is that it is always less than or equal to the arithmetic mean, with equality if and only if all the numbers are equal; this is a consequence of the arithmetic mean-geometric mean (AM-GM) inequality.[24] As the multiplicative counterpart in the classical Pythagorean means, alongside the arithmetic and harmonic means, the geometric mean emphasizes balanced proportions in geometric progressions.[24]
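The product form and the logarithmic form of the geometric mean can be compared directly. The sketch below is illustrative only: the function names are hypothetical, and the input is the list of return factors 1.1, 1.2, and 0.95 from the investment example.

```python
import math


def geometric_mean_product(values):
    """n-th root of the product of positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires a non-empty set of positive numbers")
    return math.prod(values) ** (1.0 / len(values))


def geometric_mean_log(values):
    """Exponential of the arithmetic mean of the natural logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires a non-empty set of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


factors = [1.10, 1.20, 0.95]           # +10%, +20%, -5% as growth factors
gm = geometric_mean_log(factors)
am = sum(factors) / len(factors)

print(gm)        # the constant factor that compounds to the same total growth
print(gm ** 3)   # recovers the product of the factors, about 1.254
print(gm <= am)  # True, as the AM-GM inequality requires
```

The logarithmic form is usually preferred for long sequences, because a running product can overflow or underflow in floating point while a sum of logarithms stays well scaled.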

Harmonic Mean

The harmonic mean of $ n $ positive real numbers $ x_1, x_2, \dots, x_n $ is defined as the reciprocal of the arithmetic mean of their reciprocals:
$$ \text{HM} = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}}. $$
[5] For two positive numbers $ a $ and $ b $, this simplifies to
$$ \text{HM} = \frac{2ab}{a + b}. $$
[5] A key property of the harmonic mean is that it is always less than or equal to the geometric mean and the arithmetic mean for the same set of positive numbers, with equality holding if and only if all the numbers are equal.[25] This positions it as the smallest among the classical Pythagorean means.[25] The harmonic mean is particularly appropriate for datasets consisting of rates or ratios, as it gives greater weight to smaller values, providing a balanced measure in such contexts.[26] In applications involving rates, the harmonic mean yields the correct average when the quantities being averaged are inversely proportional to the rates, such as speeds over equal distances.[26] A closely related reciprocal sum governs resistors in parallel: for resistances $ R_1, R_2, \dots, R_n $, the equivalent resistance $ R_{\text{eq}} $ satisfies $ \frac{1}{R_{\text{eq}}} = \sum_{i=1}^n \frac{1}{R_i} $, so $ R_{\text{eq}} = \frac{1}{\sum_{i=1}^n \frac{1}{R_i}} $, which is the harmonic mean of the individual resistances divided by $ n $. In economics, it is used to average ratios like price-to-earnings multiples across companies, ensuring the result reflects the harmonic relationship in valuation metrics.[27] A common example illustrates its use for averaging speeds: suppose a vehicle travels one leg of a round trip at 60 km/h and the return leg at 40 km/h, with both legs covering equal distances. The arithmetic mean of 50 km/h overestimates the true average speed, but the harmonic mean gives
$$ \text{HM} = \frac{2 \times 60 \times 40}{60 + 40} = 48 \text{ km/h}, $$
which matches the total distance divided by total time.[26]
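The round-trip example can be reproduced by computing the average speed two ways: from total distance over total time, and from the harmonic mean of the two speeds. This is a small sketch; the leg length of 120 km is an arbitrary assumption, since the result does not depend on it.

```python
import math


def harmonic_mean(values):
    """n divided by the sum of reciprocals of positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean requires a non-empty set of positive numbers")
    return len(values) / sum(1.0 / v for v in values)


speeds_kmh = [60.0, 40.0]
leg_km = 120.0  # assumed leg length; any positive value gives the same average

total_distance = leg_km * len(speeds_kmh)
total_time = sum(leg_km / s for s in speeds_kmh)   # 2 h out + 3 h back
true_average = total_distance / total_time

print(true_average)                                             # 48.0
print(math.isclose(harmonic_mean(speeds_kmh), true_average))    # True
print(sum(speeds_kmh) / len(speeds_kmh))                        # 50.0, an overestimate
```

The standard library also provides `statistics.harmonic_mean` for this calculation.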

Relationships and Inequalities

The Pythagorean means, namely the arithmetic mean (AM), geometric mean (GM), and harmonic mean (HM), are interconnected through the classical inequality HM ≤ GM ≤ AM, which holds for any finite collection of positive real numbers $ x_1, x_2, \dots, x_n $. Equality occurs if and only if $ x_1 = x_2 = \dots = x_n $. This relationship highlights how the HM provides a lower bound, the GM an intermediate value, and the AM an upper bound, reflecting different ways of aggregating data while preserving order.[9][28] Proofs of this inequality often rely on convexity and Jensen's inequality. For the AM-GM portion, consider the convex function $ f(x) = -\log x $ on $ (0, \infty) $. Jensen's inequality states that $ \frac{1}{n} \sum_{i=1}^n f(x_i) \geq f\left( \frac{1}{n} \sum_{i=1}^n x_i \right) $, so $ \frac{1}{n} \sum_{i=1}^n (-\log x_i) \geq -\log \left( \frac{1}{n} \sum_{i=1}^n x_i \right) $. Rearranging yields $ \log \left( \frac{1}{n} \sum_{i=1}^n x_i \right) \geq \frac{1}{n} \sum_{i=1}^n \log x_i = \log \left( \left( \prod_{i=1}^n x_i \right)^{1/n} \right) $, implying AM ≥ GM, with equality when all $ x_i $ are equal. The GM-HM portion follows by applying AM-GM to the reciprocals $ 1/x_i $: their arithmetic mean is 1/HM and their geometric mean is 1/GM, so 1/HM ≥ 1/GM and hence HM ≤ GM.[29][30] Geometrically, the means for two positive numbers $ a $ and $ b $ admit constructions involving basic figures. The GM $ \sqrt{ab} $ is the length of the side of a square with area equal to that of a rectangle of sides $ a $ and $ b $, or the altitude to the hypotenuse of a right triangle when that altitude divides the hypotenuse into segments of lengths $ a $ and $ b $. The HM $ 2ab/(a+b) $ arises in the context of harmonic divisions; for example, in a trapezoid with parallel sides $ a $ and $ b $, the segment through the intersection of the diagonals parallel to those sides has length $ 2ab/(a+b) $. The AM $ (a+b)/2 $ corresponds to the midpoint of a line segment joining $ a $ and $ b $.
These interpretations underscore the means' roles in proportion and similarity.[9] The inequality has significant applications in optimization, particularly for problems involving constraints on sums or products. For instance, to maximize $ \prod x_i $ subject to a fixed sum $ \sum x_i = S $ with $ x_i > 0 $, the AM-GM inequality bounds the product by $ (S/n)^n $ and implies that the maximum occurs when all $ x_i = S/n $, achieving equality. This technique appears in resource allocation and extremal problems, as elaborated in Hardy, Littlewood, and Pólya's seminal work on inequalities.[31][32] Extensions of the inequality apply to any number of variables, with proofs generalizing via induction on $ n $ (starting from the two-variable case) or directly through Jensen's inequality for the weighted form. Historical roots trace to the Pythagorean school around the 6th century BCE, where the means emerged from studies of musical intervals and geometric proportions (such as the traditional account of hammer weights producing harmonies via HM ratios), though formal definitions were provided later by Archytas (c. 428–347 BCE) and Euclid in the Elements.[33][34][29]
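A numerical spot check is no substitute for the proofs referenced above, but it can make the ordering HM ≤ GM ≤ AM and its equality case concrete. The following sketch uses hypothetical helper names and a seeded random generator; the small relative tolerance only absorbs floating-point rounding.

```python
import math
import random


def am(xs):
    return sum(xs) / len(xs)


def gm(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def hm(xs):
    return len(xs) / sum(1.0 / x for x in xs)


def chain_holds(xs, rel_tol=1e-12):
    """Check HM <= GM <= AM up to floating-point rounding."""
    return hm(xs) <= gm(xs) * (1 + rel_tol) and gm(xs) <= am(xs) * (1 + rel_tol)


rng = random.Random(0)  # fixed seed so the check is reproducible
samples = [
    [rng.uniform(0.1, 100.0) for _ in range(rng.randint(2, 8))]
    for _ in range(1000)
]
print(all(chain_holds(xs) for xs in samples))  # True

# Equality case: all three means coincide when every value is the same.
equal = [5.0, 5.0, 5.0]
print(math.isclose(hm(equal), gm(equal)) and math.isclose(gm(equal), am(equal)))  # True
```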

Means in Statistics and Probability

Arithmetic Mean as Central Tendency

In statistics, the arithmetic mean, often denoted as $ \bar{x} $, represents the sample mean calculated from a dataset and serves as the primary estimator of the population mean $ \mu $.[35] This measure sums all observed values and divides by the sample size $ n $, providing a straightforward summary of the data's central location.[36] A key property of the sample mean is its unbiasedness: the expected value satisfies $ E(\bar{x}) = \mu $ for any distribution with a finite population mean, so that over repeated samples the average of the sample means equals the true population parameter.[35] Additionally, for independent observations the variance of the sample mean is $ \sigma^2 / n $, where $ \sigma^2 $ is the population variance, indicating that larger sample sizes reduce estimation uncertainty.[37] These properties make the arithmetic mean a foundational tool in parametric inference, though its performance depends on certain data characteristics.
Compared to other measures of central tendency, the arithmetic mean is more sensitive to outliers than the median, which resists extreme values by focusing on the middle ordered value, or the mode, which identifies the most frequent value.[38] It performs best with symmetric distributions where data points cluster evenly around the center, but in skewed datasets, it can be pulled toward extremes, misrepresenting the typical value.[39] The arithmetic mean plays a central role in hypothesis testing, such as t-tests comparing group means, and in constructing confidence intervals, which quantify the precision of $ \bar{x} $ as an estimate of $ \mu $ using the standard error $ s / \sqrt{n} $, where $ s $ is the sample standard deviation.[40]
For example, in population income data, which often exhibits right-skewness due to a few high earners, the mean income tends to exceed the median, potentially overstating the central tendency for most individuals.[41] This skewness highlights a limitation in non-normal distributions, where the mean's sensitivity to tails can lead to misleading interpretations of centrality, even though it remains an unbiased estimator of the population mean.[39] In such cases, robust alternatives, like the median or trimmed means that exclude extreme values, offer more reliable measures for skewed data.[39]
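The contrast between mean and median on right-skewed data, and the standard error of the sample mean, can be illustrated with a small made-up sample. The income figures below are hypothetical (in thousands of currency units) and chosen only so that one high earner dominates the upper tail.

```python
import math
from statistics import mean, median, stdev

# Hypothetical incomes in thousands; one very high earner skews the data.
incomes = [28, 31, 33, 35, 38, 41, 45, 52, 60, 400]

sample_mean = mean(incomes)
sample_median = median(incomes)

# Standard error of the mean: sample standard deviation over sqrt(n).
standard_error = stdev(incomes) / math.sqrt(len(incomes))

print(sample_mean)     # 76.3, pulled upward by the single large value
print(sample_median)   # 39.5, closer to what most of the sample earns
print(standard_error)  # large here, reflecting the heavy right tail
```

Nine of the ten values lie below the mean, which is the sense in which the mean can overstate the typical value in skewed data.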

Expected Value of a Probability Distribution

In probability theory, the expected value of a random variable $ X $, denoted $ E[X] $, represents the long-run average value of $ X $ over many independent repetitions of the experiment, serving as the first moment of the probability distribution.[42] For a discrete random variable taking values $ x_i $ with probabilities $ p(x_i) $, the expected value is defined as the sum
$$ E[X] = \sum_i x_i p(x_i), $$
where the sum is over all possible values in the support of the distribution.[42] For a continuous random variable with probability density function $ f(x) $, the expected value is the integral
$$ E[X] = \int_{-\infty}^{\infty} x f(x) \, dx. $$
[43]
Key properties of the expected value include linearity, which holds regardless of dependence between variables: for constants $ a $ and $ b $ and random variables $ X $ and $ Y $,
$$ E[aX + bY] = a E[X] + b E[Y]. $$
[44] This property facilitates computation for complex expressions by decomposing them into simpler components.[45] The expected value also relates directly to the variance, a measure of dispersion defined as
$$ \text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2, $$
quantifying the average squared deviation from the mean.[46] Specific distributions yield closed-form expected values that highlight its role as the mean parameter. For a binomial random variable $ X \sim \text{Bin}(n, p) $ modeling the number of successes in $ n $ independent trials each with success probability $ p $, the expected value is $ E[X] = np $.[47] For a normal distribution $ X \sim N(\mu, \sigma^2) $, the expected value is exactly the location parameter $ E[X] = \mu $, centering the symmetric bell-shaped density.[48] The central limit theorem underscores the expected value's theoretical importance: for independent and identically distributed random variables $ X_1, \dots, X_n $ with finite mean $ \mu = E[X_i] $ and variance $ \sigma^2 > 0 $, the standardized sample mean $ \sqrt{n} (\bar{X}_n - \mu) / \sigma $, where $ \bar{X}_n = n^{-1} \sum_{i=1}^n X_i $, converges in distribution to $ N(0, 1) $ as $ n \to \infty $. For large $ n $, $ \bar{X}_n $ is therefore approximately distributed as $ N(\mu, \sigma^2 / n) $ and concentrates around the population expected value $ \mu $.[49] This convergence justifies using sample averages to estimate population means in large datasets.
In applications, the expected value informs risk assessment by quantifying average outcomes under uncertainty, such as calculating the anticipated loss or gain in financial portfolios to guide investment decisions.[50] It also supports forecasting by providing the predicted average value in probabilistic models, as in project management where expected completion times account for variable durations across scenarios.[51] Computing expected values becomes challenging in high-dimensional settings where analytical integration is intractable, prompting the use of Monte Carlo methods, which approximate $ E[X] $ by averaging samples drawn from the distribution: $ E[X] \approx n^{-1} \sum_{i=1}^n X_i $ for large $ n $, with error decreasing as $ O(1/\sqrt{n}) $ independent of dimension.[52] In current AI applications, these methods are integral to uncertainty quantification in neural networks, such as Monte Carlo dropout for estimating predictive means in high-dimensional inference tasks like image recognition or reinforcement learning.[53]
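For a concrete case, the discrete formulas for expected value and variance, and a Monte Carlo approximation of the same expectation, can be sketched for a fair six-sided die. The die, the function names, the sample size, and the seed are illustrative assumptions; exact fractions are used so that the closed-form results are not blurred by rounding.

```python
import random
from fractions import Fraction


def expected_value(pmf):
    """E[X] = sum of x * p(x) over the support of a discrete distribution."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mu = expected_value(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu * mu


def monte_carlo_mean(sampler, n, seed=0):
    """Approximate E[X] by averaging n independent draws from sampler(rng)."""
    rng = random.Random(seed)
    return sum(sampler(rng) for _ in range(n)) / n


# A fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(die))  # 7/2
print(variance(die))        # 35/12

# The estimate should fall close to 3.5; its error shrinks like 1/sqrt(n).
estimate = monte_carlo_mean(lambda rng: rng.randint(1, 6), n=100_000)
print(estimate)
```

With 100,000 draws the standard error is roughly 0.005, so the printed estimate is expected to agree with 3.5 to about two decimal places.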

Weighted Arithmetic Mean

The weighted arithmetic mean extends the concept of the arithmetic mean by assigning different levels of importance, or weights, to each data point in a dataset. For a finite set of values $ x_1, x_2, \dots, x_n $ with corresponding positive weights $ w_1, w_2, \dots, w_n > 0 $, the weighted arithmetic mean $ \bar{x}_w $ is defined as
$$ \bar{x}_w = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}. $$
This formulation ensures that the result is a weighted average that accounts for the relative significance of each $ x_i $, as established in standard statistical texts on measures of central tendency.[54] A key property of the weighted arithmetic mean is that it reduces to the ordinary arithmetic mean when all weights are equal, for example $ w_i = 1 $ for all $ i $, providing a direct generalization of the unweighted case. Additionally, when the weights are normalized such that $ \sum_{i=1}^n w_i = 1 $ and each $ w_i \geq 0 $, the expression simplifies to $ \bar{x}_w = \sum_{i=1}^n w_i x_i $, forming a convex combination of the values, which lies within the convex hull of the data points and is therefore bounded between the minimum and maximum values. This convexity ensures the weighted mean is a stable estimator in optimization contexts, as it maintains the affine structure of linear combinations.[55][54] In applications, the weighted arithmetic mean is widely used in education to compute grade point averages (GPAs), where course grades are weighted by the number of credit hours to reflect their relative academic load. For instance, a student with grades of 3.0 in a 3-credit course and 4.0 in a 4-credit course has a GPA of $ (3 \times 3.0 + 4 \times 4.0)/(3 + 4) = 25/7 \approx 3.57 $. In survey sampling, particularly stratified random sampling, it estimates population parameters by weighting stratum-specific sample means proportionally to the stratum sizes, improving precision over simple random sampling for heterogeneous populations.
An example occurs in finance, where the expected return of an investment portfolio is calculated as the weighted arithmetic mean of individual asset returns, with weights corresponding to the proportion of capital allocated to each asset, such as 60% in stocks yielding 8% and 40% in bonds yielding 4%, resulting in a portfolio return of $ 0.6 \times 8\% + 0.4 \times 4\% = 6.4\% $.[56] More recently, the weighted arithmetic mean has found applications in machine learning, particularly in weighted loss functions that adjust training emphasis for imbalanced datasets. In object detection tasks, the focal loss function, introduced in RetinaNet, dynamically weights the cross-entropy loss to down-weight easy examples and focus on hard ones, achieving reported improvements in mean average precision (mAP) of up to 5.7 points on the COCO dataset compared to prior methods. This approach highlights the weighted mean's role in enhancing model performance by prioritizing challenging data points during optimization.[57]
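The GPA and portfolio calculations are both instances of the same formula. A minimal sketch, with a hypothetical function name and the figures from the examples above, might look like this:

```python
import math


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    values, weights = list(values), list(weights)
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# GPA: grades weighted by credit hours.
gpa = weighted_mean([3.0, 4.0], [3, 4])
print(round(gpa, 2))  # 3.57

# Portfolio: returns (in percent) weighted by capital share.
portfolio_return = weighted_mean([8.0, 4.0], [0.6, 0.4])
print(math.isclose(portfolio_return, 6.4))  # True

# Equal weights recover the ordinary arithmetic mean.
print(weighted_mean([1.0, 2.0, 6.0], [1, 1, 1]))  # 3.0
```

Because the function divides by the total weight, the weights do not need to be normalized beforehand; credit hours and capital shares are handled the same way.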

Generalized Means

Power Mean

The power mean of order $ p $, also known as the generalized mean or Hölder mean, is a family of means that generalizes several classical averages for a finite set of positive real numbers $ x_1, x_2, \dots, x_n $. For $ p \neq 0 $, it is defined as
$$ M_p(\mathbf{x}) = \left( \frac{1}{n} \sum_{i=1}^n x_i^p \right)^{1/p}, $$
where $ \mathbf{x} = (x_1, \dots, x_n) $ and $ p \in \mathbb{R} $ is the order parameter.[58] For $ p = 0 $, the power mean is defined as the limit
$$ M_0(\mathbf{x}) = \lim_{p \to 0} M_p(\mathbf{x}) = \exp\left( \frac{1}{n} \sum_{i=1}^n \ln x_i \right), $$
which yields the geometric mean. As $ p \to \infty $, $ M_p(\mathbf{x}) $ approaches the maximum value $ \max_i x_i $, and as $ p \to -\infty $, it approaches the minimum $ \min_i x_i $.[58] This family was introduced by Otto Hölder in the context of his inequality, providing a unified framework for aggregating data based on the exponent $ p $.[59] The power mean relates directly to the classical Pythagorean means as special cases: $ M_1(\mathbf{x}) $ is the arithmetic mean, $ M_0(\mathbf{x}) $ (in the limit) is the geometric mean, and $ M_{-1}(\mathbf{x}) $ is the harmonic mean. A key property is its monotonicity: for fixed $ \mathbf{x} $ with all $ x_i > 0 $, $ M_p(\mathbf{x}) $ is non-decreasing in $ p $, meaning $ M_q(\mathbf{x}) \leq M_p(\mathbf{x}) $ whenever $ q \leq p $.[59] This monotonicity underpins the power mean inequality, which generalizes classical inequalities like AM-GM-HM and holds for all real $ p \geq q $. Power means find applications in inequality theory, where the monotonicity property extends classical results to arbitrary orders, facilitating proofs in optimization and analysis.[60] In signal processing, the case $ p = 2 $ corresponds to the root mean square (RMS),
$$ M_2(\mathbf{x}) = \sqrt{\frac{1}{n} \sum_{i=1}^n x_i^2}, $$
which measures the effective magnitude of alternating signals, such as voltages in AC circuits, where it equals the DC value producing the same power dissipation.[58] For example, for a sinusoidal voltage $ v(t) = V \sin(\omega t) $, the RMS value is $ V / \sqrt{2} $, providing a standardized metric for power calculations in electrical engineering.[61]
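The special cases and the monotonicity in $ p $ can be seen by evaluating the power mean of one dataset at several orders. This is a sketch under stated assumptions: the function name and the sample data are illustrative, and the orders $ p = 0 $ and $ p = \pm\infty $ are handled through their limiting values rather than the general formula.

```python
import math


def power_mean(values, p):
    """Power mean of order p for positive numbers, with limiting cases."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("power mean requires a non-empty set of positive numbers")
    n = len(values)
    if p == 0:                      # limit p -> 0: geometric mean
        return math.exp(sum(math.log(v) for v in values) / n)
    if p == math.inf:               # limit p -> +infinity: maximum
        return max(values)
    if p == -math.inf:              # limit p -> -infinity: minimum
        return min(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


data = [1.0, 4.0, 16.0]
orders = [-math.inf, -1, 0, 1, 2, math.inf]   # min, HM, GM, AM, RMS, max
means = [power_mean(data, p) for p in orders]

for p, m in zip(orders, means):
    print(p, m)

# Monotonicity: the sequence of means is non-decreasing in p.
print(all(a <= b for a, b in zip(means, means[1:])))  # True
```

For very large positive or negative finite $ p $, the direct formula can overflow or underflow in floating point, which is one reason the limiting cases are treated separately here.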

Quasi-Arithmetic Mean

The quasi-arithmetic mean, also known as the Kolmogorov mean or $ f $-mean, generalizes classical means for a finite set of positive real numbers $ x_1, x_2, \dots, x_n $ via a continuous and strictly monotonic function $ f: (0, \infty) \to \mathbb{R} $. It is defined as
$$ M_f(x_1, \dots, x_n) = f^{-1}\left( \frac{1}{n} \sum_{i=1}^n f(x_i) \right), $$
where $ f^{-1} $ denotes the inverse function of $ f $. This formulation allows the construction of means tailored to specific transformation behaviors by varying $ f $. The concept was introduced by Andrey Kolmogorov in 1930, with independent work by Mitio Nagumo in the same year characterizing such means through functional equations.[62] Particular selections of $ f $ recover standard means. When $ f(x) = x $, $ M_f $ is the arithmetic mean. For $ f(x) = \ln x $ (with $ x_i > 0 $), it yields the geometric mean. Choosing $ f(x) = 1/x $ (again for $ x_i > 0 $) produces the harmonic mean. These examples illustrate how the quasi-arithmetic framework unifies the Pythagorean means through appropriate monotonic transformations.[63] Quasi-arithmetic means exhibit key properties that make them versatile for averaging. They are internal, meaning $ \min\{x_i\} \leq M_f(x_1, \dots, x_n) \leq \max\{x_i\} $, and idempotent, so $ M_f(a, \dots, a) = a $ for any $ a > 0 $. The power mean arises as a special case by selecting $ f(x) = x^p $ for $ p \neq 0 $.[63][64] In economics, quasi-arithmetic means facilitate custom averaging in utility theory and risk analysis, where the function $ f $ can reflect nonlinear preferences or downside risk measures. For instance, they underpin combined indices of utility and weighting functions to evaluate economic aggregates. In data science, modern extensions employ quasi-arithmetic means for non-linear aggregation in tasks like entropy estimation and information geometry, enabling flexible Bregman-based divergences for machine learning models. These applications highlight their role beyond linear statistics, supporting robust handling of skewed or transformed datasets.[65][66][67]
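Since the definition only needs a function and its inverse, the Pythagorean means and the root mean square can all be obtained from one small routine. The sketch below is illustrative; the caller is assumed to supply a matching pair `f` and `f_inv`, which the code does not verify.

```python
import math


def quasi_arithmetic_mean(values, f, f_inv):
    """Apply f, take the arithmetic mean, then map back with the inverse of f.

    The caller is responsible for passing a strictly monotonic f together
    with its true inverse f_inv on the range of the data.
    """
    values = list(values)
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return f_inv(sum(f(v) for v in values) / len(values))


data = [1.0, 4.0, 16.0]

am = quasi_arithmetic_mean(data, lambda x: x, lambda y: y)            # f(x) = x
gm = quasi_arithmetic_mean(data, math.log, math.exp)                  # f(x) = ln x
hm = quasi_arithmetic_mean(data, lambda x: 1 / x, lambda y: 1 / y)    # f(x) = 1/x
rms = quasi_arithmetic_mean(data, lambda x: x * x, math.sqrt)         # f(x) = x^2

print(am, gm, hm, rms)

# Internality: every such mean lies between the minimum and the maximum.
print(all(min(data) <= m <= max(data) for m in (am, gm, hm, rms)))  # True
```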

Robust and Truncated Means

Truncated Mean

The truncated mean, also known as the trimmed mean, is a robust estimator of central tendency computed by first sorting a dataset in ascending order and then excluding a predetermined proportion $ \alpha $ of the values from each tail before calculating the arithmetic mean of the remaining observations.[68] For instance, a 5% trimmed mean removes the bottom 5% and top 5% of the sorted data, reducing the influence of potential outliers while preserving more information than the median.[69] This approach addresses the sensitivity of the standard arithmetic mean to extreme values, making it particularly useful in datasets prone to contamination or heavy tails.[70] A key variant is the Winsorized mean, which instead of discarding extreme values, replaces the lowest $ \alpha $ proportion with the smallest retained value and the highest $ \alpha $ proportion with the largest retained value, then computes the mean of this adjusted dataset.[71] This method retains all observations, avoiding the loss of data points associated with trimming, and is similarly robust to outliers by capping their impact.[72] The interquartile mean represents a specific instance of the 25% trimmed mean, focusing on the central 50% of the data.[68] Truncated and Winsorized means exhibit lower sensitivity to outliers compared to the arithmetic mean, with a breakdown point of $ \alpha $ for trimming, allowing robustness against up to that fraction of corrupted data.[70] However, in small samples, these estimators can introduce bias, as the removal or capping of extremes may systematically shift the estimate away from the true population mean, especially if the underlying distribution is asymmetric or the trimming proportion is large relative to the sample size.[73] Their variance is generally higher than that of the arithmetic mean under normality; it decreases with increasing sample size, although for a fixed $ \alpha $ the efficiency relative to the mean stays somewhat below one even asymptotically.
In applications, truncated means are employed in sports scoring, such as Olympic gymnastics judging, where the highest and lowest scores from a panel are discarded to mitigate bias from extreme opinions.[74] They also appear in signal processing for robust estimation, such as in alpha-trimmed mean filters that suppress noise in images by averaging windowed data after trimming extremes, improving edge preservation over simple averaging.[75] For example, consider a class of 20 exam scores ranging from 45 to 98; a 10% trimmed mean excludes the two lowest (45, 52) and two highest (95, 98) scores and averages the remaining 16 values, providing a fairer assessment less skewed by potential cheating or errors.[76] In the context of big data, efficient algorithms, such as pairwise aggregation methods, enable $ O(n) $ computation of trimmed means without full sorting, facilitating scalable robust estimation in high-volume datasets.
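Trimming and Winsorizing differ only in what happens to the extreme values: they are dropped in one case and clamped in the other. The sketch below illustrates both on a small made-up dataset; the function names are hypothetical, the number of values affected in each tail is taken as the floor of $ \alpha n $ (one common convention among several), and a full sort is used for clarity rather than a linear-time selection.

```python
def _tail_count(n, alpha):
    """Number of values affected in each tail: floor(alpha * n)."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    return int(alpha * n)


def trimmed_mean(values, alpha):
    """Drop the k smallest and k largest values, then average the rest."""
    xs = sorted(values)
    if not xs:
        raise ValueError("mean of an empty dataset is undefined")
    k = _tail_count(len(xs), alpha)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, alpha):
    """Replace the k smallest and k largest values by the nearest retained value."""
    xs = sorted(values)
    if not xs:
        raise ValueError("mean of an empty dataset is undefined")
    k = _tail_count(len(xs), alpha)
    if k:
        xs[:k] = [xs[k]] * k            # clamp the low tail upward
        xs[-k:] = [xs[-k - 1]] * k      # clamp the high tail downward
    return sum(xs) / len(xs)


data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 100]   # made-up data with one outlier

print(sum(data) / len(data))        # 16.2, dominated by the value 100
print(trimmed_mean(data, 0.10))     # 7.5
print(winsorized_mean(data, 0.10))  # 7.5
```

With $ \alpha = 0 $ both functions reduce to the ordinary arithmetic mean, which gives a quick sanity check.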

Interquartile Mean

The interquartile mean is defined as the arithmetic mean of the data values lying between the first quartile (Q1) and the third quartile (Q3) in a sorted dataset. This approach discards the lowest 25% and highest 25% of the observations, focusing exclusively on the central 50% to provide a measure of central tendency.[77][78] The formula for the interquartile mean (IQM) of a sorted dataset $ \{x_1 \leq x_2 \leq \dots \leq x_n\} $ is given by
$$ \text{IQM} = \frac{\sum_{\{i \mid Q_1 \leq x_i \leq Q_3\}} x_i}{\# \{i \mid Q_1 \leq x_i \leq Q_3\}}, $$
where the summation is over the indices $ i $ such that $ x_i $ falls between Q1 and Q3, and the denominator is the count of such values.[78] This formulation ensures that approximately half the data contributes to the calculation, with the exact number depending on the dataset size and quartile positions. As a type of truncated mean, the interquartile mean exhibits strong resistance to outliers, remaining stable even if up to 25% of the data are contaminated by extreme values. It is particularly valued in descriptive statistics for summarizing datasets with skewness or anomalies, where the full arithmetic mean might be misleading.[77][68] In applications, the interquartile mean is employed in robust statistical analysis of environmental data, such as air quality measurements that often include extreme pollution spikes due to unusual events. It helps provide a reliable central estimate without distortion from these outliers. For instance, consider the dataset $ \{1, 2, 3, 10, 11, 100\} $; taking the quartiles as the medians of the lower and upper halves gives Q1 = 2 and Q3 = 11, so the interquartile mean is $ (2 + 3 + 10 + 11)/4 = 6.5 $.[78] Compared to the midrange, which averages only the minimum and maximum values and is highly sensitive to extremes, the interquartile mean offers greater robustness by incorporating more central data points.[77]
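The worked example can be reproduced in a few lines. Quartile conventions vary between textbooks and software, so the sketch below makes one explicit assumption: Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data (excluding the middle value when the count is odd). The function names are illustrative.

```python
from statistics import median


def quartiles_by_halves(sorted_xs):
    """Q1 and Q3 as medians of the lower and upper halves (one convention of several)."""
    n = len(sorted_xs)
    half = n // 2
    lower = sorted_xs[:half]
    upper = sorted_xs[n - half:]   # skips the middle value when n is odd
    return median(lower), median(upper)


def interquartile_mean(values):
    """Arithmetic mean of the values x with Q1 <= x <= Q3."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values to form quartiles")
    q1, q3 = quartiles_by_halves(xs)
    central = [x for x in xs if q1 <= x <= q3]
    return sum(central) / len(central)


data = [1, 2, 3, 10, 11, 100]

print(quartiles_by_halves(sorted(data)))   # (2, 11)
print(interquartile_mean(data))            # 6.5
print((min(data) + max(data)) / 2)         # 50.5, the midrange, dominated by 100
```

Other quartile definitions, including interpolating ones, can give slightly different results on small samples.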

Means for Special Data Types

Mean of Angles and Circular Quantities

When averaging angles or other circular quantities, the standard arithmetic mean can produce misleading results due to the periodic nature of the data, where values wrap around at 360° (or $ 2\pi $ radians). For instance, the angles 1° and 359° have an arithmetic mean of 180°, which incorrectly suggests a direction opposite to the clustered values near 0°; this occurs because the arithmetic mean treats the circle as a linear interval, ignoring its topology.[79] To address this, the circular mean (or mean direction) is used, defined as the angle corresponding to the resultant vector from unit vectors representing each data point. For a set of angles $ \theta_i $ (in radians, $ i = 1, \dots, n $), the circular mean $ \mu $ is given by
$$ \mu = \operatorname{atan2}\left( \frac{1}{n} \sum_{i=1}^n \sin \theta_i, \frac{1}{n} \sum_{i=1}^n \cos \theta_i \right), $$
where $ \operatorname{atan2}(y, x) $ is the two-argument arctangent function that returns the angle in the correct quadrant, ensuring $ \mu $ lies in $ (-\pi, \pi] $.[79] This mean is defined modulo $ 2\pi $, reflecting the circular domain, and relies on vector addition in the plane, where each angle is projected onto the unit circle as $ (\cos \theta_i, \sin \theta_i) $; the mean resultant length $ R = \sqrt{\left( \frac{1}{n} \sum \cos \theta_i \right)^2 + \left( \frac{1}{n} \sum \sin \theta_i \right)^2} $ quantifies concentration, with $ R = 1 $ for perfect alignment and $ R = 0 $ when the unit vectors cancel, as for directions spread evenly around the circle (in which case the mean direction is undefined). Applications include analyzing wind directions in meteorology, where circular means aggregate prevailing flows without boundary artifacts; clock times on a 24-hour cycle, such as averaging event occurrences modulo 24 hours; and phase angles in physics, like synchronizing wave phases in signal processing or quantum mechanics.[80][81] For example, the directions 0°, 10°, and 350° (converted to radians) yield a mean cosine of about 0.99 and a mean sine of 0, so $ \mu \approx 0^\circ $, correctly capturing the clustering near north.[79] A key variant is circular variance, measuring dispersion as $ V = 1 - R $, which ranges from 0 (no spread) to 1 (maximum spread) and complements the mean by assessing data concentration without assuming a linear scale. In recent developments, Bayesian approaches for circular data have emerged in geospatial AI, particularly for spatio-temporal interpolation of directional observations like wind patterns, using hierarchical models such as wrapped distributions or von Mises mixtures to incorporate priors and uncertainty in large-scale environmental datasets.[82]
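The vector-averaging recipe translates directly into code. The following sketch works in degrees for readability and returns the mean direction together with the resultant length and circular variance; the function name and the choice to report the mean in the range [0, 360) are illustrative assumptions.

```python
import math


def circular_mean_deg(angles_deg):
    """Return (mean direction in degrees, resultant length R, circular variance 1 - R)."""
    angles = [math.radians(a) for a in angles_deg]
    if not angles:
        raise ValueError("mean of an empty dataset is undefined")
    n = len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / n
    mean_cos = sum(math.cos(a) for a in angles) / n
    r = math.hypot(mean_cos, mean_sin)
    # atan2 picks the correct quadrant; the modulo maps the result to [0, 360).
    mean_deg = math.degrees(math.atan2(mean_sin, mean_cos)) % 360.0
    return mean_deg, r, 1.0 - r


def angular_distance_deg(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


mean_dir, r, v = circular_mean_deg([0.0, 10.0, 350.0])
print(angular_distance_deg(mean_dir, 0.0) < 1e-6)  # True: the mean points north
print(round(r, 2))                                 # 0.99, a tightly clustered sample

# The wrap-around case: 1 and 359 degrees average to north, not to 180.
mean_dir_2, _, _ = circular_mean_deg([1.0, 359.0])
print(angular_distance_deg(mean_dir_2, 0.0) < 1e-6)  # True
print((1.0 + 359.0) / 2)                             # 180.0, the misleading linear mean
```

Because of floating-point rounding, a mean direction that is mathematically 0° may be returned as a value just below 360°, which is why the comparison uses an angular distance rather than a direct equality test. When $ R $ is close to 0 the mean direction is poorly determined and should be interpreted with caution.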

Fréchet Mean

The Fréchet mean, also known as the Karcher mean or Riemannian barycenter in certain contexts, generalizes the concept of a central tendency to arbitrary metric spaces. For a set of points $ \{x_1, \dots, x_n\} $ in a metric space $ (M, d) $, it is defined as the point $ y \in M $ that minimizes the mean squared distance to the data points:[83]
$$ y = \arg\min_{z \in M} \frac{1}{n} \sum_{i=1}^n d(z, x_i)^2. $$
This formulation extends the classical notion of averaging beyond vector spaces, where addition may not be defined, by relying solely on the metric structure.[84]
In Euclidean spaces equipped with the standard $ \ell_2 $ metric, the Fréchet mean coincides with the arithmetic mean, as the minimizer of the sum of squared distances is the centroid.[84] This reduction highlights its role as a unifying framework, recovering the familiar statistic in the Euclidean setting while enabling extensions to non-Euclidean geometries.

On Riemannian manifolds, computing the Fréchet mean typically requires iterative optimization algorithms, such as gradient descent on the manifold or recursive estimation procedures, due to the lack of closed-form solutions.[85] These methods leverage the manifold's tangent spaces and exponential maps to approximate the minimizer, converging under conditions like small data variance or bounded curvature.[86] The Fréchet mean is unique when the objective functional is strictly convex, which holds in convex metric spaces or under sufficient geodesic convexity in the data support; otherwise, multiple local minima may exist in non-convex spaces.[87] For instance, on the unit sphere $ S^2 $ with the great-circle distance metric, the Fréchet mean of scattered points represents their "average orientation," computed by minimizing the sum of squared geodesic distances, often yielding a point near the geometric center of the spherical convex hull.[88]

Applications of the Fréchet mean span diverse fields, including shape analysis where it averages landmark configurations on deformation spaces, robotics for pose estimation by aggregating orientations or transformations, and machine learning for initializing clustering centroids in non-Euclidean feature spaces.[89][90][86] In the 2020s, it has gained traction in deep learning for aggregating embeddings on Riemannian manifolds, such as hyperbolic spaces for hierarchical data or symmetric positive definite matrices for covariance representations, enabling end-to-end differentiable averaging in neural architectures.[91] The circular mean emerges as a special case when applied to the circle manifold;[92] specifically, the mean direction minimizes the sum of squared chordal (straight-line) distances between points on the unit circle.

Means in Geometric Contexts

In geometry, the centroid of a triangle represents the arithmetic mean of its vertices' positions. For a triangle with vertices at position vectors $ \mathbf{A} $, $ \mathbf{B} $, and $ \mathbf{C} $, the centroid $ \mathbf{G} $ is given by $ \mathbf{G} = \frac{\mathbf{A} + \mathbf{B} + \mathbf{C}}{3} $, serving as the mean position that balances the triangle's mass if uniformly distributed. This concept extends to the mean position in triangular configurations, where the centroid minimizes the sum of squared distances to the vertices, analogous to the arithmetic mean in one dimension.

In triangular sets, means can be computed using barycentric coordinates, which express points as weighted averages of the vertices with weights summing to unity. The barycentric mean of points within a triangle weights contributions by their areal coordinates, providing a geometrically intuitive average that preserves affine properties. For instance, the area-weighted mean position in a subdivided triangular mesh averages vertex positions proportional to the areas of adjacent triangles, ensuring balanced representation in irregular triangulations. This approach is foundational in computational geometry for tasks like mesh smoothing.

The geometric mean finds application in triangles through the lengths of sides or altitudes. For altitudes, the geometric mean provides a measure of the triangle's "average height," useful in optimization problems where scaling factors are involved, such as in similar triangles. A key example is the geometric mean theorem in right triangles: the altitude to the hypotenuse is the geometric mean of the two segments into which it divides the hypotenuse, and each leg is the geometric mean of the hypotenuse and the projection of that leg on the hypotenuse.

Applications of these geometric means abound in computer graphics and surveying. In graphics, the centroid and barycentric means enable realistic rendering of triangular meshes by interpolating textures or colors across each triangle, as seen in algorithms for Gouraud shading where vertex attributes are interpolated. Surveying leverages the centroid as a mean coordinate for establishing control points in triangular networks, minimizing errors in geodetic computations. In geographic information systems (GIS), computational geometry employs area-weighted triangular means to aggregate spatial data over polygonal regions, facilitating accurate interpolation in terrain modeling and urban planning. These uses underscore the role of means in preserving geometric integrity across scales.
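The centroid, barycentric weighting, and the geometric mean theorem can be checked numerically. The following is a minimal Python sketch; the function names and the coordinate placement of the right triangle (right angle at the origin, legs along the axes) are assumptions made for illustration.

```python
import math

def centroid(a, b, c):
    """Arithmetic mean of the three vertex positions."""
    return tuple((p + q + r) / 3 for p, q, r in zip(a, b, c))

def from_barycentric(weights, a, b, c):
    """Point given by barycentric weights (assumed to sum to 1)."""
    wa, wb, wc = weights
    return tuple(wa * p + wb * q + wc * r for p, q, r in zip(a, b, c))

def right_triangle_segments(leg1, leg2):
    """Place the right angle at the origin with legs along the axes and
    return (altitude, segment adjacent to leg1, segment adjacent to leg2),
    obtained by projecting the origin onto the hypotenuse."""
    ax, ay = leg1, 0.0
    bx, by = 0.0, leg2
    # Parameter of the foot of the perpendicular along A -> B.
    t = ((0 - ax) * (bx - ax) + (0 - ay) * (by - ay)) / ((bx - ax) ** 2 + (by - ay) ** 2)
    fx, fy = ax + t * (bx - ax), ay + t * (by - ay)
    altitude = math.hypot(fx, fy)
    seg1 = math.hypot(fx - ax, fy - ay)
    seg2 = math.hypot(fx - bx, fy - by)
    return altitude, seg1, seg2

if __name__ == "__main__":
    a, b, c = (0.0, 0.0), (6.0, 0.0), (0.0, 3.0)
    print(centroid(a, b, c))
    print(from_barycentric((0.5, 0.25, 0.25), a, b, c))
    h, p, q = right_triangle_segments(3.0, 4.0)
    # Altitude compared with the geometric mean of the two segments.
    print(h, math.sqrt(p * q))
```

With equal weights of one third each, `from_barycentric` reproduces the centroid, which is the sense in which the centroid is a barycentric mean.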

Other Specialized Means

Integral Mean of a Function

The integral mean of a function, also known as the average value of the function over an interval, provides a continuous analog to the discrete arithmetic mean by integrating the function over its domain. For a continuous function $ f(x) $ defined on the closed interval $ [a, b] $, the integral mean is given by
$$ f_{\text{avg}} = \frac{1}{b - a} \int_a^b f(x) \, dx. $$
This formula arises from the concept of averaging the function's values uniformly across the interval, where the integral represents the net area under the curve and division by the interval length normalizes it.[20] The Mean Value Theorem for Integrals guarantees that if $ f $ is continuous on $ [a, b] $, there exists some $ c \in [a, b] $ such that $ f(c) = f_{\text{avg}} $, meaning the function attains its average value at least once in the interval. This theorem links the integral mean directly to the function's behavior and is fundamental in proving properties of definite integrals.[20] Variants of the integral mean incorporate weighting to emphasize certain parts of the domain. For a non-negative weight function $ w(x) $ with $ \int_a^b w(x) \, dx = 1 $ (acting as a density), the weighted integral mean is
$$ f_{\text{avg}, w} = \int_a^b f(x) w(x) \, dx. $$
If $ w(x) $ is not normalized, divide by $ \int_a^b w(x) \, dx $. Such weighted forms are essential when the uniform measure is inappropriate, allowing for customized averaging based on relevance across the interval.[93] The integral mean connects to discrete means through Riemann sums, where partitioning the interval $ [a, b] $ into $ n $ subintervals and summing $ f(x_i^*) \Delta x $ (with $ \Delta x = (b - a)/n $) approximates the integral; as $ n \to \infty $, this sum converges to $ (b - a) f_{\text{avg}} $. Equivalently, the arithmetic mean $ \frac{1}{n} \sum_{i=1}^n f(x_i^*) $ of the sampled values converges to $ f_{\text{avg}} $, bridging continuous and discrete averaging.

In applications, the integral mean quantifies average behaviors in continuous systems. In physics, it computes quantities like average velocity over time, where for position $ s(t) $ with velocity $ v(t) = s'(t) $, the average velocity is $ \frac{1}{T} \int_0^T v(t) \, dt = \frac{s(T) - s(0)}{T} $, which follows from the Fundamental Theorem of Calculus.[94] In signal processing, the time average of a deterministic signal $ s(t) $ over duration $ T $ is $ \frac{1}{T} \int_0^T s(t) \, dt $, used to extract DC components or steady-state values from waveforms, such as in full-wave rectifier circuits where the average output voltage for a sinusoidal input of peak value $ V_p $ is $ \frac{2}{\pi} V_p $.[95] A representative example is the integral mean of $ f(x) = \sin x $ over $ [0, \pi] $:
$$ \sin_{\text{avg}} = \frac{1}{\pi - 0} \int_0^\pi \sin x \, dx = \frac{1}{\pi} [-\cos x]_0^\pi = \frac{2}{\pi} \approx 0.637. $$
This value is positive because $ \sin x \geq 0 $ throughout $ [0, \pi] $, so the net area over the interval is positive.[96] For a random variable $ X $ uniformly distributed on $ [a, b] $, the integral mean of $ f $ coincides with the expected value $ E[f(X)] = \frac{1}{b - a} \int_a^b f(x) \, dx $, highlighting its role as a special case in probability where the density is constant.[46] In modern stochastic processes, the integral mean extends to random functions via stochastic integrals, such as the Itô integral $ \int_0^T f(t, \omega) \, dW_t $ for a Brownian motion $ W_t $, where the mean $ E\left[\int_0^T f(t, \omega) \, dW_t\right] = 0 $ for suitably integrable adapted integrands due to the martingale property, enabling analysis of random fluctuations in finance and physics.[97]
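The link between Riemann sums and the integral mean can be sketched numerically in Python. The functions below are illustrative only: `integral_mean` and `weighted_integral_mean` are hypothetical names, and the midpoint rule with a fixed number of subintervals is one convenient choice among many quadrature schemes.

```python
import math

def integral_mean(f, a, b, n=10_000):
    """Approximate (1 / (b - a)) * integral of f over [a, b]
    with a midpoint Riemann sum on n equal subintervals."""
    dx = (b - a) / n
    total = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
    return total / (b - a)

def weighted_integral_mean(f, w, a, b, n=10_000):
    """Approximate integral(f * w) / integral(w) over [a, b];
    the division normalizes a weight that does not integrate to 1."""
    dx = (b - a) / n
    xs = [a + (i + 0.5) * dx for i in range(n)]
    numerator = sum(f(x) * w(x) for x in xs) * dx
    denominator = sum(w(x) for x in xs) * dx
    return numerator / denominator

if __name__ == "__main__":
    # Mean of sin on [0, pi], to compare with 2 / pi.
    print(integral_mean(math.sin, 0.0, math.pi), 2 / math.pi)
    # Mean of f(x) = x on [0, 1] weighted by w(x) = x, to compare with 2 / 3.
    print(weighted_integral_mean(lambda x: x, lambda x: x, 0.0, 1.0))
```

Because the subintervals have equal width, `integral_mean` reduces to the plain arithmetic mean of the sampled values, which makes the discrete-to-continuous correspondence explicit.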

Swanson's Rule

Swanson's rule provides an approximation for the mean of a positively skewed distribution using a weighted combination of its lower, median, and upper fractiles. It is defined by the formula
$$ \mu \approx 0.3 P_{10} + 0.4 P_{50} + 0.3 P_{90}, $$
where $ P_{10} $ is the 10th percentile (conservative estimate), $ P_{50} $ is the median, and $ P_{90} $ is the 90th percentile (optimistic estimate). This weighting scheme, often called the 30-40-30 rule, emphasizes the median while balancing the tails to better capture the expected value in distributions where the arithmetic mean of all data points may be misleading due to skewness.

The rule originated in 1972 from an internal memorandum by Roy Swanson, a geologist at Exxon, who developed it to estimate the mean size of oil fields from probabilistic assessments of reserves. Swanson aimed to create a practical heuristic for resource evaluation when full distributional data were unavailable, drawing on empirical observations of field size distributions that often follow log-normal patterns. It gained wider adoption in the geosciences community through subsequent publications and has since become a standard tool in risk analysis for uncertain quantities.[98][99]

In applications, Swanson's rule is particularly valuable in petroleum engineering and geostatistics for aggregating probabilistic forecasts, such as expected recoverable oil volumes or basin-wide resource potentials. For instance, in assessing an oil field, if the low-case estimate ($ P_{10} $) is 50 million barrels, the median ($ P_{50} $) is 100 million barrels, and the high-case ($ P_{90} $) is 300 million barrels, the approximated mean is $ 0.3 \times 50 + 0.4 \times 100 + 0.3 \times 300 = 145 $ million barrels, providing a balanced expectation for economic planning. This approach is favored in scenarios involving log-normal or modestly skewed data, where it outperforms the simple arithmetic average by reducing bias from extreme values.[99]

As a simple, ad-hoc method, Swanson's rule offers robustness without requiring computational simulations like Monte Carlo methods, making it accessible for quick assessments in field evaluations or regulatory reporting. Studies have validated its accuracy for log-normal distributions, where it closely reproduces the true mean, though it may underperform for highly skewed or multi-modal cases, prompting alternatives like full probabilistic modeling in modern contexts.[100]
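The 30-40-30 weighting is simple enough to express directly in Python. The sketch below applies it to the oil-field percentiles from the example above and, as an illustrative check, compares it with the exact mean of a log-normal distribution. The function names and the parameter choice `sigma=0.5` are assumptions made for illustration, not values taken from the cited studies.

```python
import math
from statistics import NormalDist

def swanson_mean(p10, p50, p90):
    """Swanson's 30-40-30 approximation to the mean."""
    return 0.3 * p10 + 0.4 * p50 + 0.3 * p90

def lognormal_comparison(mu=0.0, sigma=0.5):
    """Return (Swanson approximation, exact mean) for a log-normal
    distribution whose logarithm is Normal(mu, sigma)."""
    z90 = NormalDist().inv_cdf(0.9)  # standard normal 90th percentile
    p10 = math.exp(mu - sigma * z90)
    p50 = math.exp(mu)
    p90 = math.exp(mu + sigma * z90)
    exact = math.exp(mu + sigma ** 2 / 2)
    return swanson_mean(p10, p50, p90), exact

if __name__ == "__main__":
    # Oil-field example: 50, 100, and 300 million barrels.
    print(swanson_mean(50, 100, 300))
    approx, exact = lognormal_comparison()
    print(approx, exact)
```

For this moderately skewed case the approximation and the exact mean differ by well under one percent; increasing `sigma` in the sketch shows the gap widening as skewness grows.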

References
