online-stats
Scope
Naive implementation of a few online statistical algorithms.
The idea behind this implementation is to be used as a tool for stateful streaming computations.
Algos covered so far:
- 4 statistical moments and the derived features:
- average
- variance and standard deviation
- skewness
- kurtosis
- covariance
- exponentially weighted moving averages and variance
Algos to be researched:
- exponentially weighted moving skewness
- exponentially weighted moving kurtosis
Using a more formal and mature library like Apache Commons Math is probably a better idea for production applications, but this is also tested against it.
Description
The main concepts introduced in this library are the Stats, EWeightedStats (exponentially
weighted stats), VectorStats and Covariance. Each of them can be composed using either the
append or the |+| functions.
For example, if we have a sequence of numbers, we can compute the statistics like this:
val xs1 = Seq(1.0, 3.0)
val stats1: Stats = xs1.foldLeft(Stats.Nil)((s, x) => s |+| x)
val xs2 = Seq(5.0, 7.0)
val stats2: Stats = xs2.foldLeft(Stats.Nil)((s, x) => s |+| x)
val totalStats = stats1 |+| stats2
val newStats = totalStats |+| 4.0The Stats type with the |+| operation also form a monoid, since |+| has an identity
(unit) element, Stats.Nil, and it is associative.
Also the |+| operation is also commutative, which makes appealing for distributed computing
as well.
Same goes for VectorStats and Covariance.
EWeightedStats is an exception for now, as two EWeightedStats instances can not be composed.
However, the |+| works between an EWeightedStats instance and a double.
Complexity
| Feature | Space Complexity (O) | Time Complexity (O) |
|---|---|---|
| Count, Sum, Min, Max | O(1) (1 * MU) | O(1) |
| Average | O(1) (2 * MU) | O(1) |
| Variance, Standard deviation | O(1) (3 * MU) | O(1) |
| Skewness | O(1) (4 * MU) | O(1) |
| Kurtosis | O(1) (5 * MU) | O(1) |
| Exponentially weighted average | O(1) (2 * MU) | O(1) |
| Exponentially weighted variance | O(1) (2 * MU) | O(1) |
MU: Memory Unit, e.g. Int: 4 bytes, Double 8: bytes
Demos and Examples
The streaming-anomalies-demos project was created to explore and demonstrate some basic use cases for the online-stats library.
References
- "Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments" by Philippe Pebay
- "The Exponentially Weighted Moving Variance" by J. F. Macgregor and T. J. Harris
- "Incremental calculation of weighted mean and variance" by Tony Finch, February 2009
- Bessel Correction
- Skewness
- Kurtosis
- Pearson's Correlation Coefficient

Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.
