
I was quite surprised to learn that there are several different ways to measure skewness: for example, Galton's skewness, Pearson's second skewness coefficient, and so on. At the moment I am interpreting the adjusted Fisher-Pearson coefficient of skewness, and I realized that it is based on the Z-scores of the data points. As mentioned in [1b] here, it is defined as:

$$G_1 = {N \over (N-1)(N-2)} \sum_{n=1}^N\Bigg({x_n-\overline{x}\over s}\Bigg)^3 \tag 1$$

where $G_1$ represents the adjusted Fisher-Pearson coefficient of skewness, $N$ is the number of samples, $\overline{x}$ is the average value of our data set, $x_n$ is the $n$-th sample and $s$ is the sample standard deviation. Keeping in mind that the Z-score of a sample is defined as $z_n=(x_n - \overline{x})/s$, the above expression can be rewritten as:

$$G_1 = {N^2 \over (N-1)(N-2)} \cdot {1 \over N}\sum_{n=1}^Nz_n^3 \tag 2$$
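As a quick sanity check, the two forms (1) and (2) can be computed directly. A minimal Python sketch using only the standard library; the sample data is made up:

```python
import statistics

x = [1, 2, 3, 4, 10]           # made-up sample, skewed to the right
N = len(x)
xbar = statistics.mean(x)       # sample mean
s = statistics.stdev(x)         # sample standard deviation (ddof = 1)

# Form (1): scaled sum of cubed standardized deviations
G1_form1 = N / ((N - 1) * (N - 2)) * sum(((xi - xbar) / s) ** 3 for xi in x)

# Form (2): bias factor times the mean cubed Z-score
z = [(xi - xbar) / s for xi in x]
G1_form2 = N**2 / ((N - 1) * (N - 2)) * (sum(zi**3 for zi in z) / N)

print(G1_form1, G1_form2)       # the two forms agree
```

Both expressions are just algebraic rearrangements of each other, so they should match to floating-point precision.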

Applying L'Hôpital's rule twice (or simply dividing the numerator and denominator by $N^2$), it can be shown that the bias-adjustment coefficient tends to one as $N$ tends to infinity:

$$ \lim_{N \to \infty}{N^2 \over (N-1)(N-2)} = \lim_{N \to \infty}{N^2 \over N^2-3N+2}=\lim_{N \to \infty}{2N \over 2N-3}=1 \tag 3$$
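The convergence can also be seen numerically; a throwaway Python sketch:

```python
# Bias-adjustment factor N^2 / ((N-1)(N-2)) for increasing sample sizes
def bias_factor(N):
    return N**2 / ((N - 1) * (N - 2))

for N in (5, 10, 100, 1000, 100000):
    print(N, bias_factor(N))
# The factor shrinks toward 1 as N grows
```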

So if $N$ is large enough, we get the approximation:

$$G_1 \approx \overline{z^3} \tag 4$$

The above approximation is not that hard to interpret. Cubing preserves the sign of each Z-score but changes its magnitude: if $|z_n| < 1$ then $|z_n^3| < |z_n|$, and if $|z_n| > 1$ then $|z_n^3| > |z_n|$. So data points within one standard deviation of the mean contribute little to the sum, while points far out in either tail dominate it, with the two tails pulling in opposite directions.

This interpretation got me thinking, why do we need the average value of the cubed Z-scores to measure skewness? Can't we simply use the average value of the Z-score as a measure of skewness? For example, if we define a measure of skewness as:

$$S = {1\over N}\sum_{n=1}^N z_n \tag 5$$

We can still easily interpret this value. It says that on average our data points deviate from the average value by $S$ standard deviations. Since the value is not zero, the data is skewed. My question is, does this statement make sense? If yes, why is it a worse measure of skewness than the adjusted Fisher-Pearson coefficient of skewness?

Comments:

  • The sum (or average) of the z-scores is 0, because $z_n = (x_n-\bar x)/s$ and $\sum_{n=1}^N x_n = \sum_{n=1}^N \bar x = N\bar x$. – Commented Oct 27 at 1:06
  • The so-called Galton skewness seems to have been introduced by Bowley in 1902. The main idea of moment-based skewness was introduced by Thiele, which Pearson shamefully tried to downplay. Much more in Section 7 of journals.sagepub.com/doi/pdf/10.1177/1536867X211063415 -- which in turn knowingly did not cite L-moments. – Commented Oct 27 at 13:53

1 Answer


Making my comment into an answer, using OP's notation, so the question can be resolved:

The sum (and thus also the average) of the z-scores will always be 0. So it cannot be an informative measure of skewness, since it equals 0 regardless of the dataset.

Since $z_n=(x_n-\bar x)/s$, we have that

$$\sum_{n=1}^N z_n = (1/s) \left(\sum_{n=1}^N x_n - \sum_{n=1}^N \bar x\right) = (1/s) (N\bar x - N\bar x) = 0$$
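This identity is easy to confirm numerically, even on a strongly skewed sample. A minimal Python sketch with arbitrary made-up data:

```python
import statistics

x = [1, 1, 2, 3, 50]                 # arbitrary, heavily right-skewed data
xbar = statistics.mean(x)
s = statistics.stdev(x)

z = [(xi - xbar) / s for xi in x]
print(sum(z))    # ~0, up to floating-point rounding
```

No matter how skewed the data, the positive and negative standardized deviations cancel exactly, which is why the cube (an odd power greater than one) is the simplest moment-based way to detect asymmetry.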

Comments:

  • Would you really say "by definition"? Although the proof of the proposition is trivial, that's not the same as calling it a definition. – Commented Oct 31 at 1:24
  • I probably would have omitted that phrase entirely. – Commented Oct 31 at 5:19
