(If the score of this question is 72, please don't upvote!)
I ran this:
cat /usr/bin/* |
perl -ne 'map {$a{$_}++} split//; END{print map { "$a{$_}\t$_\n" } keys %a}' |
grep --text . | sort -n | plotpipe --log y {1}
and got this:
(Even with a log y-axis it still looks exponential! There is more than 100x between the top and the bottom)
Looking at the numbers:
:
31919597 ^H
32983719 ^B
33943030 ^O
39130281 \213
39893389 $
52237360 \211
53229196 ^A
76884442 \377
100776756 H
746405320 ^@
It is hardly surprising that ^@ (NUL) is the most common byte in executables. \377 (255) and ^A (1) also make intuitive sense to me.
But what causes 'H' (72) to be the second most common byte in executables - far more common than 255 and 1?
Background
For a Perl script, I needed to find the least common byte in Perl scripts. By accident, I did not filter for Perl scripts only but ran the command on all binaries. I expected a few bytes to stand out, such as NUL, 1, and 255, but never 'H'.
The input for the graph is the count of each byte, sorted. The y-axis represents the count, and the x-axis represents the line number in the sorted output (1-256, as a byte can only take on 256 different values). The y-axis is log scale, so a curve that still looks exponential there means the counts grow faster than exponentially.
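For anyone who wants to reproduce the counts without Perl, here is a minimal Python sketch of the same idea: count every byte across a set of files, then sort by count. It is not the command I actually ran. The function names `byte_histogram` and `sorted_counts` are made up for illustration, and it reads each file fully into memory, which I am assuming is acceptable for files the size of those in /usr/bin.

```python
from collections import Counter
from pathlib import Path


def byte_histogram(paths):
    """Count how often each byte value occurs across the given files."""
    counts = Counter()
    for path in paths:
        try:
            # Iterating over a bytes object yields ints in the range 0-255.
            counts.update(Path(path).read_bytes())
        except OSError:
            # Skip anything unreadable: directories, broken symlinks,
            # files without read permission.
            continue
    return counts


def sorted_counts(counts):
    """Return (count, byte) pairs in ascending order of count, like `sort -n`."""
    return sorted((n, b) for b, n in counts.items())


if __name__ == "__main__":
    hist = byte_histogram(Path("/usr/bin").iterdir())
    # Print the ten most common bytes, least common of them first.
    for n, b in sorted_counts(hist)[-10:]:
        print(n, repr(bytes([b])))
```

Like the Perl one-liner, this only lists byte values that actually occur, so the output can have fewer than 256 lines.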
