edited Aug 24, 2020 at 11:20

For wc, words are sequences of non-whitespace characters, so they are delimited by either whitespace or non-characters. To count the number of newline characters (-l), words (-w), characters (-m) and bytes (-c), you can do:

find . -type f -exec cat {} + | wc -lwmc

However, note that because cat concatenates the files, it could give incorrect results for the word and character counts if some files don't end in a whitespace character (text files should end in a newline character, which is a whitespace character), as that could end up joining two bytes into one valid character, for instance, or joining two words together.

Example:

$ od -tx1 a
0000000 c3
0000001
$ od -tx1 b
0000000 a9
0000001
$ wc -m a b
0 a
0 b
0 total
$ cat a b | wc -m
1

$ printf foo > a
$ printf bar > b
$ wc -w a b
1 a
1 b
2 total
$ cat a b | wc -w
1
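
One way to sidestep the joining problem for the word count is to run wc on each file separately and add up the results. The following is only a minimal sketch, assuming a POSIX sh, find and awk; the function name count_words_per_file is made up for illustration.

```shell
# Hypothetical helper: sum per-file word counts instead of counting the
# concatenation, so words at file boundaries are never joined.
count_words_per_file() {
  # $1: directory to search (defaults to the current directory)
  find "${1:-.}" -type f -exec sh -c '
    for f do
      wc -w < "$f"    # reading from stdin keeps the file name out of the output
    done' sh {} + |
    awk '{ sum += $1 } END { print sum + 0 }'
}

# Usage: two files that do not end in a whitespace character
dir=$(mktemp -d)
printf foo > "$dir/a"
printf bar > "$dir/b"
count_words_per_file "$dir"    # prints 2, where cat a b | wc -w gives 1
rm -rf "$dir"
```

The same pattern should work for the character count by replacing -w with -m, at the cost of running one wc per file.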

To count the whitespace characters, POSIXly, you could do:

find . -type f -exec cat {} + | tr -cd '[:space:]' | wc -m

(with the same caveat about joining bytes into characters), but note that with GNU tr, that only works for single-byte characters (so not UTF-8 encoded non-ASCII characters for instance).
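
As a quick sanity check of the tr pipeline, here is a small sketch that runs it on a throwaway directory containing only ASCII text, where the GNU tr limitation does not apply. The helper name count_whitespace and the sample file are assumptions made for illustration.

```shell
# Hypothetical helper wrapping the find | tr | wc pipeline.
count_whitespace() {
  # $1: directory to search (defaults to the current directory)
  find "${1:-.}" -type f -exec cat {} + | tr -cd '[:space:]' | wc -m
}

dir=$(mktemp -d)
printf 'a b\tc\n' > "$dir/f"   # one space, one tab, one newline
count_whitespace "$dir"        # counts 3 whitespace characters
rm -rf "$dir"
```

Some wc implementations pad the number with leading blanks, so compare the result numerically rather than as a string.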

On GNU systems, you could resort to GNU grep and use:

grep -rzo '[[:space:]]' . | LC_ALL=C tr -cd '\0' | wc -c

Note, though, that because grep works on NUL-delimited records with -z, it would end up slurping whole text files into memory (as text files typically don't contain NUL bytes).
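
To make the counting step explicit: with -z and -o, GNU grep terminates each match it prints with a NUL byte, so deleting everything except NUL and counting the remaining bytes gives the number of matches, regardless of the file name prefixes that -r adds. A minimal sketch, assuming GNU grep is installed as grep; the helper name count_whitespace_gnu and the sample file are made up for illustration.

```shell
# Hypothetical helper wrapping the GNU grep pipeline.
count_whitespace_gnu() {
  # $1: directory to search (defaults to the current directory)
  # Each match is output followed by a NUL byte; keep only the NULs and
  # count them as bytes.
  grep -rzo '[[:space:]]' "${1:-.}" | LC_ALL=C tr -cd '\0' | wc -c
}

dir=$(mktemp -d)
printf 'a b\tc' > "$dir/f"     # one space and one tab
count_whitespace_gnu "$dir"
rm -rf "$dir"
```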

answered Aug 24, 2020 at 9:48

Stéphane Chazelas
