Stéphane Chazelas
file=myfile
for class in alnum alpha blank cntrl digit graph lower print punct space upper xdigit
do
  printf '%7s: %d\n' "$class" "$(tr -Cd "[:${class}:]" < "$file" | wc -m)"
done
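
As a rough illustration of what the loop computes, here is a minimal sketch that wraps the same `tr | wc` pipeline in a helper and runs it on a small sample. The function name `count_class`, the temporary file, and the choice to force the C locale are assumptions made for the example, not part of the answer above.

```shell
# Assumption: force the C locale so the counts are reproducible.
LC_ALL=C; export LC_ALL

# count_class is a hypothetical helper name.
# Usage: count_class CLASS FILE
count_class() {
  # Delete every character that is NOT in the class, then count what is left.
  printf '%d\n' "$(tr -Cd "[:$1:]" < "$2" | wc -m)"
}

sample=$(mktemp)                        # throwaway sample file
printf 'Hello, World 42!\n' > "$sample"

count_class digit "$sample"   # prints 2  (4 and 2)
count_class alpha "$sample"   # prints 10 (Hello, World)
count_class punct "$sample"   # prints 2  (comma and exclamation mark)
count_class space "$sample"   # prints 3  (two spaces and the newline)

rm -f "$sample"
```

Passing the result through `printf '%d'` also strips the leading blanks that some `wc` implementations emit.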

ascii and word are not standard character classes; they are specific to bash. word is alnum plus underscore, and ascii covers characters 0 to 127, so you can do:

printf '%7s: %d\n' word "$(tr -Cd "_[:alnum:]" < "$file" | wc -m)"
printf '%7s: %d\n' ascii "$(LC_ALL=C tr -cd '\0-\177' < "$file" | wc -c)"

(Note that the GNU implementation of tr, as of coreutils-8.22, does not support multi-byte characters.)
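
To make the word and ascii counts concrete, the sketch below runs the two pipelines on a sample that contains one non-ASCII byte. The sample contents, the variable names, and the exported C locale are assumptions chosen so that the result is byte-based and predictable.

```shell
# Assumption: C locale, so every byte is one character and bytes above 127
# belong to no class.
LC_ALL=C; export LC_ALL

sample=$(mktemp)
# Contents: a, underscore, b, one non-ASCII byte (octal 351), newline.
printf 'a_b\351\n' > "$sample"

# word: alnum plus underscore.
word_count=$(tr -Cd '_[:alnum:]' < "$sample" | wc -m)
# ascii: bytes 0 to 127.
ascii_count=$(tr -cd '\0-\177' < "$sample" | wc -c)

printf '%7s: %d\n' word "$word_count"    # counts a, _ and b
printf '%7s: %d\n' ascii "$ascii_count"  # counts a, _, b and the newline

rm -f "$sample"
```

The newline is an ASCII character but not a word character, which is why the two counts differ by more than the single non-ASCII byte would suggest.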

On systems using GNU libc at least, you can also list the character classes defined in your locale:

$ locale ctype-class-names
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
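
The semicolon-separated list is awkward to read or loop over, so here is a small sketch that prints one class name per line. The helper name `split_classes` is hypothetical, and the `locale` keyword is only expected to work with GNU libc; whether a given `tr` accepts every name it reports (for example combining) is not something this sketch assumes.

```shell
# split_classes is a hypothetical helper: it turns a semicolon-separated
# list read from standard input into one name per line.
split_classes() {
  tr ';' '\n'
}

# On GNU libc systems this prints the classes of the current locale;
# elsewhere the locale keyword may be unknown and nothing is printed.
locale ctype-class-names 2>/dev/null | split_classes
```

The output can then be fed to a `while read -r class` loop or compared against the fixed list of POSIX class names.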
