I'm not sure how this affects performance (with very large files I would expect this command to be slow):
grep -Fh '*' * | tr -s ' ' | sort | uniq -c
To hide error messages (for example when * expands to a directory):
grep -Fh '*' * 2>/dev/null | tr -s ' ' | sort | uniq -c
And if you have sub-directories with more files you want to search (in bash, ** only recurses after shopt -s globstar):
grep -Fh '*' **/* 2>/dev/null | tr -s ' ' | sort | uniq -c | sed 's/.$//'
Or to avoid using 2>/dev/null:
find . -type f -exec grep -Fh '*' {} + | tr -s ' ' | sort | uniq -c | sed 's/.$//'
The grep -Fh '*' part matches any line that contains a literal *, wherever it appears on the line. -h suppresses the file name that grep otherwise prints in front of each match when it searches several files, and -F treats the pattern as a literal string (the '*' is a plain character, not a pattern).
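As a quick check of what grep -F '*' selects, here is a minimal sketch. The sample lines and the function name match_star are made up for illustration:

```shell
# Hypothetical sample input: one line ending in *, one with * in the
# middle, and one without any *.
sample='Need *
a*b
plain'

# -F makes grep treat '*' as a literal character, so every line that
# contains it is printed, regardless of where the * sits.
match_star() {
  printf '%s\n' "$sample" | grep -F '*'
}

match_star
```

Only the first two lines are printed; the line without a * is dropped.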
tr -s ' ' squeezes every run of repeated spaces into a single space, so lines that differ only in spacing are counted as the same line. For example, given this:
Need *
Word buzz *
Need *
More *
More *
Word *
More *
More *
Word *
Word *
Need *
More *
the tr command will turn it into:
Need *
Word buzz *
Need *
More *
More *
Word *
More *
More *
Word *
Word *
Need *
More *
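Repeated spaces are hard to see in a listing, so here is a minimal sketch of the squeezing step on its own. The input lines are made up and the variable name squeezed is only for illustration:

```shell
# Hypothetical input with runs of several spaces before the *.
squeezed=$(printf 'Need    *\nWord buzz  *\nNeed *\n' | tr -s ' ')

# Each run of spaces is now a single space, so both "Need" lines are identical.
printf '%s\n' "$squeezed"
```

Without this step, sort and uniq would treat "Need    *" and "Need *" as two different lines.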
The content above is piped to sort to have this output:
More *
More *
More *
More *
More *
Need *
Need *
Need *
Word *
Word *
Word *
Word buzz *
Finally, uniq -c prefixes each line with the number of times it occurs, which is what you want.
The sort command is important: uniq only merges duplicate lines that are adjacent, so without sort the counts will be wrong.
According to the output above, the final output (by using uniq -c) will be:
5 More *
3 Need *
3 Word *
1 Word buzz *
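To see why sort has to come before uniq -c, here is a small sketch with made-up input in which the duplicates are not adjacent. The variable names are only for illustration, and the sed call just strips the padding that uniq -c puts in front of the count:

```shell
# Hypothetical input: "More *" appears twice, but not on adjacent lines.
lines='More *
Need *
More *'

# Without sort, uniq sees three separate runs of one line each.
unsorted_counts=$(printf '%s\n' "$lines" | uniq -c | sed 's/^ *//')

# With sort, the two "More *" lines become adjacent and are counted together.
sorted_counts=$(printf '%s\n' "$lines" | sort | uniq -c | sed 's/^ *//')

printf 'unsorted:\n%s\nsorted:\n%s\n' "$unsorted_counts" "$sorted_counts"
```

The unsorted pipeline reports "More *" twice with a count of 1 each, while the sorted one reports it once with a count of 2.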
If you want to remove the *, you can pipe to sed to remove either the last character or the first *:
grep -Fh '*' * | tr -s ' ' | sort | uniq -c | sed 's/.$//'
#or
grep -Fh '*' * | tr -s ' ' | sort | uniq -c | sed 's/\*//'
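The two sed expressions only give the same result when the * is the last character on the line. A minimal sketch of the difference, using a made-up line where it is not:

```shell
# Hypothetical line where the * is in the middle, not at the end.
line='1 a*b'

# s/.$// removes whatever the final character is (here the "b").
drop_last=$(printf '%s\n' "$line" | sed 's/.$//')

# s/\*// removes the first literal * on the line.
drop_star=$(printf '%s\n' "$line" | sed 's/\*//')

printf '%s\n%s\n' "$drop_last" "$drop_star"
```

For input like the example in this answer, where every line ends in *, either form works.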
I think and hope there are better ways to achieve this, because here I'm using several commands to get the desired output. So, as I said, it may be slow.
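One possible alternative, offered only as a sketch and not measured against the pipeline above, is to let a single awk process do the filtering, squeezing and counting. The function name count_star_lines is hypothetical, and whether this is actually faster depends on the input:

```shell
# count_star_lines is a hypothetical helper name, not part of the commands above.
# awk keeps lines containing a literal *, squeezes runs of spaces into one,
# and counts identical lines; sort only orders the final summary.
count_star_lines() {
  awk 'index($0, "*") { gsub(/ +/, " "); count[$0]++ }
       END { for (line in count) print count[line], line }' "$@" |
    LC_ALL=C sort -k2
}

# Usage: count_star_lines file1 file2 ...  (reads stdin when no files are given)
printf 'Need  *\nMore *\nNeed *\nno star here\n' | count_star_lines
```

Here sort only has to order one line per distinct string instead of every matching line, which is the main design difference from the grep | tr | sort | uniq -c chain.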
Wood * moretext? Should Wood be considered as another occurrence? Or what about Wood buzz *? Should Wood buzz be considered as another occurrence too?
[...] *) then provide sample input/output that includes some lines that do and some that don't have that string/character. We can't test a potential solution using the example you provided where every string from the input appears in the output, there's only 1 occurrence of each string you want counted, and the output counts don't come from the input.