I have a data file that contains 6500 rows and 2 columns:

1ES9 0.927536231884058 
1ET1 1.0 
1EU1 0.8915343915343915
... ... ...

I want to count the occurrences of 1.0 in the file.

I have used the following grep command and the output was 1001:

grep -o '1.0' data_file.txt | wc -l

Then I ran Notepad++'s Find->Count tool on Windows 10. It gave 144.

Why does that differ from the grep result?

  • Choose "Regular expression" as the "Search Mode" in Notepad++ and see. (Commented Mar 4, 2022 at 14:58)

1 Answer

grep uses regular expressions by default, and “1.0” is a regular expression matching “1” followed by any character followed by “0”. In your example, the line

1EU1 0.8915343915343915

would produce a match for “1 0”: the final “1” of “1EU1”, the space that follows it, and the “0” that starts the second column.

To accurately count “1.0” occurrences, you should ask grep to search for fixed strings:

grep -Fo 1.0 data_file.txt | wc -l

or “escape” the period so it matches a period:

grep -o '1\.0' data_file.txt | wc -l
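
To see the difference on a small scale, here is a minimal sketch that runs the unescaped, fixed-string, and escaped forms against the three sample rows from the question. It assumes a grep that supports -o (GNU and BSD grep both do); the temporary file and the variable names are made up for illustration.

```shell
# Recreate the three sample rows from the question in a throwaway file.
sample=$(mktemp)
printf '%s\n' '1ES9 0.927536231884058' '1ET1 1.0' '1EU1 0.8915343915343915' > "$sample"

# Unescaped dot: matches "1.0" on the second row and "1 0" on the third.
regex_count=$(grep -o '1.0' "$sample" | wc -l)

# Fixed-string search: only the literal "1.0" on the second row.
fixed_count=$(grep -Fo '1.0' "$sample" | wc -l)

# Escaped dot: same result as the fixed-string search.
escaped_count=$(grep -o '1\.0' "$sample" | wc -l)

echo "unescaped: $regex_count, fixed: $fixed_count, escaped: $escaped_count"
rm -f "$sample"
```

On these three rows the unescaped pattern reports 2 matches and the other two forms report 1.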

If you want to only count 1.0 as values, and not substrings (e.g. in “11.002”), you should ask grep to only match words:

grep -wo '1\.0' data_file.txt | wc -l
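
As a rough illustration of what -w changes, the sketch below uses three hypothetical rows (not taken from the question's file): one plain 1.0, one value that merely contains “1.0”, and one negative value.

```shell
# Hypothetical rows: a plain 1.0, a value containing "1.0", and a negative value.
sample=$(mktemp)
printf '%s\n' 'AAAA 1.0' 'BBBB 11.002' 'CCCC -1.0' > "$sample"

# Without -w, the "1.0" inside "11.002" is counted too.
substring_count=$(grep -o '1\.0' "$sample" | wc -l)

# With -w, the match must not touch letters, digits or underscores on either
# side, so "11.002" is rejected; "-1.0" still matches because "-" is not a
# word character.
word_count=$(grep -wo '1\.0' "$sample" | wc -l)

echo "substring: $substring_count, word: $word_count"
rm -f "$sample"
```

Here the plain search reports 3 matches and the -w search reports 2.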

You don’t need to involve wc either, since you’re only interested in one match per line, and grep can count lines:

grep -cw '1\.0' data_file.txt
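
Note that -c counts matching lines, whereas -o piped to wc -l counts individual matches. For a two-column file like the one in the question the two agree, but they diverge as soon as a line contains the value twice. A small sketch with a hypothetical three-column row shows the difference:

```shell
# Hypothetical rows; the second one contains the value twice.
sample=$(mktemp)
printf '%s\n' 'AAAA 1.0' 'BBBB 1.0 1.0' 'CCCC 0.5' > "$sample"

# One output line per match, so the second row contributes two.
match_count=$(grep -wo '1\.0' "$sample" | wc -l)

# -c counts each matching line once, however many matches it contains.
line_count=$(grep -cw '1\.0' "$sample")

echo "matches: $match_count, lines: $line_count"
rm -f "$sample"
```

This prints 3 matches but 2 lines.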

This will still match “-1.0”, since “-” is a non-word character; if that’s a problem, you can extend the pattern and stop looking for words:

grep -c ' 1\.0$' data_file.txt

or use a tool such as AWK to match the numerical value:

awk '$2+0 == 1 { c++ } END { print c+0 }' data_file.txt

(adding 0 forces $2 to be interpreted as a number, and printing c+0 gives 0 rather than an empty line when nothing matches).
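
One thing to keep in mind is that the two approaches answer slightly different questions: the anchored grep pattern counts the text “1.0” exactly, while the AWK comparison counts any spelling of the number one. The sketch below uses hypothetical rows to show this; it prints c+0 so that a file with no matches yields 0 instead of an empty line.

```shell
# Hypothetical rows covering several spellings of one, plus two near misses.
sample=$(mktemp)
printf '%s\n' 'AAAA 1.0' 'BBBB 1' 'CCCC 1.00' 'DDDD -1.0' 'EEEE 0.999' > "$sample"

# Textual match: only a second column that reads exactly "1.0".
text_count=$(grep -c ' 1\.0$' "$sample")

# Numeric match: "1.0", "1" and "1.00" all compare equal to 1.
numeric_count=$(awk '$2+0 == 1 { c++ } END { print c+0 }' "$sample")

echo "text: $text_count, numeric: $numeric_count"
rm -f "$sample"
```

On this data grep reports 1 and AWK reports 3, so which one is right depends on whether other spellings of the value can occur in the file.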
