Why do grep and Notepad++ produce different results?

Question

I have a data file that contains 6500 rows, and 2 columns:

1ES9 0.927536231884058 
1ET1 1.0 
1EU1 0.8915343915343915
... ... ...

I want to count the occurrences of 1.0 in the file.

I used the following grep command, and the output was 1001:

grep -o '1.0' data_file.txt | wc -l

Then I ran Notepad++'s Find->Count tool under Windows 10, and it gave 144.

Why does this differ from grep's result?

Set the "Search Mode" in Notepad++ to "Regular expression" and see what count you get.
– phuclv, commented Mar 4, 2022 at 14:58

Stephen Kitt · Accepted Answer · 2022-03-04 12:41:43Z

grep uses regular expressions by default, and “1.0” is a regular expression matching “1” followed by any character followed by “0”. In your example, the line

1EU1 0.8915343915343915

would produce a match for “1 0”.

To accurately count “1.0” occurrences, you should ask grep to search for fixed strings:

grep -Fo 1.0 data_file.txt | wc -l

or “escape” the period so it matches a period:

grep -o '1\.0' data_file.txt | wc -l
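As a quick check of the difference, here is a minimal sketch that runs the three variants against a small sample. The temporary file and its three rows are only an illustration modelled on the rows quoted in the question, not the real data file.

```shell
# Hypothetical three-line sample modelled on the question's data.
sample=$(mktemp)
printf '%s\n' \
  '1ES9 0.927536231884058' \
  '1ET1 1.0' \
  '1EU1 0.8915343915343915' > "$sample"

# Unescaped pattern: "." matches any single character, so "1 0" also counts.
regex_matches=$(grep -o '1.0' "$sample")

# Fixed-string search: "." only matches a literal period.
fixed_matches=$(grep -Fo '1.0' "$sample")

# Escaped period: same effect as -F for this pattern.
escaped_matches=$(grep -o '1\.0' "$sample")

printf 'unescaped regex:\n%s\n\n' "$regex_matches"
printf 'fixed string:\n%s\n\n' "$fixed_matches"
printf 'escaped regex:\n%s\n' "$escaped_matches"

rm -f "$sample"
```

On this sample the unescaped pattern reports two matches ("1.0" and "1 0"), while the other two report only "1.0".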

If you only want to count 1.0 when it is a complete value, and not when it is a substring of a longer number (e.g. in “11.002”), you should ask grep to only match words:

grep -wo '1\.0' data_file.txt | wc -l

You don’t need to involve wc either, since you’re only interested in one match per line, and grep can count lines:

grep -cw '1\.0' data_file.txt
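The effect of -w and -c can be seen on a small sample. This is only a sketch: the four rows below are made up for illustration (the "11.002" value comes from the example above, the identifiers are hypothetical).

```shell
# Hypothetical sample: two exact 1.0 values, one value that merely contains "1.0".
sample=$(mktemp)
printf '%s\n' \
  '1ET1 1.0' \
  '1EV2 11.002' \
  '1EW3 0.5' \
  '1EX4 1.0' > "$sample"

# Escaped period, no word matching: the "1.0" inside "11.002" is counted too.
substring_hits=$(grep -o '1\.0' "$sample" | wc -l)

# Word matching: a match must not be adjacent to letters, digits or underscores.
word_hits=$(grep -wo '1\.0' "$sample" | wc -l)

# -c counts matching lines directly, so wc is not needed.
line_count=$(grep -cw '1\.0' "$sample")

echo "substring matches: $substring_hits"   # 3
echo "word matches:      $word_hits"        # 2
echo "matching lines:    $line_count"       # 2

rm -f "$sample"
```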

This will still match “-1.0”, since “-” is a non-word character; if that’s a problem, you can extend the pattern and stop looking for words:

grep -c ' 1\.0$' data_file.txt

or use a tool such as AWK to match the numerical value:

awk '$2+0 == 1 { c++ } END { print c }' data_file.txt

(adding 0 forces $2 to be interpreted as a number).
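To compare the word-based, anchored and numeric approaches side by side, here is a small sketch on made-up rows (the identifiers and values are hypothetical). It assumes the value is the last field on the line with no trailing whitespace. The AWK variant prints c+0 rather than c, so that it prints 0 instead of an empty line when nothing matches.

```shell
# Hypothetical sample: 1.0, a negative value, a differently formatted 1, and a non-match.
sample=$(mktemp)
printf '%s\n' \
  '1ET1 1.0' \
  '1EY5 -1.0' \
  '1EZ6 1.00' \
  '1F07 0.5' > "$sample"

# Word matching: "-" is a non-word character, so "-1.0" still matches.
word_count=$(grep -cw '1\.0' "$sample")

# Anchored pattern: requires a space before and end of line after "1.0".
anchored_count=$(grep -c ' 1\.0$' "$sample")

# Numeric comparison: matches any spelling of the number 1 (1.0, 1.00, ...).
numeric_count=$(awk '$2+0 == 1 { c++ } END { print c+0 }' "$sample")

echo "grep -cw:       $word_count"       # 2 (1.0 and -1.0)
echo "grep anchored:  $anchored_count"   # 1 (1.0 only)
echo "awk numeric:    $numeric_count"    # 2 (1.0 and 1.00)

rm -f "$sample"
```

Which count is the right one depends on whether differently formatted values such as 1.00 should be treated as equal to 1.0.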
