replaced http://unix.stackexchange.com/ with https://unix.stackexchange.com/
Source Link

Why does grep output lines that seemingly don't match the expression?

As mentioned in my comment, this behaviour may be caused by a bug.

I am aware that different locales affect character order. I thought the -o output below confirmed that this was not the problem here, but I was wrong: adding LC_ALL=C gives the expected output.

I had this question after I saw that locales affected the output.

[aa@bb grep-test]$ cat input.txt
aa bb
CC cc
dd ee

[aa@bb grep-test]$ LC_ALL=C grep -o [A-Z] input.txt
C
C
[aa@bb grep-test]$ grep -o [A-Z] input.txt
C
C
[aa@bb grep-test]$ LC_ALL=C grep [A-Z] input.txt
CC cc
[aa@bb grep-test]$ grep [A-Z] input.txt
aa bb
CC cc
dd ee
[aa@bb grep-test]$





[aa@bb tmp]$ cat test
aa bb
CC cc
dd ee

[aa@bb tmp]$ grep [A-Z] test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep -o [A-Z] test
C
C
[aa@bb tmp]$ grep -E [A-Z] test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep -n [A-Z] test
1:aa bb
2:CC cc
3:dd ee
[aa@bb tmp]$ echo [A-Z]
[A-Z]
[aa@bb tmp]$ grep -V
GNU grep 2.6.3
...
[aa@bb tmp]$ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
...
[aa@bb grep-test]$ command -v grep
/bin/grep
[aa@bb grep-test]$ rpm -q -f $(command -v grep)
grep-2.6.3-6.el6.x86_64
[aa@bb grep-test]$ echo grep [A-Z] input.txt | xxd
0000000: 6772 6570 205b 412d 5a5d 2069 6e70 7574  grep [A-Z] input
0000010: 2e74 7874 0a                             .txt.    
[aa@bb grep-test]$ cmd='grep [A-Z] input.txt'; echo $cmd | xxd; eval $cmd
0000000: 6772 6570 205b 412d 5a5d 2069 6e70 7574  grep [A-Z] input
0000010: 2e74 7874 0a                             .txt.
aa bb
CC cc
dd ee
[aa@bb grep-test]$ xxd input.txt
0000000: 6161 2062 620a 4343 2063 630a 6464 2065  aa bb.CC cc.dd e
0000010: 650a 0a                                  e..
[aa@bb grep-test]$
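
The locale dependence seen in the transcript above can be reproduced in a self-contained way. The following is only a minimal sketch, assuming a POSIX shell with grep and mktemp available; the temporary file and the variable names c_range and class_match are hypothetical and not part of the original session. It quotes the pattern so the shell cannot expand it as a glob, pins the C locale for the range expression, and also tries the POSIX class [[:upper:]], which does not depend on collation order.

```shell
# Recreate the sample input from the transcript in a temporary file (hypothetical path).
tmp=$(mktemp)
printf 'aa bb\nCC cc\ndd ee\n\n' > "$tmp"

# In the C locale, [A-Z] is a plain byte range, so only the line
# containing uppercase letters is selected.
c_range=$(LC_ALL=C grep '[A-Z]' "$tmp")

# [[:upper:]] names the uppercase class directly instead of relying on
# the ordering of characters between A and Z in the current locale.
class_match=$(grep '[[:upper:]]' "$tmp")

printf '%s\n' "$c_range" "$class_match"
rm -f "$tmp"
```

In locales other than C, a range such as [A-Z] may follow the locale's collation order and can then include lowercase letters, which would be consistent with the unexpected matches shown above.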

Mod Moved Comments To Chat
added 196 characters in body
Source Link
brendan
  • 195
  • 9

Why does grep output lines that seemingly don't match the expression? - Locale issue

I am aware that different locales affect character order. I thought the -o output below confirmed that this was not the problem here, but I was wrong: adding LC_ALL=C gives the expected output.

I had this question after I saw locales were the problem.

[aa@bb grep-test]$ cat input.txt
aa bb
CC cc
dd ee

[aa@bb grep-test]$ LC_ALL=C grep -o [A-Z] input.txt
C
C
[aa@bb grep-test]$ grep -o [A-Z] input.txt
C
C
[aa@bb grep-test]$ LC_ALL=C grep [A-Z] input.txt
CC cc
[aa@bb grep-test]$ grep [A-Z] input.txt
aa bb
CC cc
dd ee
[aa@bb grep-test]$





[aa@bb tmp]$ cat test
aa bb
CC cc
dd ee

[aa@bb tmp]$ grep [A-Z] test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep -o [A-Z] test
C
C
[aa@bb tmp]$ grep -E [A-Z] test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep -n [A-Z] test
1:aa bb
2:CC cc
3:dd ee
[aa@bb tmp]$ echo [A-Z]
[A-Z]
[aa@bb tmp]$ grep -V
GNU grep 2.6.3
...
[aa@bb tmp]$ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
...
[aa@bb grep-test]$ command -v grep
/bin/grep
[aa@bb grep-test]$ rpm -q -f $(command -v grep)
grep-2.6.3-6.el6.x86_64
[aa@bb grep-test]$ echo grep [A-Z] input.txt | xxd
0000000: 6772 6570 205b 412d 5a5d 2069 6e70 7574  grep [A-Z] input
0000010: 2e74 7874 0a                             .txt.    
[aa@bb grep-test]$ cmd='grep [A-Z] input.txt'; echo $cmd | xxd; eval $cmd
0000000: 6772 6570 205b 412d 5a5d 2069 6e70 7574  grep [A-Z] input
0000010: 2e74 7874 0a                             .txt.
aa bb
CC cc
dd ee
[aa@bb grep-test]$ xxd input.txt
0000000: 6161 2062 620a 4343 2063 630a 6464 2065  aa bb.CC cc.dd e
0000010: 650a 0a                                  e..
[aa@bb grep-test]$

Why does grep output lines that seemingly don't match the expression?

As mentioned in my comment, this behaviour may be caused by a bug.

I am aware that different locales affect character order. I thought the -o output below confirmed that this was not the problem here, but I was wrong: adding LC_ALL=C gives the expected output.

I had this question after I saw locales affected the output.

[aa@bb grep-test]$ cat input.txt
aa bb
CC cc
dd ee

[aa@bb grep-test]$ LC_ALL=C grep -o [A-Z] input.txt
C
C
[aa@bb grep-test]$ grep -o [A-Z] input.txt
C
C
[aa@bb grep-test]$ LC_ALL=C grep [A-Z] input.txt
CC cc
[aa@bb grep-test]$ grep [A-Z] input.txt
aa bb
CC cc
dd ee
[aa@bb grep-test]$





[aa@bb tmp]$ cat test
aa bb
CC cc
dd ee

[aa@bb tmp]$ grep [A-Z] test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep -o [A-Z] test
C
C
[aa@bb tmp]$ grep -E [A-Z] test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep -n [A-Z] test
1:aa bb
2:CC cc
3:dd ee
[aa@bb tmp]$ echo [A-Z]
[A-Z]
[aa@bb tmp]$ grep -V
GNU grep 2.6.3
...
[aa@bb tmp]$ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
...
[aa@bb grep-test]$ command -v grep
/bin/grep
[aa@bb grep-test]$ rpm -q -f $(command -v grep)
grep-2.6.3-6.el6.x86_64
[aa@bb grep-test]$ echo grep [A-Z] input.txt | xxd
0000000: 6772 6570 205b 412d 5a5d 2069 6e70 7574  grep [A-Z] input
0000010: 2e74 7874 0a                             .txt.    
[aa@bb grep-test]$ cmd='grep [A-Z] input.txt'; echo $cmd | xxd; eval $cmd
0000000: 6772 6570 205b 412d 5a5d 2069 6e70 7574  grep [A-Z] input
0000010: 2e74 7874 0a                             .txt.
aa bb
CC cc
dd ee
[aa@bb grep-test]$ xxd input.txt
0000000: 6161 2062 620a 4343 2063 630a 6464 2065  aa bb.CC cc.dd e
0000010: 650a 0a                                  e..
[aa@bb grep-test]$
deleted 58 characters in body
Source Link
brendan
  • 195
  • 9

Original question: Why does grep output lines that seemingly don't match the expression?

Why is the output of grep -o the same with or without LC_ALL=C? There is a difference for grep with no flags, as expected, but there is no difference for grep -o. Does grep -o always use LC_ALL=C, or something else?

Why does grep output lines that seemingly don't match the expression? - Locale issue

I had this question after I saw locales were the problem.

added 111 characters in body
Source Link
brendan
  • 195
  • 9
added 266 characters in body
Source Link
brendan
  • 195
  • 9
added 266 characters in body
Source Link
brendan
  • 195
  • 9
added 159 characters in body
Source Link
brendan
  • 195
  • 9
added 168 characters in body
Source Link
brendan
  • 195
  • 9
added 40 characters in body
Source Link
brendan
  • 195
  • 9
add output for more commands
Source Link
brendan
  • 195
  • 9
Source Link
brendan
  • 195
  • 9