Is there a way to get grep to display the newlines in a file in a human-readable way, for example, in the way vim displays end-of-line characters as $ with :set list?

I'm trying to describe how dot (.) works in a regular expression. As an initial illustration, I search for a pattern of only dot, e.g., grep --color=auto '.' HBB.fna (image). With the --color option, every character in the file is displayed in the match color in the output. However, I would like to explicitly display the end-of-line characters to show they are not matched. Because I am talking about grep, I don't want to use anything other than it.

Example output is in the attached image. Again, what I'm going for is for end-of-line characters to appear at the end of every line, but not in the match color.

Any help would be appreciated.

enter image description here

  • maybe the example you presented has no EOL characters Commented Jul 22, 2022 at 16:54
  • @jsotola, and how would grep decide how to print the lines then Commented Jul 22, 2022 at 16:57
  • @mark, I don't think you can do that with grep. It implicitly considers only the part of the line before the newline, and doesn't try to match the regex pattern against it. So it never matches, there's no question about it, so no reason to show it. (And, well, grep would print the trailing newline even if it isn't there, e.g. printf 'foo' |grep . prints foo<newline> in all systems I have.) Commented Jul 22, 2022 at 17:01
  • Won't it be simple enough to pipe the output of grep to cat -A? Commented Jul 22, 2022 at 17:04
  • @unxnut And then pipe to sed 's/\$$/\\n/' to differentiate an EOL $ from other instances. Commented Jul 22, 2022 at 17:09

2 Answers

I thought about using cat -A to postprocess the output from grep. It would add a $ to mark each end of line, but it would also break up the color escape codes, printing them as visible ^[ sequences instead of passing them through to the terminal.

But, at least the GNU coreutils cat has cat -E, which only marks line ends, so you'd get e.g.

$ printf 'foo\nbar\n' | grep --color=always . | cat -E
foo$
bar$

with the $ signs not colored.

Or do it manually with Perl. This replaces the newline characters with <NL>:

$ printf 'foo\nbar\n' | grep --color=always . | perl -pe 's/\n/<NL>/'; echo
foo<NL>bar<NL>

Similarly, the <NL> parts come without coloring.

With grep -z, the newlines would be colored, showing that the . does match the newline, at least in NUL-separated mode.

The same in color:

enter image description here

(With grep . as above, GNU grep prints the color-changing escapes before and after each individual character, i.e. at the start and end of each match instance. You could change to e.g. grep '..*' to match longer sequences in one go and get fewer escapes within the output.)
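As a rough check of the point about escapes, the sketch below compares the number of bytes grep emits for the two patterns. It assumes GNU grep and its default match colors; the variable names per_char and per_run are made up for this illustration.

```shell
# Count output bytes when every character is its own match ('.')
# versus when a whole run of characters is one match ('..*').
# Assumes GNU grep, which wraps each match in its own escape sequences.
per_char=$(printf 'foobar\n' | grep --color=always '.' | wc -c)
per_run=$(printf 'foobar\n' | grep --color=always '..*' | wc -c)

echo "bytes with '.':   $per_char"
echo "bytes with '..*': $per_run"
```

The first count should come out larger, since six separate matches each carry their own start and end escapes, while '..*' needs only one pair for the whole line.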

It would be easier for everyone if the text used were posted, instead of an image. To help everyone, here is the file (parts of which I found somewhere on the internet):

$ cat HBB.fna
>NM_000518.5 Homo sapiens hemoglobin subunit beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGCAA

Then, your question:

I'm trying to describe how dot (.) works in a regular expression.

There is no simple way to match a newline in grep with a dot (.). What we see already hints at this:

enter image description here

With a three-dot pattern, each match covers exactly 3 characters, so the matched part of a line is always a multiple of 3. On a 70-character line that is 69 characters, which leaves just 1 that the dots cannot match. That's why there is a non-colored last character in most of the lines.

But even if we use 71 dots, nothing matches. That would be the 70 characters we see in each line plus an ending newline (and because 71 is prime, no shorter group of dots divides it evenly).

enter image description here

That is because the dot (.) cannot match a newline: the newline is removed before each line is processed and reattached after the line has been processed. There is no newline to match in any case.
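A small way to see this with nothing but grep is to count matching lines for an anchored pattern. This is only an illustrative sketch; the variable names three and four are made up here.

```shell
# The input line is "foo" followed by a newline.
# '^...$' needs exactly 3 characters on the line; '^....$' needs 4.
# If the newline were part of what the regex sees, 4 dots would match.
three=$(printf 'foo\n' | grep -c '^...$')
four=$(printf 'foo\n' | grep -c '^....$') || true  # grep exits 1 on no match

echo "three dots: $three matching line(s)"
echo "four dots:  $four matching line(s)"
```

The three-dot pattern matches the line and the four-dot pattern does not, which is consistent with the newline never being offered to the regex.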

Even if we were to use the non-standard -z option (which processes the whole text input as one continuous block and therefore allows newlines to remain in the matched text), the newline turns out to be a non-printing character, much like a space, a tab, or other whitespace: the terminal has nothing visible to print for it.

enter image description here

So, we need to transform the newlines (matched or not, but present in the output) into something visible (let's use a =, similar to the $ encoding of vi or sed -n l), and let's add an additional newline so that the lines don't collapse into a continuous, unformatted stream of characters. That is easy to do with sed -z 's/\n/=\n/g' or a similar editor:

enter image description here
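For reference, a complete form of that sed step might look like the sketch below. It assumes GNU sed (for -z and for \n in the replacement); note that the s command needs its closing delimiter, and the g flag makes it act on every newline. The variable name marked is only for illustration.

```shell
# Append a visible '=' before every newline in grep's output.
# sed -z reads the whole input as one record, so \n can be matched.
marked=$(printf 'foo\nbar\n' | grep . | sed -z 's/\n/=\n/g')
printf '%s\n' "$marked"

# The same pipeline with colors kept; the '=' lands after grep's
# closing escape sequence, so it should appear uncolored.
printf 'foo\nbar\n' | grep --color=always . | sed -z 's/\n/=\n/g'
```

Without GNU sed, sed 's/$/=/' would probably give the same visible result here, since it appends the marker at the end of each line.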

So, there is no way to match a newline with a dot (.) in grep, not even with PCRE in its default mode. But we can make the newlines visible by carefully editing the grep output.

Hope that solves your need.
