
I have a file.txt containing:

this is the first
second line
not last line

fourth but first
second in list
seventh in file
seventh with nl

Normally I would just cat the file and pipe it into nl like so:

$> cat file.txt | nl
1  this is the first
2  second line
3  not last line

4  fourth but first
5  second in list
6  seventh in file
7  seventh with nl 

But I need the line numbers to reset when it encounters an empty line like so:

$> alias_or_function file.txt
1  this is the first
2  second line
3  not last line

1  fourth but first
2  second in list
3  seventh in file
4  seventh with nl 

How could I do this using a quick function or alias in my ~/.zshrc?

  • A Perl script can read the file in paragraph-at-a-time mode into an array, which you could then print out with numbers, beginning at 1 with each "paragraph". There is a learning curve, but Perl is worth learning. Commented Aug 22, 2020 at 5:41

4 Answers

4

You could replace blank lines with \:\: which nl understands as the start of a new page body:

<your-file sed 's/^[[:space:]]*$/\\:\\:/' | nl

So as a function:

number-lines-of-paragraphs() {
  sed -e 's/^[[:space:]]*$/\\:\\:/' -- "$@" | nl
}

(Note that nl will also treat \:\:\:, \:\: and \: as header/body/footer delimiters if they occur in the input itself, which is why you generally can't use nl to add line numbers to arbitrary text.)

You could also get the same output format without those caveats with awk as:

awk 'NF {printf "%6u\t%s\n", FNR, $0; next}; {FNR = 0; print}'

Or some of the variants posted by others here.

Above, the numbers are left padded to 6 characters and followed by a TAB character like in the default nl output format (where %6u\t%s\n is the equivalent of nl's default -s $'\t' -n rn -w 6), but you can of course adjust that format to your liking.
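As a rough sketch of what "adjusting the format" could look like, the function below takes the width and the separator as arguments and reads the text from stdin. The name number_paragraphs_fmt and its argument order are made up for this illustration, and it uses its own counter n with %d instead of FNR with %u; treat it as one possible variation, not as the answer's method.

```shell
# Hypothetical helper: number_paragraphs_fmt WIDTH SEP < file
# WIDTH is the padding width of the number, SEP the text placed after it.
# Note: awk -v interprets backslash escapes in the value, so SEP='\t' yields a TAB.
number_paragraphs_fmt() {
  awk -v w="$1" -v s="$2" '
    BEGIN { fmt = "%" w "d%s%s\n" }           # e.g. "%3d%s%s\n" for WIDTH=3
    NF    { printf fmt, ++n, s, $0; next }    # non-blank line: number it
          { n = 0; print }                    # blank line: reset the counter
  '
}
```

For example, `number_paragraphs_fmt 3 ': ' < file.txt` would pad the numbers to three characters and follow them with a colon and a space.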

But now, to make it a function that takes arbitrary file names as arguments, that's where you run into awk's own caveats, namely that it chokes on file names that contain = characters, because those are interpreted as awk variable assignments (at least if what's on the left of the first = looks like a valid awk variable name). That can be worked around with gawk as:

number-lines-of-paragraphs() {
  gawk -e '
    NF {printf "%6u\t%s\n", FNR, $0; next}
    {FNR = 0; print}' -E /dev/null "$@"
}

Note that if that function is passed several files, the line numbers will be reset at the start of each file. If you'd rather the contents of all files be taken as a single stream to be numbered as a whole like in the sed | nl approach, replace FNR with NR above.
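To make the FNR versus NR difference concrete, here is a small side-by-side sketch. The function names are invented for the comparison, plain awk is used instead of gawk (so the = caveat still applies to the file names), and %d stands in for %u; it assumes an awk that allows assigning to FNR and NR, which the common implementations do.

```shell
# Hypothetical names, for illustration only.
# Numbers restart at each blank line AND at the start of each file.
number_per_file() {
  awk 'NF {printf "%6d\t%s\n", FNR, $0; next} {FNR = 0; print}' "$@"
}

# Numbers restart at each blank line only; file boundaries are ignored.
number_as_stream() {
  awk 'NF {printf "%6d\t%s\n", NR, $0; next} {NR = 0; print}' "$@"
}
```

Given one file containing two lines and a second file containing one line, the first function numbers them 1, 2, 1, while the second numbers them 1, 2, 3.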

In any case, both sed and gawk will understand - as meaning stdin, not the file called - in the current directory (use ./- to work around it).
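If gawk is not available, one possible workaround for the = problem is to prefix relative file names with ./ before handing them to awk, since ./name=value no longer starts with something that looks like a variable name. This is a different technique from the gawk -E trick above, the function name is hypothetical, and - is deliberately left untouched so that it keeps meaning stdin.

```shell
# Hypothetical portable variant: works with any POSIX-like awk.
number_paragraphs_portable() {
  # Rebuild the argument list, rewriting risky names as we go.
  for arg in "$@"; do
    shift
    case $arg in
      /*|-) ;;                # absolute path or stdin marker: leave as-is
      *=*) arg=./$arg ;;      # relative name containing "=": hide it behind ./
    esac
    set -- "$@" "$arg"
  done
  awk 'NF {printf "%6d\t%s\n", FNR, $0; next} {FNR = 0; print}' "$@"
}
```

With no arguments the loop does nothing and awk reads stdin, as in the versions above.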

2

If you are willing to use awk:

$ cat nl.awk
{
   if ( $0 == "" ) {
      count = 0
      print
   } else
      print ++count, $0
}

Outputs:

$ awk -f nl.awk infile
1 this is the first
2 second line
3 not last line

1 fourth but first
2 second in list
3 seventh in file
4 seventh with nl
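
Since the question asks for something that can live in ~/.zshrc, the same logic could be wrapped in a shell function along these lines. The name nlp is just a placeholder, and the awk program is a slightly condensed form of nl.awk above rather than a different algorithm.

```shell
# Hypothetical function for ~/.zshrc (also works in bash and POSIX sh).
nlp() {
  awk '
    $0 == "" { count = 0; print; next }   # empty line: reset counter, keep the line
             { print ++count, $0 }        # any other line: prefix with the counter
  ' "$@"
}
```

After reloading the shell, `nlp file.txt` or `some-command | nlp` should produce the output shown above.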

  • I'm a bit confused. I've never used print in Unix before. I also don't know how to use awk, but I will definitely be reading up on that because I'm learning now that awk has its own syntax (I'm guessing). Commented Aug 22, 2020 at 12:46
  • @ntruter42, that's an awk script, not a shell script. So it's not a print command, but the print function in the awk language. Having said that, several shells including ksh and zsh have a print builtin. bash is an odd one out here, given that it has copied most things from ksh including many of its misdesigns, but not that most basic of builtins, the one to print text. sed also has a print command, abbreviated as p. Commented Aug 22, 2020 at 13:51
  • @ntruter42 awk has its own builtin print and printf functions. Yes, awk has its own syntax. It is basically pattern { action }. Commented Aug 22, 2020 at 13:52

2

Using awk:

awk '{ c=NF?++c:"" } {print c,$0}' file

It means:

  • If the line has at least one field (NF is non-zero, i.e. it contains a non-whitespace character), increment the counter with ++c.
  • If there are no fields (the line is empty or whitespace-only), set the counter to the empty string.
  • Print the counter followed by the original line with print c,$0.

Sadly, this short solution turns empty lines into lines that contain a single space (or, more precisely, the value of OFS). If that is a problem, then use this similar solution instead:

awk 'NF{$0=++c" "$0}!NF{c=0}1' file

There is no reason to change empty lines to \:\: in this solution.
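
The difference between the two one-liners is easy to miss on screen, so here is a small sketch that wraps each in a function for comparison. The function names are invented for this illustration; the awk programs are the two from this answer with spacing added.

```shell
# Hypothetical names, for comparison only.
# First variant: blank lines come out as a single OFS (a space by default).
with_ofs() {
  awk '{ c = NF ? ++c : "" } { print c, $0 }' "$@"
}

# Second variant: the number is glued onto $0, so blank lines stay empty.
keep_blank() {
  awk 'NF { $0 = ++c " " $0 } !NF { c = 0 } 1' "$@"
}
```

Piping the output of each through something like `cat -A` (GNU) or `od -c` makes the extra space on the blank line of the first variant visible.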

0
0

Using Raku (formerly known as Perl_6)

~$ raku -ne 'state $i; .chars ?? put(++$i, "\t$_") !! (put ""; $i=0);'  file

In a comment, @waltinator suggests using Perl, so here's an answer written in Raku (which is in the Perl family). The one-liner above can be copied and pasted onto the command line. To start, we call Raku with the awk-like -ne (non-autoprinting, linewise) flags. A counter variable $i is declared with state, which means it is initialized only once, at the start of the program.

Raku's ternary operator Test ?? True !! False is used to test a line for .chars. You could write .chars.Bool or .chars.so, but .chars alone works. If True, we output the line prefixed with the incremented counter ++$i and a tab. If False, we output an empty string and reset the counter to zero with $i=0.

Sample Input:

this is the first
second line
not last line

fourth but first
second in list
seventh in file
seventh with nl

Sample Output:

1   this is the first
2   second line
3   not last line

1   fourth but first
2   second in list
3   seventh in file
4   seventh with nl

To run this as a standalone program you use the familiar #! shebang line, and call the iterator for lines(), which substitutes for the command-line flags in the one-liner. You can still keep the state $i; statement within the block, or move it outside and declare my $i; instead (see below):

#!/opt/local/bin/raku 

my $i; for lines() {
    .chars ?? put(++$i, "\t$_") !! (put ""; $i=0);
};

You save and run this program much like the awk answer posted:*

~$ raku nbr.p6 infile

...or make the nbr.p6 script executable and just run ./nbr.p6 infile (or nbr.p6 infile if the script is on your $PATH).

*Purists will say that, after the language renaming, the correct extension for Raku scripts is now .raku, not .p6.

https://docs.raku.org/language/operators#index-entry-ternary
https://raku.org
