Extract first block of consecutively matching lines from a file?

Question

So I have a file like this:

echo 'this line is added for demo purpose'
echo 'do not extract this line and the line above'

#!/usr/bin/env bash
# header: add, replace, and delete header lines.
# 
# Example usage:
# $ seq 10 | header -a 'values'
# $ seq 10 | header -a 'VALUES' | header -e 'tr "[:upper:]" "[:lower:]"'
# $ seq 10 | header -a 'values' | header -d
# $ seq 10 | header -a 'multi\nline' | header -n 2 -e "paste -sd_"
#
# See also: body
#

# Author: http://jeroenjanssens.com

usage () {
cat << EOF
header: add, replace, and delete header lines.

usage: header OPTIONS

OPTIONS:
...
}

# i don't want
# these comments

# even if 
# these lines match

I want to extract the lines matching the regex ^(#.*)|(\s*)$, starting at the first line in the file that matches and continuing through the consecutive matching lines until the first line that does not match.

The desired result of extraction should be


#!/usr/bin/env bash
# header: add, replace, and delete header lines.
# 
# Example usage:
# $ seq 10 | header -a 'values'
# $ seq 10 | header -a 'VALUES' | header -e 'tr "[:upper:]" "[:lower:]"'
# $ seq 10 | header -a 'values' | header -d
# $ seq 10 | header -a 'multi\nline' | header -n 2 -e "paste -sd_"
#
# See also: body
#
# Author: http://jeroenjanssens.com

How do I do this?

I think I can extract all consecutively matching lines with regex in multiline mode, but I only want the first part of the match.

Update:

I want regex ^(#.*)|(\s*)$ to match

comments with a # at the beginning of the lines
empty lines (like the one after # Author)
lines that contain only whitespace
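
Note on the regex as written: alternation has the lowest precedence, so ^(#.*)|(\s*)$ is read as "^(#.*)" or "(\s*)$", and the second branch matches the empty string at the end of any line. Grouping the alternatives inside the anchors gives the intended meaning. A minimal sketch with grep -E, using the POSIX class [[:space:]] in place of \s (the sample lines are made up for illustration):

```shell
# Four sample lines: a comment, an empty line, a whitespace-only line,
# and a line of code that should NOT match.
sample=$(printf '%s\n' '# a comment' '' '   ' 'usage () {')

# As written: the right-hand branch is only anchored at the end of the
# line and can match the empty string, so every line matches.
as_written=$(printf '%s\n' "$sample" | grep -cE '^(#.*)|([[:space:]]*)$')

# Grouped: both alternatives sit between ^ and $.
grouped=$(printf '%s\n' "$sample" | grep -cE '^(#.*|[[:space:]]*)$')

echo "as written: $as_written"   # as written: 4
echo "grouped:    $grouped"      # grouped:    3
```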

what happened to the empty line before # Author... line? and also rest of the line # See also: body...., and how See changed to see there?
– αғsнιη, Commented Sep 12, 2021 at 14:47
@αғsнιη Thanks for the comment. I want the empty line after # Author to be matched by regex ^(#.*)|(\s*)$ and be extracted. The see also part was an editing mistake and it's now fixed. :)
– Teddy C, Commented Sep 12, 2021 at 15:21
your edit still didn't address what I pointed. also better explain what is your goal instead of saying you want to match ^(#.*)|(\s*)$.
– αғsнιη, Commented Sep 12, 2021 at 16:38
@αғsнιη My goal is to extract the usage info at the beginning of the scripts.
– Teddy C, Commented Sep 12, 2021 at 23:52

Sundeep · Accepted Answer · 2021-09-13 05:35:47Z

With awk:

$ awk '/^#/{f=1} f && !/^#|^[[:space:]]*$/{exit} f' ip.txt
#!/usr/bin/env bash
# header: add, replace, and delete header lines.
# 
# Example usage:
# $ seq 10 | header -a 'values'
# $ seq 10 | header -a 'VALUES' | header -e 'tr "[:upper:]" "[:lower:]"'
# $ seq 10 | header -a 'values' | header -d
# $ seq 10 | header -a 'multi\nline' | header -n 2 -e "paste -sd_"
#
# See also: body
#

# Author: http://jeroenjanssens.com

This starts extracting at the first comment line and keeps printing as long as each line is either a comment or an empty or whitespace-only line.
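
For readability, the same program can be spread over several lines with comments. This is only a sketch of the one-liner above; the wrapper function name extract_header is illustrative and not part of the original answer.

```shell
# Hypothetical wrapper around the awk program from the answer.
# Reads the named files, or standard input when no file is given.
extract_header() {
  awk '
    # Raise the flag at the first line that starts with "#".
    /^#/ { f = 1 }

    # Once started, stop at the first line that is neither a comment
    # nor an empty or whitespace-only line.
    f && !/^#|^[[:space:]]*$/ { exit }

    # Print every line seen while the flag is set.
    f
  ' "$@"
}
```

Usage would then be "extract_header ip.txt". Blank lines that sit between the last comment and the first line of code are printed too, because the program only stops once it sees the non-matching line.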

nezabudka · Answer · 2021-09-13 07:05:36Z

With GNU sed, without trailing blank lines:

sed '/^#/,$!d;:1;/^\s*$/N;/\S/!b1;/^#/M!Q' file

/^#/,$!d - delete the lines before the first comment.
:1;/^\s*$/N;/\S/!b1 - if the line is empty or contains only whitespace, append the next line to the buffer (pattern space), and repeat until the buffer contains a non-whitespace character.
/^#/M!Q - if no line in the buffer starts with a comment mark, quit without printing (M makes the anchors match at every line of a multiline buffer).
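
Where GNU sed is not available, the same idea (hold blank lines back and emit them only if another comment follows) can be sketched in awk. This is an illustrative alternative, not part of the original answer, and the function name extract_header_trimmed is hypothetical.

```shell
# Hypothetical helper: print the first comment block, dropping any
# blank lines that trail after its last comment line.
extract_header_trimmed() {
  awk '
    # Start at the first comment line; skip everything before it.
    /^#/ { started = 1 }
    !started { next }

    # Hold empty or whitespace-only lines back instead of printing them.
    /^[[:space:]]*$/ { pending = pending $0 "\n"; next }

    # Any other non-comment line ends the block; held lines are dropped.
    !/^#/ { exit }

    # A comment line: flush the held blank lines, then print the comment.
    { printf "%s", pending; pending = ""; print }
  ' "$@"
}
```

Called as "extract_header_trimmed file", it keeps blank lines between comments but not the ones after the last comment.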

With trailing blank lines:

sed '/^#/,$!d;/^#\|^\s*$/!Q' file

edited Sep 13, 2021 at 7:05

answered Sep 13, 2021 at 6:32

nezabudka
