So I have a file like this:

echo 'this line is added for demo purpose'
echo 'do not extract this line and the line above'

#!/usr/bin/env bash
# header: add, replace, and delete header lines.
# 
# Example usage:
# $ seq 10 | header -a 'values'
# $ seq 10 | header -a 'VALUES' | header -e 'tr "[:upper:]" "[:lower:]"'
# $ seq 10 | header -a 'values' | header -d
# $ seq 10 | header -a 'multi\nline' | header -n 2 -e "paste -sd_"
#
# See also: body
#

# Author: http://jeroenjanssens.com

usage () {
cat << EOF
header: add, replace, and delete header lines.

usage: header OPTIONS

OPTIONS:
...
}

# i don't want
# these comments

# even if 
# these lines match

I want to extract the first run of consecutive lines matching the regex ^(#.*)|(\s*)$, starting at the first line in the file that matches and ending at the last line of that unbroken run.

The desired result of extraction should be


#!/usr/bin/env bash
# header: add, replace, and delete header lines.
# 
# Example usage:
# $ seq 10 | header -a 'values'
# $ seq 10 | header -a 'VALUES' | header -e 'tr "[:upper:]" "[:lower:]"'
# $ seq 10 | header -a 'values' | header -d
# $ seq 10 | header -a 'multi\nline' | header -n 2 -e "paste -sd_"
#
# See also: body
#
# Author: http://jeroenjanssens.com

How do I do this?

I think I can extract all consecutively matching lines with regex in multiline mode, but I only want the first part of the match.

Update:

I want regex ^(#.*)|(\s*)$ to match

  • comments with a # at the beginning of the lines
  • empty lines (like the one after # Author)
  • lines that contain only whitespace
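
One thing worth checking is how the alternation in that regex groups. The sketch below is only an illustration on a made-up three-line sample (the variable and function names are hypothetical). It uses [[:space:]] in place of \s for portability and counts matching lines with grep -c. Because | has the lowest precedence, the regex as written reads as "^(#.*)" or "(\s*)$", and the second branch matches at the end of every line.

```shell
# Hypothetical sample: one comment, one empty line, one line of code.
sample=$(printf '%s\n' '# comment' '' 'echo code')

# As written: '^(#.*)' OR '([[:space:]]*)$'.
# The second branch matches the empty string at the end of any line.
count_loose() { printf '%s\n' "$sample" | grep -cE '^(#.*)|([[:space:]]*)$'; }

# Grouped: both alternatives are anchored at both ends.
count_strict() { printf '%s\n' "$sample" | grep -cE '^(#.*|[[:space:]]*)$'; }
```

Calling count_loose counts all three sample lines, while count_strict counts only the comment and the empty line, so the grouped form ^(#.*|\s*)$ is probably what the three bullet points describe.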
  • what happened to the empty line before # Author... line? and also rest of the line # See also: body...., and how See changed to see there? Commented Sep 12, 2021 at 14:47
  • @αғsнιη Thanks for the comment. I want the empty line after # Author to be matched by regex ^(#.*)|(\s*)$ and be extracted. The see also part was an editing mistake and it's now fixed. :) Commented Sep 12, 2021 at 15:21
  • your edit still didn't address what I pointed. also better explain what is your goal instead of saying you want to match ^(#.*)|(\s*)$. Commented Sep 12, 2021 at 16:38
  • @αғsнιη My goal is to extract the usage info at the beginning of the scripts. Commented Sep 12, 2021 at 23:52

2 Answers

With awk:

$ awk '/^#/{f=1} f && !/^#|^[[:space:]]*$/{exit} f' ip.txt
#!/usr/bin/env bash
# header: add, replace, and delete header lines.
# 
# Example usage:
# $ seq 10 | header -a 'values'
# $ seq 10 | header -a 'VALUES' | header -e 'tr "[:upper:]" "[:lower:]"'
# $ seq 10 | header -a 'values' | header -d
# $ seq 10 | header -a 'multi\nline' | header -n 2 -e "paste -sd_"
#
# See also: body
#

# Author: http://jeroenjanssens.com

This starts printing at the first comment line and keeps printing as long as each line is either a comment or contains only whitespace (including empty lines). It exits at the first line that is neither, so later comment blocks are not printed.
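
For readability, the same logic can be spelled out as a commented awk program inside a shell function. This is just a sketch: extract_header and the variable name found are hypothetical, and the function reads the script on standard input instead of taking a file name.

```shell
# extract_header is a hypothetical name; it reads the script on stdin.
extract_header() {
  awk '
    /^#/ { found = 1 }                      # first comment line: start printing
    found && !/^#|^[[:space:]]*$/ { exit }  # neither comment nor blank: stop
    found                                   # print while inside the header block
  '
}
```

Usage would be something like extract_header < ip.txt. Note that blank lines directly after the last comment are still printed, because they satisfy the same condition as blank lines between comments.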

With GNU sed, without trailing blank lines:

sed '/^#/,$!d;:1;/^\s*$/N;/\S/!b1;/^#/M!Q' file

/^#/,$!d - Delete every line before the first comment line.
:1;/^\s*$/N;/\S/!b1 - While the pattern space holds only empty or whitespace-only lines, append the next line to it.
/^#/M!Q - If no line in the pattern space starts with a comment mark, quit without printing (the M flag lets ^ and $ match at each line of a multiline pattern space).

With trailing blank lines:

sed '/^#/,$!d;/^#\|^\s*$/!Q' file
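
To see the difference between the two variants, here is a small sketch that wraps each command in a shell function and feeds both the same made-up input. The function names and the sample input are hypothetical, and both commands assume GNU sed, since Q, \s, \S and the M flag are GNU extensions.

```shell
# Both helpers assume GNU sed and read the script on stdin.
header_trimmed()  { sed '/^#/,$!d;:1;/^\s*$/N;/\S/!b1;/^#/M!Q'; }
header_trailing() { sed '/^#/,$!d;/^#\|^\s*$/!Q'; }

# Hypothetical input: a skipped line, two comments separated by a blank
# line, one more blank line, then code.
demo_input() { printf '%s\n' 'echo skip' '# a' '' '# b' '' 'code'; }
```

With this input, demo_input | header_trimmed prints three lines ("# a", an empty line, "# b"), while demo_input | header_trailing prints four, because it keeps the blank line between the last comment and the code.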
