
Input example:

START{
    some text

    {
      more text}
almost there
}
nothing important{
...

Desired output:

START{
    some text

    {
      more text}
almost there
}

The first opening brace could be in different positions:

START{...
START {...
START
{...

START could also contain special characters such as *.

I want to print everything from START up to and including the first matching pair of {} (in bash). I was thinking of a counter that increments on each { and decrements on each }. When it gets back to zero, printing stops (the curly brackets always match).
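The counter idea can be sketched in pure bash. This is a minimal, hypothetical helper (print_start_block is a made-up name): it assumes bash rather than plain sh, that START is matched literally, and that braces are balanced as stated above. It walks the text one character at a time, so expect it to be slow on large files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustrative name): print_start_block WORD < file
# Prints from the first literal occurrence of WORD through the } that
# closes the first { after it, using a brace-depth counter.
print_start_block() {
  local start=$1 text rest out ch i depth=0 seen=0
  text=$(cat; printf x)   # slurp stdin; the x protects trailing newlines
  text=${text%x}
  # Quoting "$start" makes * ? [ match literally instead of as glob chars
  [[ $text == *"$start"* ]] || return 1
  rest=${text#*"$start"}  # everything after the first WORD
  out=$start
  for (( i = 0; i < ${#rest}; i++ )); do
    ch=${rest:i:1}
    out+=$ch
    if [[ $ch == '{' ]]; then
      depth=$((depth + 1))
      seen=1
    elif [[ $ch == '}' ]] && (( seen )); then
      depth=$((depth - 1))
      if (( depth == 0 )); then
        printf '%s\n' "$out"   # counter back to zero: stop here
        return 0
      fi
    fi
  done
  return 1                # no balanced block found after WORD
}
```

Usage would be print_start_block 'START' < file. A } seen before the first { is ignored, so the counter only starts once the block has opened.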

  • Can the real text indicated by your placeholders (some text, more text, almost there, nothing important) include any of START, {, or }? For example, if this were a programming language you were trying to parse, there might be strings (e.g. "{" or "where is START?") or comments (# { or // { or // not really START) containing any of those, which you would not want to consider when counting. Commented Jan 8, 2021 at 16:16
  • No, there is only one START. Curly brackets can be nested, but they always come in pairs. Of course there could be more text or more brackets. Commented Jan 8, 2021 at 16:21
  • If START can really be STA*RT or similar, with regexp metachars or typical delimiters, then show that in your sample input/output rather than just the sunny-day alphabetic-chars case. Commented Jan 8, 2021 at 18:28

2 Answers


A simple brute-force approach that'll work in any awk, in any shell, on all Unix boxes:

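$ cat tst.awk
# index() is a literal substring search; once START is found,
# drop anything before it on that line and start buffering
s=index($0,"START") { $0=substr($0,s); f=1 }
# Buffer every line from START onwards, newlines included
f { rec = rec $0 RS }
END {
    # Walk the buffer one character at a time, counting brace depth
    len = length(rec)
    for (i=1; i<=len; i++) {
        char = substr(rec,i,1)
        if ( char == "{" ) {
            ++cnt
        }
        else if ( char == "}" ) {
            # The } that brings the depth back to 0 closes the first block
            if ( --cnt == 0 ) {
                print substr(rec,1,i)
                exit
            }
        }
    }
}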

$ awk -f tst.awk file
START{
    some text

    {
      more text}
almost there
}
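If the start word has to be passed in, possibly with characters like *, the same awk logic can take it from the environment. A hedged sketch, where start_block and START_WORD are hypothetical names: it assumes a POSIX awk (ENVIRON and index() are both POSIX). ENVIRON is used instead of -v because -v processes backslash escapes in the value. Unlike tst.awk, it only trims at the first occurrence of the word and ignores a } seen before any {.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the tst.awk logic: start_block WORD < file
# WORD goes through the environment (awk -v would process backslash
# escapes in it; ENVIRON does not), and index() is a plain substring
# search, so * and other regexp metachars need no escaping.
start_block() {
  START_WORD=$1 awk '
    !f && (s = index($0, ENVIRON["START_WORD"])) { $0 = substr($0, s); f = 1 }
    f { rec = rec $0 RS }      # buffer everything from WORD onwards
    END {
      len = length(rec)
      for (i = 1; i <= len; i++) {
        c = substr(rec, i, 1)
        if (c == "{") cnt++
        else if (c == "}" && cnt > 0 && --cnt == 0) {
          print substr(rec, 1, i)   # first balanced block is complete
          exit
        }
      }
    }'
}
```

Usage would be start_block 'STA*RT' < file.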

With pcregrep:

start_word='START'
pcregrep -Mo "(?s)\Q$start_word\E\h*(\{(?:[^{}]++|(?1))*+\})" < your-file

With zsh builtins:

set -o rematchpcre
start_word='START'
[[ $(<your-file) =~ "(?s)\Q$start_word\E\h*(\{(?:[^{}]++|(?1))*+\})" ]] &&
  print -r -- $MATCH

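Those use PCRE's recursive regexp feature: (?1) re-invokes the subpattern in the first capturing (...) group, so nested {...} pairs are matched however deeply they go. \Q...\E makes the start word match literally, and \h* allows optional horizontal whitespace before the {.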

If you have neither pcregrep nor zsh, you can always resort to the real thing (perl, the P in PCRE):

perl -l -0777 -sne '
    print $& if /\Q$start_word\E\h*(\{(?:[^{}]++|(?1))*+\})/s
  ' -- -start_word='START' < your-file

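(Note that all but the perl one assume $start_word doesn't contain \E: there the word is pasted into the pattern text, so an \E in it would end the quoting early. perl interpolates the variable's value and then quotes it, so it has no such issue.)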

  • Your solution does work; it's my fault that I forgot to add some details, such as that the first { could begin on a new line. Commented Jan 8, 2021 at 17:56
  • @GeoCap try changing START to START\s* in the regexp to allow for any optional white space between START and {. Commented Jan 8, 2021 at 17:58
  • @EdMorton, I've now updated it, but used \h instead: horizontal spacing only, so excluding \r and \n, to avoid matching on START<newline>{...}. @GeoCap, replace it with \s if that's actually what you want. Commented Jan 8, 2021 at 18:14
  • @StéphaneChazelas I suggested \s because GeoCap specifically said in their comment that the { could be on the line after START: "...first { could begin in a new line.". Commented Jan 8, 2021 at 18:17
  • Thanks. My last issue is that START can contain characters that need to be escaped, like *. I changed the code so that I pass in a string (/$VAR\s...). Is there a way to take the string literally without having to escape those characters by hand? Commented Jan 8, 2021 at 18:17
