Extract text including parens

Question

I have some text like this:

Sentence #1 (n tokens):
Blah Blah Blah
[...
 ...
 ...]
( #start first set here
 ... (other possible parens and text here)
 ) #end first set here

(...)
(...)

Sentence #2 (n tokens):

I want to extract the second set of parens (including everything in between), i.e.,

(
 ... (other possible parens here)
)

Is there a bash way to do this? I tried the simple

 's/(\(.*\))/\1/'

Regular expressions cannot handle "matching parentheses" -- they are mathematically incapable of it.
– glenn jackman, Commented Sep 25, 2014 at 21:20
I don't think that is the case, because I have extracted the lines above with "[...]". Plus, I am not looking to match the parens, just do an aggressive match and skip that blank line after. If this is absolutely not possible with sed, what alternatives do you suggest?
– knk, Commented Sep 25, 2014 at 21:23
Are the opening and closing parens alone on their own lines like you show here?
– glenn jackman, Commented Sep 25, 2014 at 21:26
Pretty much, it's like "(ROOT" and "(. .)))". This is a sentence parsed using the Stanford parser. If I can write one for the simpler case, I can modify it for the specific case.
– knk, Commented Sep 25, 2014 at 21:29
@glennjackman There is a complication: things like Perl regular expressions are not regular expressions in the mathematical sense; they can do much more. In most cases your point is true anyway; it's just not that easy to tell.
– Volker Siegel, Commented Sep 26, 2014 at 6:10

glenn jackman · Accepted Answer · 2014-09-25 21:49:21Z

This will do it. There's probably a better way, but this is the first approach that came to mind:

echo 'Sentence #1 (n tokens):
Blah Blah Blah
[...
 ...
 ...]
(
 ... (other possible parens here)
 )

(...)
(...)

Sentence #2 (n tokens):
' | perl -0777 -nE '
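    $wanted = 2;    # which top-level group to print (1-based)
    $level = 0;     # current paren nesting depth
    $text = "";     # characters collected for the current group
    for $char (split //) {
        $level++ if $char eq "(";
        $text .= $char if $level > 0;
        if ($char eq ")") {
            if (--$level == 0) {
                # a top-level group just closed
                if (++$n == $wanted) {
                    say $text;
                    exit;
                }
                $text = "";
            }
        }
    }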
'

outputs

(
 ... (other possible parens here)
 )


I think I should actually sit down and learn Perl now. Thanks, and sorry I cannot vote up yet!
– knk, Commented Sep 25, 2014 at 21:58
+1 I once wrote a (completely untested) BNF-like Perl grammar for generic parenthetical constructs that might also be relevant.
– Joseph R., Commented Sep 26, 2014 at 3:30


Digital Trauma · Accepted Answer · 2014-09-26 00:00:09Z

Glenn's answer is good (and probably faster for large input), but for the record, what Glenn proposes is totally possible in bash too. It was a relatively simple matter to port his answer to pure bash in just a few minutes:

s='Sentence #1 (n tokens):
Blah Blah Blah
[...
 ...
 ...]
(
 ... (other possible parens here)
 )

(...)
(...)

Sentence #2 (n tokens):
'
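wanted=2    # which top-level group to print (1-based)
level=0     # current paren nesting depth
n=0         # top-level groups closed so far
text=""
for (( i=0; i<${#s}; i++ )); do
    char="${s:i:1}"
    if [[ $char == "(" ]]; then (( level++ )); fi
    if (( level > 0 )); then text+="$char"; fi
    if [[ $char == ")" ]]; then
        if (( --level == 0 )); then
            if (( ++n == wanted )); then
                echo "$text"
                break    # "exit" would also close an interactive shell
            fi
            text=""
        fi
    fi
done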
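
The same loop can be wrapped in a function so that the group number is a parameter and the text comes from stdin. This is only a sketch: the name nth_paren_group is hypothetical, and the guard against a stray closing paren is an addition that the loop above does not have.

```shell
# Hypothetical helper: print the Nth top-level parenthesised group read
# from stdin, or return 1 if the input has fewer than N groups.
nth_paren_group() {
    local wanted=$1 input level=0 n=0 text="" char i
    input=$(cat)    # slurp stdin (trailing newlines are stripped)
    for (( i = 0; i < ${#input}; i++ )); do
        char=${input:i:1}
        if [[ $char == "(" ]]; then
            level=$(( level + 1 ))
        fi
        if (( level > 0 )); then
            text+=$char
        fi
        if [[ $char == ")" ]]; then
            # Added guard (an assumption about the desired behaviour):
            # ignore a ")" that has no matching "(".
            if (( level > 0 )); then
                level=$(( level - 1 ))
                if (( level == 0 )); then
                    n=$(( n + 1 ))
                    if (( n == wanted )); then
                        printf '%s\n' "$text"
                        return 0
                    fi
                    text=""
                fi
            fi
        fi
    done
    return 1
}
```

With the string from above, printf '%s' "$s" | nth_paren_group 2 should print the same three-line block. Like the original loop, this walks the input one character at a time, so it will be slow on large files.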
