sed - how to (not) match unmatched brackets

Question

How can I do my operation only on lines that close their brackets?

For example, I want to change ".is" lines by wrapping the part before .is in parentheses, but not on lines with unclosed brackets.

so these are changed

this_thing.is 24  ->   (this_thing).is 24
that.is 50        ->   (that).is 50
a[23].is == 10    ->   (a[23]).is == 10
a.is true         ->   (a).is true
this_thing.is 24  ->   (this_thing).is 24

but these aren't:

this_thing.is (24
((that).is 50
(a[23].is == 10
a.is ( true
(this_thing.is 24

ideally also (not)

a{.is true
this_thing{.is 24

and (not)

a[.is true
this_thing[.is 24

I have a matcher with /.is/ but how to match unmatched brackets?

Depends on how "well" you need to do it. If you can only have a maximum of one level nesting you could write a complex(ish) regex for this, if you can have arbitrary levels of nesting, you'll need a scripting language. — terdon
– terdon ♦, Commented Nov 15, 2013 at 4:02
max 1 level would be a start and might suffice for most cases. — Michael Durrant
– Michael Durrant, Commented Nov 15, 2013 at 4:35
I think there are errors in some of the first set of examples. — ctrl-alt-delor
– ctrl-alt-delor, Commented Nov 15, 2013 at 8:56
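
The one-level case discussed in the comments above can be sketched in sed alone. This is a minimal sketch, not a definitive answer: it assumes a sed that supports -E (GNU and BSD sed both do) and at most one .is per line, because sed has no non-greedy quantifier and .* runs up to the last .is. The function name wrap_is_flat is made up for illustration.

```shell
# Hypothetical helper: wrap the receiver of ".is" in parentheses, but only on
# lines made of non-bracket characters and flat (non-nested) (), [] or {} pairs.
wrap_is_flat() {
  # The address accepts a line only if every bracket pair opens and closes
  # without any other bracket in between; [^][(){}] means "not a bracket".
  sed -E '/^([^][(){}]|\([^][(){}]*\)|\[[^][(){}]*\]|\{[^][(){}]*\})*$/ s/^(.*)\.is/(\1).is/' "$@"
}
```

Called as wrap_is_flat file, it rewrites a[23].is == 10 but leaves ((that).is 50 and a{.is true alone. Anything nested two levels deep, such as f(a[1]).is, is also left alone, which is the price of staying within a plain regular expression.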

Community · Accepted Answer · 2020-06-11 12:04:56Z

Personally, if my regular expressions were approaching this level of complexity, I would just switch the whole operation to Perl. This one deals with an arbitrary number of open brackets/parentheses/curly braces:

$ perl -ne '@open=/[\[({]/g; @close=/[)\]}]/g; 
             if($#close == $#open){s/(.+?)\.is/($1).is/} print' file

Or, more compact:

$ perl -pe 's/(.+?)\.is/($1).is/ if $#{[/[\[({]/g]} == $#{[/[)\]}]/g]}' file

Or, more completely, this one counts each bracket type separately, so it rejects cases like [} (but it still wrongly accepts cases like )( ):

$ perl -pe '@osqb=/\[/g; @csqb=/\]/g;
            @ocb=/\{/g; @ccb=/\}/g;
            @op=/\(/g; @cp=/\)/g;
            if($#osqb == $#csqb && $#ocb == $#ccb && $#op == $#cp){
                s/(.+?)\.is/($1).is/
            }' file

When run on your example, this will print

(this_thing).is 24
(that).is 50      
(a[23]).is == 10 
(a).is true      
(this_thing).is 24
this_thing.is (24
((that).is 50
(a[23].is == 10
a.is ( true
(this_thing.is 24
a{.is true
this_thing{.is 24
a[.is true
this_thing[.is 24

Explanation

perl -ne : process the input file line by line (-n) and run the script given by -e.
@open=/[\[({]/g; : find all opening glyphs and save the result in an array called @open.
@close=/[)\]}]/g; : as above but for closing glyphs.
if($#close == $#open) : if the number of opening glyphs is equal to the number of closing glyphs (in other words, if there are no hanging parentheses etc.)... $#array is the last index of the array, so comparing the two last indices is the same as comparing the two counts.
s/(.+?)\.is/($1).is/ : ...then replace the shortest string that ends in .is with itself enclosed within parentheses.
The last print is outside the braces of the if block and will be executed whether there was a substitution or not.

This is good, but has some bugs. For example, [} will match, and so will )( etc. — ctrl-alt-delor
– ctrl-alt-delor, Commented Nov 15, 2013 at 9:03
@richard true, I am assuming those situations are not very likely, but yes, it will fail on them. — terdon
– terdon ♦, Commented Nov 15, 2013 at 13:18
@richard thanks, I edited my answer with a more robust example. Still fails on )( but fixing that would make the whole thing too long. — terdon
– terdon ♦, Commented Nov 15, 2013 at 14:15
+1 for "if my regular expressions were approaching this level of complexity, I would just switch the whole operation to Perl" — Joseph R.
– Joseph R., Commented Nov 16, 2013 at 23:07

Joseph R. · Accepted Answer · 2013-11-16 23:06:12Z

Expanding on terdon's answer, you can use Perl to truly parse nested parenthetical constructs. Here's a regex that should do it:

my $balanced_parens_grammar = qr/
  (?(DEFINE)                         # Define a grammar
    (?<BALANCED_PARENS>
       \(                            # Opening paren
          (?:                        # Group without capturing
              (?&BALANCED_PARENS)    # Nested balanced parens
             |(?&BALANCED_BRACKETS)  # Nested balanced brackets
             |(?&BALANCED_CURLIES)   # Nested balanced curlies
             |[^()\[\]{}]++          # Run of non-bracket characters
           )*                        # Repeat the alternation
        \)                           # Closing paren
    )
    (?<BALANCED_BRACKETS>
       \[                            # Opening bracket
          (?:                        # Group without capturing
              (?&BALANCED_PARENS)    # Nested balanced parens
             |(?&BALANCED_BRACKETS)  # Nested balanced brackets
             |(?&BALANCED_CURLIES)   # Nested balanced curlies
             |[^()\[\]{}]++          # Run of non-bracket characters
           )*                        # Repeat the alternation
        \]                           # Closing bracket
    )
    (?<BALANCED_CURLIES>
       \{                            # Opening curly
          (?:                        # Group without capturing
              (?&BALANCED_PARENS)    # Nested balanced parens
             |(?&BALANCED_BRACKETS)  # Nested balanced brackets
             |(?&BALANCED_CURLIES)   # Nested balanced curlies
             |[^()\[\]{}]++          # Run of non-bracket characters
           )*                        # Repeat the alternation
        \}                           # Closing curly
    )
    (?<BALANCED_ANY>                 # Any one balanced construct
       (?:
           (?&BALANCED_PARENS)
          |(?&BALANCED_BRACKETS)
          |(?&BALANCED_CURLIES)
       )
    )
  )                                  # End of DEFINE
/x;

Use it like so:

if( $line =~ m/
        ^
          (?:
              [^()\[\]{}]++      # Run of non-bracket characters
             |(?&BALANCED_ANY)   # Any balanced paren-type
          )*                     # Repeated, in any order
        $
        $balanced_parens_grammar/x){
    # Do your magic here
}

Disclaimer

Code is completely untested. Probably contains errors.

@terdon Thanks. FWIW, though, qr// has nothing to do with grammar definition: you could have written the whole thing inside the m// statement and it would've worked just as well. qr// pre-compiles a regex that is not expected to change later and since that's typical of a grammar, I used qr// here. I think it makes things more readable, too. — Joseph R.
– Joseph R., Commented Nov 17, 2013 at 8:02
Ah, OK, I knew qr compiled a regex but have never used it. Grammar definition I know nothing about so I assumed the two were connected. Looks like a powerful tool. — terdon
– terdon ♦, Commented Nov 17, 2013 at 13:40
@terdon Neither do I. I only recently learned about grammar definition in Perl so I can't wait to experiment :). If you think this is powerful try looking up Perl 6 rules! — Joseph R.
– Joseph R., Commented Nov 17, 2013 at 18:47

Gilles 'SO- stop being evil' · Accepted Answer · 2013-11-16 22:31:01Z

Sed uses regular expressions, and regular expressions are not powerful enough for this. Use gawk or some other tool that can do this.

There is a classification of grammars that shows this: regular, context-free, etc. Balanced parentheses with arbitrary nesting depth do not form a regular language, so a regular expression cannot match them reliably.
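
As a rough illustration of the "some other tool" route, here is a minimal awk sketch that walks each line character by character and remembers the open brackets in a string used as a stack, which is exactly the unbounded memory a regular expression lacks. It sticks to POSIX awk features, so gawk is not strictly required. The function name wrap_is_awk and the variable names are made up for illustration.

```shell
# Hypothetical helper: wrap the text before the first ".is" in parentheses,
# but only on lines whose brackets are properly nested.
wrap_is_awk() {
  awk '
  {
    stack = ""; ok = 1
    for (i = 1; i <= length($0) && ok; i++) {
      c = substr($0, i, 1)
      if (c == "(" || c == "[" || c == "{") {
        stack = stack c                       # push the opener
      } else if (c == ")" || c == "]" || c == "}") {
        want = (c == ")") ? "(" : ((c == "]") ? "[" : "{")
        if (substr(stack, length(stack), 1) != want) {
          ok = 0                              # wrong type, or nothing open
        } else {
          stack = substr(stack, 1, length(stack) - 1)   # pop
        }
      }
    }
    p = index($0, ".is")                      # first ".is", 0 if absent
    if (ok && stack == "" && p > 1) {
      $0 = "(" substr($0, 1, p - 1) ")" substr($0, p)
    }
    print
  }' "$@"
}
```

wrap_is_awk file prints every line, changed or not, in the same way as the Perl one-liners in the other answers.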

edited Nov 16, 2013 at 22:31

Gilles 'SO- stop being evil'


answered Nov 15, 2013 at 9:00

ctrl-alt-delor
