
How can I do my operation only on lines that close their brackets?

For example, I want to change lines containing ".is" by wrapping the part before ".is" in parentheses, but not on lines with unclosed brackets.

So these are changed:

this_thing.is 24  ->   (this_thing).is 24
that.is 50        ->   (that).is 50
a[23].is == 10    ->   (a[23]).is == 10
a.is true         ->   (a).is true
this_thing.is 24  ->   (this_thing).is 24

but these aren't:

this_thing.is (24
((that).is 50
(a[23].is == 10
a.is ( true
(this_thing.is 24

Ideally these should also not be changed:

a{.is true
this_thing{.is 24

and neither should these:

a[.is true
this_thing[.is 24 

I have a matcher with /.is/, but how do I detect unmatched brackets?

  • Depends on how "well" you need to do it. If you can only have a maximum of one level of nesting, you could write a complex(ish) regex for this; if you can have arbitrary levels of nesting, you'll need a scripting language. Commented Nov 15, 2013 at 4:02
  • A maximum of one level would be a start and might suffice for most cases. Commented Nov 15, 2013 at 4:35
  • I think there are errors in some of the first set of examples. Commented Nov 15, 2013 at 8:56
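
For the bounded-nesting case raised in these comments, the pattern can be generated instead of written by hand. The following is a minimal sketch in Python (used here only for illustration; it is not sed). The names `balanced_pattern` and `max_depth` are hypothetical, and `max_depth=1` is assumed to mean bracket pairs that contain no further brackets.

```python
import re

PAIRS = [("(", ")"), ("[", "]"), ("{", "}")]
PLAIN = r"[^()\[\]{}]"  # any single character that is not a bracket


def balanced_pattern(max_depth: int) -> str:
    """Build regex source for text whose brackets are balanced and
    nested at most max_depth levels deep (hypothetical helper)."""
    inner = PLAIN + "*"  # depth 0: no brackets at all
    for _ in range(max_depth):
        # One more level: each bracket pair may wrap the previous level.
        groups = "|".join(re.escape(o) + inner + re.escape(c) for o, c in PAIRS)
        inner = f"(?:{PLAIN}|{groups})*"
    return inner


depth1 = re.compile(balanced_pattern(1))
depth2 = re.compile(balanced_pattern(2))
```

With this sketch, `depth1.fullmatch("a[23].is == 10")` succeeds while `depth1.fullmatch("(a[23].is == 10")` does not. The pattern grows with every extra level, which is why arbitrary nesting needs a different approach.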

3 Answers

5

Personally, if my regular expressions were approaching this level of complexity, I would just switch the whole operation to Perl. This one deals with an arbitrary number of open brackets/parentheses/curly braces by comparing the number of opening and closing characters on each line:

$ perl -ne '@open=/[\[({]/g; @close=/[)\]}]/g; 
             if($#close == $#open){s/(.+?)\.is/($1).is/} print' file

Or, more compact:

$ perl -pe 's/(.+?)\.is/($1).is/ if $#{[/[\[({]/g]} == $#{[/[)\]}]/g]}' file

Or, more completely, this version counts each bracket type separately, so it can deal with cases like [} (but still fails on cases like )():

$ perl -pe '@osqb=/\[/g; @csqb=/\]/g;
            @ocb=/\{/g; @ccb=/\}/g;
            @op=/\(/g; @cp=/\)/g;
            if($#osqb == $#csqb && $#ocb == $#ccb && $#op == $#cp){
                s/(.+?)\.is/($1).is/
            }' file

When run on your example, this will print:

(this_thing).is 24
(that).is 50      
(a[23]).is == 10 
(a).is true      
(this_thing).is 24
this_thing.is (24
((that).is 50
(a[23].is == 10
a.is ( true
(this_thing.is 24
a{.is true
this_thing{.is 24
a[.is true
this_thing[.is 24 

Explanation

  • perl -ne : process the input file line by line (-n) and run the script given by -e.
  • @open=/[\[({]/g; : find all opening glyphs and save the result in an array called @open.
  • @close=/[)\]}]/g; : as above but for closing glyphs.
  • if($#close == $#open) : $# gives the last index of an array, so this is true if the number of opening glyphs is equal to the number of closing glyphs (if, in other words, there are no hanging parentheses etc.)...
  • s/(.+?)\.is/($1).is/ : ...then take the shortest string that precedes the first .is and enclose it within parentheses.
  • The last print is outside the braces of the if block and will be executed whether there was a substitution or not.
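
The counting idea explained above can also be sketched in Python, for readers who do not use Perl. This is an illustrative translation under stated assumptions, not the answer's own code: the function name `wrap_if_counts_match` is hypothetical, and it compares counts separately for each bracket type, as the more complete Perl variant does.

```python
import re

PAIRS = [("(", ")"), ("[", "]"), ("{", "}")]


def wrap_if_counts_match(line: str) -> str:
    """Wrap the text before the first '.is' in parentheses, but only when
    the opening and closing counts agree for every bracket type."""
    if all(line.count(opener) == line.count(closer) for opener, closer in PAIRS):
        # Same substitution as the Perl: shortest prefix followed by ".is".
        return re.sub(r"(.+?)\.is", r"(\1).is", line, count=1)
    return line
```

Because only counts are compared, the order of the brackets is ignored, so a line such as `)(a.is 1` is still rewritten.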
4
  • This is good, but has some bugs. For example, [} will match, and so will )( etc. Commented Nov 15, 2013 at 9:03
  • @richard true, I am assuming those situations are not very likely, but yes, it will fail on them. Commented Nov 15, 2013 at 13:18
  • @richard thanks, I edited my answer with a more robust example. Still fails on )( but fixing that would make the whole thing too long. Commented Nov 15, 2013 at 14:15
  • +1 for "if my regular expressions were approaching this level of complexity, I would just switch the whole operation to Perl" Commented Nov 16, 2013 at 23:07
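
The )( and [} cases mentioned in these comments can be handled by tracking order with a stack instead of comparing counts. The following is a minimal Python sketch of that well-known technique, not a fix to the Perl above; the names `is_balanced` and `wrap_if_balanced` are hypothetical.

```python
import re

CLOSER_TO_OPENER = {")": "(", "]": "[", "}": "{"}
OPENERS = set(CLOSER_TO_OPENER.values())


def is_balanced(line: str) -> bool:
    """Return True when every closer matches the most recent unclosed opener."""
    stack = []
    for ch in line:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in CLOSER_TO_OPENER:
            # A closer with nothing open, or the wrong kind of opener, fails.
            if not stack or stack.pop() != CLOSER_TO_OPENER[ch]:
                return False
    return not stack  # leftover openers mean something was never closed


def wrap_if_balanced(line: str) -> str:
    """Apply the '.is' rewrite only to lines whose brackets are balanced."""
    if is_balanced(line):
        return re.sub(r"(.+?)\.is", r"(\1).is", line, count=1)
    return line
```

Here `wrap_if_balanced(")(a.is 1")` leaves the line untouched, because the first `)` arrives while the stack is empty.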
2

Expanding on terdon's answer, you can use Perl to truly parse nested parenthetical constructs. Here's a regex that should do it:

my $balanced_parens_grammar = qr/
  (?(DEFINE)                         # Define a grammar
    (?<BALANCED_PARENS>
       \(                            # Opening paren
          (?:                        # Group without capturing
              (?&BALANCED_PARENS)    # Nested balanced parens
             |(?&BALANCED_BRACKETS)  # Nested balanced brackets
             |(?&BALANCED_CURLIES)   # Nested balanced curlies
             |[^()\[\]{}]++          # Any run of non-bracket characters
           )*                        # End alternation, repeated
       \)                            # Closing paren
    )
    (?<BALANCED_BRACKETS>
       \[                            # Opening bracket
          (?:                        # Group without capturing
              (?&BALANCED_PARENS)    # Nested balanced parens
             |(?&BALANCED_BRACKETS)  # Nested balanced brackets
             |(?&BALANCED_CURLIES)   # Nested balanced curlies
             |[^()\[\]{}]++          # Any run of non-bracket characters
           )*                        # End alternation, repeated
       \]                            # Closing bracket
    )
    (?<BALANCED_CURLIES>
       \{                            # Opening curly
          (?:                        # Group without capturing
              (?&BALANCED_PARENS)    # Nested balanced parens
             |(?&BALANCED_BRACKETS)  # Nested balanced brackets
             |(?&BALANCED_CURLIES)   # Nested balanced curlies
             |[^()\[\]{}]++          # Any run of non-bracket characters
           )*                        # End alternation, repeated
       \}                            # Closing curly
    )
    (?<BALANCED_ANY>                 # Any one balanced group
         (?&BALANCED_PARENS)
        |(?&BALANCED_BRACKETS)
        |(?&BALANCED_CURLIES)
    )
  )
/x;

Use it like so:

if( $line =~ m/
        ^
          (?:
              [^()\[\]{}]++      # Any run of non-bracket characters
             |(?&BALANCED_ANY)   # Any balanced paren-types
          )*
        $
        $balanced_parens_grammar/x){
    # Do your magic here
}

Disclaimer

Code is completely untested. Probably contains errors.
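
Python's built-in `re` module has no recursive patterns or `(?&NAME)` calls, so the same grammar is sketched below as a small recursive-descent parser. This is an illustration of the idea, not a port of the regex above: the function names `skip_group` and `line_is_balanced` are hypothetical, and each recursive call plays the role of one nested BALANCED_* rule.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def skip_group(text: str, i: int) -> int:
    """text[i] must be an opener. Return the index just past its matching
    closer, or -1 if the group is malformed or never closed."""
    closer = PAIRS[text[i]]
    i += 1
    while i < len(text):
        ch = text[i]
        if ch == closer:
            return i + 1
        if ch in PAIRS:
            i = skip_group(text, i)  # nested balanced group
            if i < 0:
                return -1
        elif ch in CLOSERS:
            return -1  # a closer of the wrong kind
        else:
            i += 1  # ordinary character
    return -1  # input ended before the closer


def line_is_balanced(text: str) -> bool:
    """True when text is plain characters and balanced groups only."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in PAIRS:
            i = skip_group(text, i)
            if i < 0:
                return False
        elif ch in CLOSERS:
            return False  # closer with nothing open
        else:
            i += 1
    return True
```

Unlike a single optional group, this accepts any number of balanced groups on a line, for example `a[1].is (b{2})`.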

  • Wow, +1. I really need to learn how qr works. Commented Nov 17, 2013 at 3:20
  • @terdon Thanks. FWIW, though, qr// has nothing to do with grammar definition: you could have written the whole thing inside the m// statement and it would've worked just as well. qr// pre-compiles a regex that is not expected to change later and since that's typical of a grammar, I used qr// here. I think it makes things more readable, too. Commented Nov 17, 2013 at 8:02
  • Ah, OK, I knew qr compiled a regex but have never used it. Grammar definition I know nothing about so I assumed the two were connected. Looks like a powerful tool. Commented Nov 17, 2013 at 13:40
  • @terdon Neither do I. I only recently learned about grammar definition in Perl so I can't wait to experiment :). If you think this is powerful try looking up Perl 6 rules! Commented Nov 17, 2013 at 18:47
1

Sed uses regular expressions, and regular expressions are not powerful enough for this. Use gawk or some other tool that can count or keep a stack.

The Chomsky hierarchy classifies grammars as regular, context-free, and so on. Arbitrarily nested matching parentheses form a context-free language that is not regular, so a true regular expression cannot match them reliably.
