
I have a regular expression that ignores multiline comments of the form /* ... */, but it doesn't work with single-line comments that begin with //.

Can someone suggest what to add to this regex to make it ignore // comments as well?

pattern = r"""
                        ##  --------- COMMENT ---------
       /\*              ##  Start of /* ... */ comment
       [^*]*\*+         ##  Non-* followed by 1-or-more *'s
       (                ##
         [^/*][^*]*\*+  ##
       )*               ##  0-or-more things which don't start with /
                        ##    but do end with '*'
       /                ##  End of /* ... */ comment
     |                  ##  -OR-  various things which aren't comments:
       (                ## 
                        ##  ------ " ... " STRING ------
         "              ##  Start of " ... " string
         (              ##
           \\.          ##  Escaped char
         |              ##  -OR-
           [^"\\]       ##  Non "\ characters
         )*             ##
         "              ##  End of " ... " string
       |                ##  -OR-
                        ##
                        ##  ------ ' ... ' STRING ------
         '              ##  Start of ' ... ' string
         (              ##
           \\.          ##  Escaped char
         |              ##  -OR-
           [^'\\]       ##  Non '\ characters
         )*             ##
         '              ##  End of ' ... ' string
       |                ##  -OR-
                        ##
                        ##  ------ ANYTHING ELSE -------
          .              ##  Any other char
          [^/"'\\]*      ##  Chars which don't start a comment, string
       )                ##    or escape
    """
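The question does not show how the pattern is applied. Assuming the usual approach for a pattern of this shape (compile it with `re.VERBOSE | re.DOTALL`, then call `re.sub` with a function that keeps group 2, the non-comment alternative, and drops everything else), a minimal sketch looks like this. The pattern is the question's, condensed onto fewer lines; `strip_block_comments` is a hypothetical helper name.

```python
import re

# The question's pattern, condensed (same alternatives, same group numbering).
PATTERN = r"""
    /\*[^*]*\*+([^/*][^*]*\*+)*/      # /* ... */ comment
  | (                                  # group 2: things that are NOT comments
        "(\\.|[^"\\])*"                # " ... " string
      | '(\\.|[^'\\])*'                # ' ... ' string
      | .[^/"'\\]*                     # any other run of characters
    )
"""

# VERBOSE allows the whitespace and # comments; DOTALL lets "." match newlines.
BLOCK_ONLY = re.compile(PATTERN, re.VERBOSE | re.DOTALL)


def strip_block_comments(source: str) -> str:
    """Hypothetical helper: drop /* */ comments, keep strings and code."""
    # A comment match leaves group 2 as None, so it is replaced by "".
    return BLOCK_ONLY.sub(lambda m: m.group(2) or "", source)


code = 'int a = 1; /* block */\nchar *s = "/* not a comment */"; // line\n'
print(strip_block_comments(code))
```

Here `/* block */` is removed and the `/* not a comment */` inside the string literal is kept, but `// line` passes through untouched: `//` never matches the first alternative, so the "any other char" branch consumes it like ordinary code.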
  • What are you using this for? Is a regex really needed? Commented Aug 5, 2015 at 18:27
  • At that point you may want to stop relying on regex alone (multiline comments are not context-free grammars anyway). I used a custom parser to look for raw strings in C/C++ source files: github.com/lucasg/MSVCUnicodeUpdater/blob/master/sed.py Commented Aug 5, 2015 at 18:28
  • Looks like this is a case where going with a parsing framework such as pyparsing might be a lot more manageable. Commented Aug 5, 2015 at 18:29
  • Hi @MichaelSPriz. I am writing a Python tool that strips comments (which contain Perforce header information and last-modified date and time) from two C/C++ files so that I can compare them and see whether the code has changed. Commented Aug 5, 2015 at 18:29
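Following the custom-parser suggestion in the comments, here is a minimal sketch of a hand-written scanner (not the linked sed.py) for the use case described above: strip comments from two C/C++ sources and compare what is left. `strip_c_comments`, `normalize` and `same_code` are hypothetical names. Assumptions: no trigraphs, no backslash-newline continuations inside `//` comments, and no C++11 raw string literals. Each block comment is replaced by one space, as C does, and blank lines and trailing whitespace are ignored when comparing.

```python
from typing import List


def strip_c_comments(source: str) -> str:
    """Remove // and /* */ comments from C/C++ source, keeping literals intact."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if ch == "/" and nxt == "/":
            # Line comment: skip up to (but not including) the newline.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif ch == "/" and nxt == "*":
            # Block comment: C replaces it with one space; skip past "*/"
            # (or to end of input if the comment is unterminated).
            end = source.find("*/", i + 2)
            out.append(" ")
            i = n if end == -1 else end + 2
        elif ch in "\"'":
            # String or char literal: copy it verbatim, stepping over
            # backslash escapes so \" or \' does not end the literal.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            out.append(source[i : j + 1])
            i = j + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def normalize(source: str) -> List[str]:
    """Strip comments, then drop trailing whitespace and blank lines."""
    lines = (line.rstrip() for line in strip_c_comments(source).splitlines())
    return [line for line in lines if line]


def same_code(a: str, b: str) -> bool:
    """True if the two sources differ only in comments and blank lines."""
    return normalize(a) == normalize(b)


# Hypothetical sample inputs: only the header comment and trailing comment differ.
old = "/* header: rev 3 */\nint main(void) { return 0; } // v1\n"
new = "/* header: rev 4\n   modified 2015-08-05 */\nint main(void) { return 0; } // v2\n"
print(same_code(old, new))
```

Ignoring blank lines means a header that grows or shrinks by a few `//` lines does not register as a code change; drop that filter if line positions matter for your comparison.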

1 Answer


If you plan to keep using the current regex, here is what you can add to match // ... comments:

Below this:

 /                ##  End of /* ... */ comment

Add this:

 |                  ## OR it is a line comment with //
  \s*//[^\n]*       ## Single line comment, up to (not including) the newline

See demo
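Putting this suggestion into a condensed form of the question's pattern gives something like the sketch below, again assuming `re.VERBOSE | re.DOTALL` and a replacement function that keeps group 2. The added alternative has no capturing group, so group 2 still refers to the non-comment branch. One deliberate choice: the line-comment branch uses `[^\n]*` rather than `.*`, because under `re.DOTALL` a `.*` would run past the newline to the end of the input. The `//` alternative must also come before group 2; otherwise the "any other char" branch would consume the first `/`. `strip_comments` is a hypothetical name.

```python
import re

# Condensed question pattern plus the answer's // alternative.
PATTERN = r"""
    /\*[^*]*\*+([^/*][^*]*\*+)*/      # /* ... */ comment
  | \s*//[^\n]*                        # // ... line comment (stops at newline)
  | (                                  # group 2: things that are NOT comments
        "(\\.|[^"\\])*"                # " ... " string
      | '(\\.|[^'\\])*'                # ' ... ' string
      | .[^/"'\\]*                     # any other run of characters
    )
"""

COMMENTS = re.compile(PATTERN, re.VERBOSE | re.DOTALL)


def strip_comments(source: str) -> str:
    """Hypothetical helper: remove /* */ and // comments, keep string literals."""
    # Both comment alternatives leave group 2 as None, so they become "".
    return COMMENTS.sub(lambda m: m.group(2) or "", source)


code = (
    "int a = 1; /* block */\n"
    'char *s = "// not a comment"; // trailing comment\n'
    "// whole-line comment\n"
    "int b = 2;\n"
)
print(strip_comments(code))
```

Both comment styles are removed, while the `//` inside the string literal survives because the string alternative consumes the whole literal before the `//` branch is tried at that position.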
