Bash script to replace matched substrings within larger substring

Question

I'm trying to write a bash script that removes the newline characters and asterisks from comments, but only if the comment contains a particular substring.

// file.txt
/**
 * Here is a multiline
 * comment that contains substring
 * rest of it
 */

/**
 * Here is a multiline
 * comment that does not contain subNOTstring
 * rest of it
 */

I would like the final result to be:

// file.txt
/** Here is a multiline comment that contains substring rest of it */

/**
 * Here is a multiline
 * comment that does not contain subNOTstring
 * rest of it
 */

I have a regex that matches multiline comments: \/\*([^*]|[\r\n]|(\*+([^*\/]|[\r\n])))*\*\/ but I can't figure out the second part: matching only the comments that contain the substring, and then replacing every "\n * " with a single space.

So, to make sure my question is articulated correctly:

Match a substring within a file, i.e. a comment.
Make sure that match includes the substring.
Replace every occurrence of another substring within that match with a different string, i.e. "\n * " with a single space.

I wouldn't do this multiline matching in bash. Perhaps sed would be an option, or awk, or some more flexible programming language (Ruby, Perl). In any case, your approach would fail if the character sequence /** occurs in a context where it does not denote the start of a comment. I don't know the general syntax of your file, but this would apply for instance to C or PL/1 source code.
– user1934428, Commented Aug 16, 2022 at 6:33
@user1934428 sed works in bash so that would be an acceptable solution. I just need to be able to run a *.sh file and have it do the thing.
– Gaximus, Commented Aug 16, 2022 at 18:53

tshiono · Accepted Answer · 2022-08-16 22:03:09Z

If Python is an option, please try:

#!/usr/bin/env python3

import re                                                       # regular expression module

with open('file.txt') as f:                                     # open "file.txt" for reading
    text = f.read()                                             # read the whole file into the string "text"

for part in re.split(r'(/\*.*?\*/)', text, flags=re.DOTALL):    # split the text on comments, keeping each comment in the result
    if re.match(r'/\*.*substring', part, flags=re.DOTALL):      # if the element is a comment containing the keyword "substring"
        part = re.sub(r'\n \* |\n (?=\*/)', ' ', part)          # then replace each newline plus leading " * " with a single space
    print(part, end='')                                         # print the element without adding a newline

re.split(r'(/\*.*?\*/)', text, flags=re.DOTALL) splits "text" on the comments; because the pattern is a capturing group, the comments themselves are included in the resulting list.
The flags=re.DOTALL option makes a dot match newline characters as well.
The for part in .. syntax loops over the list, assigning each element to "part".
re.match(r'/\*.*substring', part, flags=re.DOTALL) matches an element that is a comment containing the keyword "substring".
re.sub(r'\n \* |\n (?=\*/)', ' ', part) replaces a newline followed by " * " on the next line with a single space.
\n (?=\*/) uses a positive lookahead: it matches a newline and a space only when they are followed by "*/". It matches at the last line of the comment block and leaves the "*/" as is.
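
To check the pattern without touching a file, the same split-and-substitute steps can be wrapped in a function and run on an in-memory string. This is only a sketch: the name collapse_comments and its keyword parameter are made up for illustration, and a plain `in` test stands in for the re.match call used above.

```python
import re

# Same comment pattern as above; the capturing group keeps the comments in the split result.
COMMENT = re.compile(r'(/\*.*?\*/)', flags=re.DOTALL)

def collapse_comments(text, keyword="substring"):
    """Join the lines of every /* ... */ comment that contains `keyword` (hypothetical helper)."""
    parts = []
    for piece in COMMENT.split(text):
        # Only pieces that are comments and contain the keyword are rewritten.
        if piece.startswith('/*') and keyword in piece:
            piece = re.sub(r'\n \* |\n (?=\*/)', ' ', piece)
        parts.append(piece)
    return ''.join(parts)

sample = """// file.txt
/**
 * Here is a multiline
 * comment that contains substring
 * rest of it
 */

/**
 * Here is a multiline
 * comment that does not contain subNOTstring
 * rest of it
 */
"""

result = collapse_comments(sample)
print(result, end='')
```

With the sample from the question, the first comment should come out on one line and the second should be left untouched. Note that the substitution assumes the exact " * " indentation shown in the question; comments indented differently would need a looser pattern.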

[Edit]
If you want to embed the Python script in bash, please try:

#!/bin/bash

infile="file.txt"                       # modify according to your actual filename
tmpfile=$(mktemp /tmp/temp.XXXXXX)      # temporary file for the output

# start of python script
python3 -c "
import re, sys

filename = sys.argv[1]
with open(filename) as f:
    text = f.read()

for part in re.split(r'(/\*.*?\*/)', text, flags=re.DOTALL):
    if re.match(r'/\*.*substring', part, flags=re.DOTALL):
        part = re.sub(r'\n \* |\n (?=\*/)', ' ', part)
    print(part, end='')
" "$infile" > "$tmpfile" || exit 1      # stop here if python fails, leaving the input file untouched
# end of python script

mv -f -- "$infile" "$infile".bak        # back up the original file
mv -f -- "$tmpfile" "$infile"           # replace the input file with the output

If python were an option OP would have included it in the tags. Downvoted
I think I can make this work by just calling the python from the bash script. Except that it is just printing the result instead of saving it over the original file.txt. How would I make it save over the original?
Thank you for the feedback. You're right. The posted python script just prints the output without replacing the original file. I've updated my answer with another version which embeds the python script in bash and overwrites the original file. BR.
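
As an alternative to the temp-file-and-mv approach, the overwrite could also be done from Python itself. The following is a minimal sketch, not the method from the answer: rewrite_in_place is a hypothetical helper, the ".bak" suffix simply mirrors the backup name used in the bash version, and the demo runs on a throwaway file in a temporary directory rather than on the real file.txt.

```python
import re
import shutil
import tempfile
from pathlib import Path

def rewrite_in_place(path, keyword="substring"):
    """Collapse matching comments in `path`, keeping the original as `<path>.bak` (hypothetical helper)."""
    path = Path(path)
    text = path.read_text()
    pieces = re.split(r'(/\*.*?\*/)', text, flags=re.DOTALL)
    out = ''.join(
        re.sub(r'\n \* |\n (?=\*/)', ' ', p)
        if p.startswith('/*') and keyword in p else p
        for p in pieces
    )
    backup = path.with_name(path.name + '.bak')
    shutil.copy2(path, backup)      # keep a copy of the original first
    path.write_text(out)            # then overwrite the input file
    return backup

# Demo on a throwaway file so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'file.txt'
    target.write_text("/**\n * has substring\n */\n")
    backup = rewrite_in_place(target)
    new_text = target.read_text()
    old_text = backup.read_text()
    print(new_text, end='')
```

The whole file is read before anything is written, so the backup and the overwrite only happen once the new content has been computed.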
