
How can I non-greedily remove multiline code blocks that start with /** START */ and end with /** END */ using sed, given that empty lines may occur inside the START-END blocks?

START marker is a single-line comment

SOLUTION

INPUT:

class MyClass {
    keepField;
    /** START */
    deleteField;
    /** END */

    construct() {
        /** START */
        this.deleteField = 'delete';
        /** END */
        this.keepField = 'keep';
        /** START */
        this.deleteFunc();
        /** END */
    }
    
    /** START */
    deleteFunc() {
        this.keepField = 'delete';

        if (true) {
            console.debug('Line before if statement is empty.');
        }
    } /** END */
}

OUTPUT:

class MyClass {
    keepField;

    construct() {
        this.keepField = 'keep';
    }
    
}

I have tried the following command, as described in the "Multiline techniques" section of the sed manual:

    sed '/./{H;$!d} ; x ; s/START.*END//' MyClass.js

However, this command is greedy: when a run of lines without empty lines contains several START-END blocks (as in the constructor), everything from the first START to the last END is removed. It also does not handle empty lines inside START-END blocks (as in the deleteFunc function).
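A minimal illustration of that greediness, reduced to a single line. The sample text in `line` is made up for this sketch and is not taken from MyClass.js:

```shell
# Hypothetical one-line sample with two START-END blocks and code to keep between them.
line='a /** START */ b /** END */ keep /** START */ c /** END */ d'

# .* is greedy, so the match runs from the first START to the LAST END,
# swallowing the "keep" that sits between the two blocks.
result=$(printf '%s\n' "$line" | sed 's/START.*END//')
printf '%s\n' "$result"
# prints: a /**  */ d
```

POSIX sed regular expressions have no non-greedy quantifier, which is why a plain s/START.*END// cannot be made to stop at the first END.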

Any idea how the above can be solved with sed or any other command line tool such as awk?

START marker is a block comment

SOLUTION

INPUT:

class MyClass {
    /**
     * same code as above only this time the START block is 
     * multiline like below.
     */

    /**
     * START
     */
    deleteFunc() {
        this.keepField = 'delete';

        if (true) {
            console.debug('Line before if statement is empty.');
        }
    } /** END */
}

OUTPUT should also be:

class MyClass {
    keepField;

    construct() {
        this.keepField = 'keep';
    }

}

  • sed '/START/,/END/d' file Commented Sep 22, 2022 at 21:55
  • That does the work, thanks! There is only one edge case. I've edited the post and added it. Commented Sep 22, 2022 at 22:20
  • sed '/\* START/,/END/d' Commented Sep 22, 2022 at 23:05
  • sed and awk are very general text manipulation tools, so parsing code with them isn't recommended because of its complexity: comments can sit in the middle or at the end of a line instead of at the start, the keywords can appear inside a string, and there are many other edge cases. You could even have START by accident in the middle of another comment, as in your second example. Commented Sep 23, 2022 at 11:15
  • @HatLess unfortunately sed '/\* START/,/END/d' doesn't seem to work for the edge-case. Commented Sep 23, 2022 at 18:21
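For the single-line marker case, the range-delete suggestion from the first comment can be sketched as follows. The function name strip_blocks and the stricter marker patterns are assumptions added for illustration; the comment itself used the bare /START/,/END/ addresses.

```shell
# Hypothetical helper: delete every line from one containing "/** START */"
# through the next line containing "/** END */", both inclusive.
# Reads the named files, or standard input when no file is given.
strip_blocks() {
    sed '/\/\*\* START \*\//,/\/\*\* END \*\//d' "$@"
}

# Small made-up sample: the block contains an empty line and ends with "} /** END */".
printf '%s\n' 'keep;' '/** START */' 'drop;' '' 'more();' '} /** END */' 'tail;' | strip_blocks
# prints:
# keep;
# tail;
```

A range address always stops at the first line matching the end pattern, so greediness is not an issue here. It removes whole lines, though, so any code sharing a line with a marker is deleted as well, and these patterns do not match a START marker spread over several lines.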

2 Answers

sed is an excellent tool for doing simple s/old/new/ operations. For anything else, just use awk for clarity, efficiency, robustness, portability, maintainability, etc. For example, using any POSIX awk:

$ cat tst.awk
{ rec = rec $0 ORS }
END {
    while ( match(rec,/\/\*\*[[:space:]*]*END[[:space:]*]*\*\//) ) {
        toEnd = substr(rec,1,RSTART+RLENGTH-1)
        sub(/(\n[[:blank:]]*)?\/\*\*[[:space:]*]*START[[:space:]*]*\*\/.*/,"",toEnd)
        printf "%s", toEnd
        rec = substr(rec,RSTART+RLENGTH)
    }
    printf "%s", rec
}

$ awk -f tst.awk file
class MyClass {
    keepField;

    construct() {
        this.keepField = 'keep';
    }

}

class MyClass {
    /**
     * same code as above only this time the START block is
     * multiline like below.
     */

}

If you don't have a POSIX awk, replace every [:space:] with " \t\n" and every [:blank:] with " \t" (the first character of each replacement string is a literal blank), so that, for example, [[:space:]*] becomes [ \t\n*]. The script will then work in any awk.
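As a sketch of that substitution, here is the same script with the character classes spelled out, wrapped in a shell function so it can be tried directly. The function name strip_blocks_awk is an assumption for illustration; the awk logic is otherwise unchanged from tst.awk.

```shell
# Hypothetical wrapper around the tst.awk logic with [:space:] written as " \t\n"
# and [:blank:] written as " \t", for awks without POSIX character classes.
strip_blocks_awk() {
    awk '
    { rec = rec $0 ORS }                      # slurp the whole input into rec
    END {
        # find the next END marker; everything up to it is handled in one step
        while ( match(rec, /\/\*\*[ \t\n*]*END[ \t\n*]*\*\//) ) {
            toEnd = substr(rec, 1, RSTART+RLENGTH-1)
            # drop from the START marker (and its leading indentation) up to that END
            sub(/(\n[ \t]*)?\/\*\*[ \t\n*]*START[ \t\n*]*\*\/.*/, "", toEnd)
            printf "%s", toEnd
            rec = substr(rec, RSTART+RLENGTH)
        }
        printf "%s", rec                      # whatever follows the last END
    }' "$@"
}

# Made-up sample with a START marker written as a multiline block comment.
printf '%s\n' 'x' '/**' ' * START' ' */' 'y' '} /** END */' 'z' | strip_blocks_awk
# prints:
# x
# z
```

This variant is intended to behave like tst.awk on the sample input; the results shown in this answer come from the original tst.awk.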

The tst.awk script was run on this input file:

$ cat file
class MyClass {
    keepField;
    /** START */
    deleteField;
    /** END */

    construct() {
        /** START */
        this.deleteField = 'delete';
        /** END */
        this.keepField = 'keep';
        /** START */
        this.deleteFunc();
        /** END */
    }

    /** START */
    deleteFunc() {
        this.keepField = 'delete';

        if (true) {
            console.debug('Line before if statement is empty.');
        }
    } /** END */
}

class MyClass {
    /**
     * same code as above only this time the START block is
     * multiline like below.
     */

    /**
     * START
     */
    deleteFunc() {
        this.keepField = 'delete';

        if (true) {
            console.debug('Line before if statement is empty.');
        }
    } /** END */
}

but consider also this pathological case where the whole input is on a single line:

$ cat file
class MyClass { keepField; /** START */ deleteField; /** END */ construct() { /** START */ this.deleteField = 'delete'; /** END */ this.keepField = 'keep'; /** START */ this.deleteFunc(); /** END */ } /** START */ deleteFunc() { this.keepField = 'delete'; if (true) { console.debug('Line before if statement is empty.'); } } /** END */ }

and note that the above script handles it correctly. It would also handle many other unstated cases I can imagine, except where your start/end strings appear inside literal strings or inside other comments; you can't handle such cases with pattern matching as we're doing here:

$ awk -f tst.awk file
class MyClass { keepField;  construct() {  this.keepField = 'keep';  }  }

  • This works well! We now realised though that there are also other edge cases when the START marker is a block comment. So we might go with the single line markers. In which case @HatLess's solution in the first comment would work well. Commented Sep 26, 2022 at 7:55
  • I don't know what that means, sorry. Post a new question if you have different input than shown in this question that you need to handle. Commented Sep 26, 2022 at 12:49

Using GNU sed

$ sed -Ez 's~ +/?\*+ START( \*)?([^*]*\*+)([^\n]*\n[^*]*\*+)? END[^\n]*\n~~g' input_file
class MyClass {
    keepField;

    construct() {
        this.keepField = 'keep';
    }
    
}
class MyClass {
    /**
     * same code as above only this time the START block is 
     * multiline like below.
     */

    /**
}

  • /** should also be deleted. unfortunately our sed build does not support the -z option. Commented Sep 23, 2022 at 18:43
  • @0x1505 Are you on a mac? If so, you can install GNU sed with brew Commented Sep 23, 2022 at 18:45
  • If possible we want to cover more systems with one command. It could be unpleasant for mac devs to require them to have brew installed. Nevertheless, I linked to your first comment on the post, which solves the problem when the START marker is a single line comment. Commented Sep 26, 2022 at 8:11
