Consider a text file whose lines are grouped into many blocks, each separated from the next by at least one empty line. Using a Bash one-liner, how do I delete all text from < to either > or \n\n?
To put it differently: delete everything between each pair of < and >. If a < has no closing >, delete everything until the end of the block (an empty line), but never, ever delete outside the block!
Conceptually, should I physically separate the blocks into objects in a list before parsing, for safety, or is this a straightforward linear text-parsing job as long as you know what you are doing?
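Splitting into blocks first does not require building a list by hand, because awk can do the splitting. With RS set to the empty string (paragraph mode), each block separated by one or more empty lines becomes a single record, so a substitution applied to that record cannot reach into a neighboring block. Below is a minimal sketch under a few assumptions: blocks are separated by truly empty lines (whitespace-only lines do not count), lines left empty by a deletion should disappear as in the expected result, and the awk in use lets [^>] match newlines inside a record (GNU awk does; other implementations are worth checking). The function name strip_tags_blockwise is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads text on stdin, writes the cleaned text to stdout.
# Assumes blocks are separated by truly empty lines; awk paragraph mode does
# not treat whitespace-only lines as separators.
strip_tags_blockwise() {
  # RS= (empty string) puts awk in paragraph mode: one record per block.
  awk -v RS= '
    {
      # Delete "<...>" or, if there is no ">", "<" up to the end of this block.
      # Inside a record [^>] also matches newlines, so a tag may span lines,
      # but the match can never run past the record, i.e. past the block.
      gsub(/<[^>]*>?/, "")
      # Drop lines that the deletion left empty, plus any leading/trailing one.
      gsub(/\n+/, "\n")
      sub(/^\n/, "")
      sub(/\n$/, "")
      # Skip blocks that became completely empty.
      if ($0 == "") next
      # Print one empty line between blocks, but not before the first block.
      if (n++) print ""
      print
    }
  '
}
```

Usage: `strip_tags_blockwise < input.txt > output.txt` (the file names are placeholders). On the example, with the "<-- empty line" markers replaced by real empty lines, this should produce the expected result given in the question. One side effect of this design is that runs of several empty lines between blocks are normalized to a single one.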
Example text:
This is the first
block of text.
<-- empty line
<delete me>
This is the second block.
<delete
here>
<delete this, but
<-- empty line
do not delete this>
<delete this too>
Third block here.
(more blocks)
The result should be:
This is the first
block of text.
<-- empty line
This is the second block.
<-- empty line
do not delete this>
Third block here.
...(empty line)? Or do you just mean a line without a >? If the former, perhaps just edit the question to show a literal empty line; we will understand.

awk -v RS= -v ORS= '{gsub(/<[^>]+>?/, "")}1'

This doesn't preserve the empty lines, and the newline character after > remains, so it may not suit your real sample.

perl -0777pe 's/<.*?(>|(?=\n\n))//sg'