Most of the command-line tools I'm looking at let me pick a single field delimiter. However, I'd like to use one delimiter to start, and a different one to end, the segment of text I want to remove from each line I'm processing.
1text [blah blah blah] text number punctuation text text
2text text text
3text text (text) [blah blah blah] number text
4text <url> <email> text [blah blah blah] text
I'd like to remove all the 'blah blah blah' from those lines.
Blah can contain anything except newlines, EOFs, other breaky-things, and '['. That is, I shouldn't have '[[' (nor '[blah[') anywhere in the data.
I only have one (optional) instance of [] per line. So, for line 2 there is nothing to remove, and this shouldn't cause a halt, stop or failure.
I'm almost 100% positive that if I've got a start '[' I also have a ']'. That might be nice to check for, however.
There are other forms of punctuation in the data, so I don't want a solution that simply starts removing at the first non-alphanumeric character (e.g. line 4).
Bonus points for detecting when the removal leaves two (now adjacent) whitespace characters at that particular point and fixing that, but without removing double whitespace at any other point.
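A minimal sketch of how the requirements above might be met with sed alone, assuming the brackets themselves should be removed along with their contents. The function name strip_brackets is purely illustrative. The first expression covers a bracketed segment with a space on both sides and replaces the whole thing with a single space, so no double space is created. The second removes a segment in any other position (for example at the start or end of a line, where a leading or trailing space may remain). Neither expression uses the g flag, so double spaces elsewhere on the line are left alone, and lines without brackets pass through unchanged.

```shell
# Hypothetical helper: strip one "[...]" segment per line read from stdin.
#   \[       a literal opening bracket
#   [^][]*   any run of characters that are neither ']' nor '['
#   ]        a literal closing bracket (needs no escape outside a bracket expression)
strip_brackets() {
  sed -e 's/ \[[^][]*] / /' -e 's/\[[^][]*]//'
}

printf '%s\n' \
  '1text [blah blah blah] text number punctuation text text' \
  '2text text text' \
  '4text <url> <email> text [blah blah blah] text' | strip_brackets
```

Because only standard basic regular expression syntax is used, this should behave the same under GNU and BSD sed, though that is worth confirming on the systems you care about.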
I'm pretty sure I'll have to use awk or sed, but if there is a way to do this with standard command-line tools, to make it as portable as possible, that would be ideal.
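For the sanity check mentioned earlier (every '[' has a matching ']'), plain grep is probably enough. The sketch below is an assumption about what "checking" should mean: it prints, with line numbers, any line that has a '[' with no ']' after it, or a ']' with no '[' before it. The function name find_unbalanced is hypothetical.

```shell
# Hypothetical helper: list lines whose brackets look unbalanced.
#   \[[^]]*$   a '[' followed by no ']' through the end of the line
#   ^[^[]*]    a ']' reached from the start of the line without passing a '['
find_unbalanced() {
  grep -n -e '\[[^]]*$' -e '^[^[]*]'
}

printf '%s\n' 'a [b] c' 'd [e f' 'g h] i' | find_unbalanced
```

Note that grep exits with status 1 when nothing matches, which here means no problem lines were found.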
Also, explaining what you're doing (if you're using regex / sed) would certainly help, as I've only partly managed to adapt existing suggestions.
One suggestion says:
sed 's/^.*%\([^ ]*\) .*\$\([^$]*\)$/\1 \2/' infile
I got that kinda working with this bit of monkeying:
cat data | sed 's/^.*\[\([^ ]*\) .*\]\([^$]*\)$/\1 \2/'
However, it doesn't take out the whole swath of 'blah blah blah', and it leaves me with an extra line break.
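A piece-by-piece reading of that expression may explain the partial result. The sketch below runs the same substitution on the first sample line; the variable names line and attempt are illustrative only. The pattern captures the first word inside the brackets as group 1 and then writes it back out, which would account for some 'blah' surviving, while everything before the '[' is discarded.

```shell
line='1text [blah blah blah] text number punctuation text text'

# ^.*\[        everything up to and including '[' (matched, so it is dropped)
# \([^ ]*\)    group 1: the first word inside the brackets
#  .*\]        a space, then the rest up to and including ']'
# \([^$]*\)$   group 2: everything after ']' (inside brackets, $ is a literal dollar sign)
# \1 \2        the replacement: group 1, a space, group 2
attempt=$(printf '%s\n' "$line" | sed 's/^.*\[\([^ ]*\) .*\]\([^$]*\)$/\1 \2/')

# prints: blah  text number punctuation text text
printf '%s\n' "$attempt"
```

Lines without any brackets do not match the pattern at all, so sed passes them through untouched.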
The question "Using cut/awk/sed with two different delimiters" doesn't really answer this in a general sense (or at least I wasn't able to work something out after reading it, which may just be a failure on my part); it seems to be too specifically tailored to that person's data.