remove a portion of a line contained within two different types of separator / delimiters [duplicate]

Question

Most of the commandline tools I'm looking at have the ability to pick a field delimiter. However, I'd like to pick one delimiter to start, and a different one to end the segment of text I'd like to remove from each line I'm processing.

1text [blah blah blah] text number punctuation text text
2text text text
3text text (text) [blah blah blah] number text
4text <url> <email> text [blah blah blah] text

I'd like to remove all the 'blah blah blah' from those lines.

Blah can contain anything, except newlines, EOFs, and other breaky-things, and '['. ie: I shouldn't have '[[' (nor '[blah[') in any of the data

I only have one (optional) instance of [] per line. So, for line 2 there is nothing to remove, and this shouldn't cause a halt, stop or failure.

I'm almost 100% positive that if I've got a start '[' I also have a ']'. That might be nice to check for, however.

There are other forms of punctuation, so I don't want to work it with something that just looks for non-alphanumeric stuff to start removing (ie: line 4)

Bonus points for being able to figure out if I'm putting together two (now adjacent) whitespaces at that particular point - but without removing double whitespaces at any other point.

I'm pretty sure I'll have to use awk or sed, but if there were a way to do this via regular commandline tools, to make it as portable as possible, that would be ideal.

Also, explaining what you're doing (if you're using regex / sed) would certainly help, as:

A suggestion here says:

sed 's/^.*%\([^ ]*\) .*\$\([^$]*\)$/\1 \2/' infile

I got that kinda working with this bit of monkeying:

cat data | sed 's/^.*\[\([^ ]*\) .*\]\([^$]*\)$/\1 \2/'

However it doesn't take out the whole swath of 'blah blah blah', and leaves with an extra line-break.

Using cut/awk/sed with two different delimiters

Doesn't really answer the question in a general sense (or, at least I wasn't able to figure something out after reading it - maybe just a fail on my part), but seems to be (too) specifically tailored to that person's data.

Don, this looks like exactly what I was looking for - but it wasn't suggested to me / couldn't find it. I ended up using Terdon's first answer: sed 's/\[.*\]//g'
– user3082, Commented Jan 14, 2015 at 18:04

terdon · Accepted Answer · 2015-01-14 17:58:52Z


This is very simple. You don't need delimiters as such; a simple regular expression will do. Just look for an opening [, followed by as many characters as possible that are neither ] nor [, up to the closing ]. For example:

Perl

If you know there are no [[ or other strange things:
```
perl -pe 's/\[.+?\]//g' file
```
If you can have strange things:
```
perl -pe 's/\[[^\[\]]*\]//g' file
```
sed
```
sed  's/\[[^]]*\]//g' file
```
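For the "bonus points" part of the question (not leaving two adjacent spaces where the segment was), one possible approach is to run two expressions. This is only a sketch, assuming the surrounding whitespace is plain spaces and there is at most one segment per line; the temporary file and the variable names are made up for the example. It also shows one way to report a `[` that is never closed:

```shell
# Sample lines from the question, written to a throwaway file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1text [blah blah blah] text number punctuation text text
2text text text
3text text (text) [blah blah blah] number text
4text <url> <email> text [blah blah blah] text
EOF

# First expression: a segment with a space on both sides becomes one space.
# Second expression: any segment still left (start or end of line) is deleted.
cleaned=$(sed -e 's/ \[[^]]*\] / /' -e 's/\[[^]]*\]//' "$tmp")

# Report lines where a [ is never followed by a ]; empty for this sample.
unclosed=$(grep -n '\[[^]]*$' "$tmp" || true)

printf '%s\n' "$cleaned"
rm -f "$tmp"
```

Double spaces elsewhere on the line are left alone, because the first expression only matches spaces that touch the brackets. A segment at the very end of a line would still leave one trailing space behind.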

edited Jan 14, 2015 at 17:58

answered Jan 14, 2015 at 17:18


Costas, do you mean 'in this case'? I was also hoping to get enough of an explanation that any person could use it for any two delimiters, in a command pipe (cat | sed | grep | cut -- or whatever)

– user3082, Commented Jan 14, 2015 at 18:00

Terdon, perhaps you should list both answers. Greedy vs. non-greedy matching (and when you might want to use one or the other, non-greedy sounds good in case someone has messed up and put in some extra []).

– user3082, Commented Jan 14, 2015 at 18:02

@anon3202 you can. For example, to use 8 and 2 as start and end delimiters, you would run sed 's/8[^82]*2//g'. As for greedy or non-greedy, Costas's suggestion is not non-greedy as such. It is just a better way than my original. It can do everything the original could and more, so there's little point in posting both. The perl ones are non-greedy.

– terdon ♦, Commented Jan 14, 2015 at 18:03
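
The same pattern shape can be reused for other single-character delimiter pairs: start character, then a bracket expression excluding both delimiters, then the end character. A small sketch with made-up input lines (the sample strings and variable names are assumptions for illustration):

```shell
# Hypothetical inputs; only the delimiter characters change between commands.
angle=$(printf 'a <b c> d\n' | sed 's/<[^<>]*>//g')    # start <, end >
brace=$(printf 'a {b c} d\n' | sed 's/{[^{}]*}//g')    # start {, end }
digits=$(printf 'a 8xyz2 d\n' | sed 's/8[^82]*2//g')   # start 8, end 2

printf '%s\n' "$angle" "$brace" "$digits"
```

Characters that are special to sed need extra care: a literal `[` outside a bracket expression has to be written as `\[`, and a `]` inside a bracket expression has to come first, which is why the answer uses `[^]]`.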
