I have a bunch of files that contain XML tags like:

<h> PIDAT <h> O

I need to delete everything that comes after the first <h> on that line, so that I get this:

<h>

For that I'm using

sed -i -e 's/(^<.*?>).+/$1/' *.conll

But sed does not seem to recognize the $1. (As I understand it, replacing the match with $1 should keep only the group and delete everything else.) Is there a way I can achieve this? I'd really appreciate it if you could point me in the right direction.

PS: I tested this expression in a regex app and it worked, but it does not work from the command line.

1 Answer

sed backreferences have the form \1, \2, etc.; $1 is Perl syntax. Also, with basic regular expressions (BRE), which sed uses by default, you need to escape the parentheses forming a group, writing \(...\), as well as ? and + (\? and \+ are GNU extensions in BRE). Or you can use extended regular expressions with the -E option, where none of these need escaping.
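As a minimal sketch of the two syntaxes, the commands below run the same substitution once as a BRE and once as an ERE. The sample line is the one from the question; the variable names are only for illustration, and the pattern is deliberately simplified to a literal <h>.

```shell
line='<h> PIDAT <h> O'

# BRE (sed's default): the group is written \( ... \), the backreference is \1
bre=$(printf '%s\n' "$line" | sed 's/^\(<h>\).*/\1/')

# ERE (-E): plain parentheses form the group, the backreference is still \1
ere=$(printf '%s\n' "$line" | sed -E 's/^(<h>).*/\1/')

printf '%s\n' "$bre" "$ere"
```

Both commands print <h>. Using $1 in the replacement instead of \1 would insert the literal characters $1.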

Note that sed regexes are greedy, so <.*> will match <h> PIDAT <h> in that line instead of stopping at the first >. sed has no non-greedy quantifiers, so .*? is not a lazy match as it is in Perl or PCRE: in ERE the ? only makes .* optional, which is redundant because .* can already match nothing.
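The greedy behaviour can be seen directly by capturing <.*> on the sample line from the question. This is only a sketch; the variable names are made up for the example.

```shell
line='<h> PIDAT <h> O'

# <.*> is greedy: the group runs up to the LAST > on the line
greedy=$(printf '%s\n' "$line" | sed -E 's/^(<.*>).*/\1/')

printf '%s\n' "$greedy"
```

This prints <h> PIDAT <h> rather than <h>: only the trailing " O" is removed.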

This might work:

sed -i -Ee 's/^(<[^>]*>).*/\1/' *.conll

[^>] matches any character except >, so <[^>]*> will match <h> but not <h> PIDAT <h>.
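Before editing files in place, it may be worth previewing the substitution without -i. The sketch below does that on a temporary file; its three lines are invented sample data standing in for a real .conll file.

```shell
# Hypothetical sample data (not taken from the OP's files)
sample=$(mktemp)
printf '%s\n' '<h> PIDAT <h> O' 'Haus NN O' '<doc id="1"> x <doc> O' > "$sample"

# No -i: the result goes to stdout and the file itself is left untouched
result=$(sed -E 's/^(<[^>]*>).*/\1/' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The output is <h>, Haus NN O and <doc id="1">, one per line. Lines that do not start with a tag are left unchanged because of the ^ anchor.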

  • Another possibility is that the OP might be thinking that .*? is a non-greedy match. Your solution, of course, has that covered. Commented Jul 5, 2018 at 6:01
  • @John1024 ah, yes, I forgot PCRE uses *? for non-greedy match. That makes sense now. Commented Jul 5, 2018 at 6:02
  • Impressive. Thank you for your explanation! I got an error with your line of code: "\1 not defined in the RE", but after a quick Google search, I realized it was because it didn't escape the (), but that was also given in your answer ;) Anyway, thanks again!! Commented Jul 5, 2018 at 6:03
  • @CarolinaCárdenas did you get that error when using the -E option? Commented Jul 5, 2018 at 6:04
  • @muru yes. Exactly. Commented Jul 5, 2018 at 6:35
