0
  • Input from record $0:
    -0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ
    
  • Desired output, captured as \1 with gensub:
    (T)-8.5(o)-3.2(p)-15.3(ik)
    
2
  • Please don't use those online testers for awk, as the syntax and features vary a lot (see Why does my regular expression work in X but not in Y?). Can you add the exact output you require? Also, does /\[([^]]+)]TJ/ solve your issue? Commented Sep 21, 2020 at 15:18
  • Sorry for misinterpretation. Hopefully it is concise now. Commented Sep 21, 2020 at 15:29

2 Answers

2
$ s='-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ'

$ # if you want to delete []TJ
$ echo "$s" | awk '{print gensub(/\[([^]]+)]TJ/, "\\1", "g")}'
-0.005 Tc 0.005 Tw (T)-8.5(o)-3.2(p)-15.3(ik)

$ # if you just want the portion inside []TJ
$ echo "$s" | awk 'match($0, /\[([^]]+)]TJ/, a){s = a[1]; print s}'
(T)-8.5(o)-3.2(p)-15.3(ik)

GNU awk supports an optional third argument to the match function: an array that receives the capture groups, which makes them easy to extract. Element 0 of the array holds the entire match, element 1 holds the portion matched by the first group, element 2 holds the portion matched by the second group, and so on.
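The array argument to match is a gawk extension. Where only a POSIX awk is available, the same extraction can be sketched with the RSTART and RLENGTH variables that match sets. This is a minimal sketch for the sample input above, not part of the original answer; the `out` variable name and the offsets (skip 1 character for `[`, drop 4 characters for `[` plus `]TJ`) are assumptions tied to this particular pattern.

```shell
s='-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ'

# match() sets RSTART (1-based start of the match) and RLENGTH (its length).
# The match spans "[" ... "]TJ", so skip the leading "[" and drop 4 characters
# in total ("[" plus "]TJ") to keep only the text between the delimiters.
out=$(printf '%s\n' "$s" |
    awk 'match($0, /\[[^]]+\]TJ/) { print substr($0, RSTART + 1, RLENGTH - 4) }')

echo "$out"
```

For the sample string this should print the same `(T)-8.5(o)-3.2(p)-15.3(ik)` as the a[1] version.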

5
  • Thank you! It works with a[1]. Just for information: I tried a[0] and it included the left bracket [ and the trailing ]TJ. Why is that? Intuitively, I expected the captured match to be stored in a[0]. Commented Sep 21, 2020 at 15:43
  • 2
    @andtoe the most common behavior I've seen across different regex implementations is that 0 has entire match, 1 has first capture portion, 2 has second capture portion and so on Commented Sep 21, 2020 at 15:47
  • Last Question. Can awk be given an option or something else to specify a "mode" of a regular expression standard to be used? Or the other way around: What regular expression standard is used by awk by default? Commented Sep 21, 2020 at 15:47
  • From GNU awk manual: "The regular expressions in awk are a superset of the POSIX specification for Extended Regular Expressions (EREs). POSIX EREs are based on the regular expressions accepted by the traditional egrep utility." Commented Sep 21, 2020 at 15:49
  • @andtoe If you found the answer useful, please consider accepting it so that others facing a similar issue may find it more easily. Commented Sep 21, 2020 at 16:12
2
$ echo '-0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)]TJ' |
    awk '{print gensub(/.*\[([^]]+)]TJ/,"\\1",1)}'
(T)-8.5(o)-3.2(p)-15.3(ik)

Web sites like regex101 are practically useless for figuring out regexps to use in command-line tools. They don't adequately account for which regexp flavor (BRE, ERE, or PCRE) and which delimiters a given tool uses, whether the tool supports backreferences in the regexp and/or the matching text, whether the given version of the tool has any private extensions, or any options the tool might have that affect its behavior with respect to regexps.
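As a small illustration of how much the flavor matters, the same pattern can behave differently within a single tool depending on an option. The sketch below uses sed rather than awk, purely as an assumed example; the variable names `bre` and `ere` are hypothetical.

```shell
# POSIX BRE (sed's default): "+" is an ordinary character,
# so "a+" does not match anything in "aaa" and the line is unchanged.
bre=$(printf 'aaa\n' | sed 's/a+/X/')

# ERE (sed -E): "+" means "one or more", so the whole run of a's is replaced.
ere=$(printf 'aaa\n' | sed -E 's/a+/X/')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

With common sed implementations this prints `BRE: aaa` followed by `ERE: X`, which an online tester set to a single flavor would not reveal.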

3
  • I don't want: -0.005 Tc 0.005 Tw [(T)-8.5(o)-3.2(p)-15.3(ik)] I only want: [(T)-8.5(o)-3.2(p)-15.3(ik)] Commented Sep 21, 2020 at 15:09
  • That's not what you show in your question under Operated string of str. If that's not your expected output then edit your question to clearly show the output you expect given the input you provided. Commented Sep 21, 2020 at 15:11
  • "Operated string of str" shows the actual output of the operation, but that is not what I want. If you would read my question thoroughly, you would understand what I am asking for. No offence. Please read my question thoroughly. Commented Sep 21, 2020 at 15:13
