use bash or awk to replace part of a string

Question

I have the following example lines in a file:

sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0

I want to process the file and have the following output:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0

Notice that the required effect happens on lines 2 and 3: on lines where a word is followed by an underscore and more text, the underscore and the text following it are removed.

I have not succeeded with the following:

sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g' file.txt >out.txt

Any bash or awk advice will be welcome. Thanks

So is your criterion to make the change on lines 2 and 3, or on all but the first and last lines, or on all lines that start with g, or on lines that have the same string on either side of the _, or on lines that only have alphabetic chars in the part after the _, or something else? All answers so far are making different guesses, so please edit your question to state your requirements; don't assume we can guess what you want to do from reading input/output and code that doesn't do whatever it is you want.
(Ed Morton, commented Mar 19, 2022 at 16:26)

The fourth bird · Accepted Answer · 2022-03-19 16:58:42Z

If you want to replace the whole word after the underscore, you have to repeat the character class one or more times using [a-zA-Z]+. Also, in the replacement sed refers to capture group 1 as \1, not $1 (to sed, $1 is just literal text).

sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g' file.txt >out.txt
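To see why the attempt in the question fails, here is a small sketch comparing it with the fixed command on one sample line. It assumes GNU sed, and the function names asker_attempt and fixed are hypothetical wrappers used only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the question's attempt: in a sed replacement,
# "$1" is plain text, so the matched "y_g" is replaced by the literal "$1".
asker_attempt() { sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g'; }

# Hypothetical wrapper around the fix: \1 inserts capture group 1 and
# [a-zA-Z]+ consumes the whole word after the underscore.
fixed() { sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g'; }

printf 'guy_guy 2 4 6\n' | asker_attempt   # gu$1uy 2 4 6
printf 'guy_guy 2 4 6\n' | fixed           # guy 2 4 6
printf 'sweet_25 2 0 4\n' | fixed          # sweet_25 2 0 4 (2 is not a letter)
```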

If the words should be the same before and after the underscore, you can use a repeating capture group with a backreference.

If you only want to do this at the start of the string, you can prepend ^ to the pattern and omit the /g at the end of the sed command.

sed -E 's/([a-zA-Z]+)(_\1)+/\1/g' file.txt >out.txt

The pattern matches:

([a-zA-Z]+) Capture group 1: match one or more characters in the range a-zA-Z
(_\1)+ Capture group 2: match _ followed by the same text captured by group 1, repeated one or more times

The file out.txt will contain:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
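A quick sketch of the difference the ^ anchor makes. It assumes GNU sed: POSIX leaves backreferences such as \1 inside an extended (-E) pattern undefined, so other sed implementations may behave differently. The function names are hypothetical. Without the anchor, group 1 can start in the middle of a word, so moat_at is collapsed because "at" appears on both sides of the underscore.

```shell
#!/bin/sh
# Unanchored pattern from the answer: group 1 may begin mid-word.
unanchored() { sed -E 's/([a-zA-Z]+)(_\1)+/\1/g'; }

# Anchored variant: only a repeated word at the very start of the line is collapsed.
anchored() { sed -E 's/^([a-zA-Z]+)(_\1)+/\1/'; }

printf 'ab_ab_ab 1\n' | anchored       # ab 1 ((_\1)+ absorbs every repeat)
printf 'moat_at 0 1 0\n' | unanchored  # moat 0 1 0 (group 1 matched "at")
printf 'moat_at 0 1 0\n' | anchored    # moat_at 0 1 0 (unchanged)
```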

RavinderSingh13 · Accepted Answer · 2022-03-19 16:17:50Z

With your shown samples, please try the following awk code.

awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1' Input_file

Explanation: awk's split function splits the 1st field into an array named arr, using _ as the delimiter. If the 1st element of arr is equal to the 2nd element, only the 1st element is saved back into the first field ($1). The trailing 1 then prints every line, edited or not.
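Because this compares only the first two pieces, a field like guy_guy_x would also be shortened to guy. If you want to require every _-separated piece to be identical, here is a sketch of one way to extend the same split idea; collapse_repeats is a hypothetical name, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: collapse field 1 only when every "_"-separated piece is identical.
collapse_repeats() {
  awk '{
    n = split($1, parts, "_")   # n = number of pieces in field 1
    same = (n > 1)              # require at least one underscore
    for (i = 2; i <= n; i++)
      if (parts[i] != parts[1]) same = 0
    if (same) $1 = parts[1]     # assigning to $1 rebuilds the line using OFS
  } 1' "$@"
}

printf 'guy_guy_guy 2 4 6\nguy_guy_x 2 4 6\nsweet_25 2 0 4\n' | collapse_repeats
# guy 2 4 6
# guy_guy_x 2 4 6
# sweet_25 2 0 4
```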

Paul R · Accepted Answer · 2022-03-19 16:17:29Z

You can do it more simply, like this:

sed -E 's/_[a-zA-Z]+//' file.txt >out.txt

This replaces the first underscore followed by one or more alphabetic characters with nothing.
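Because the command has no /g flag, sed only removes the first such match on each line, and it does not look at what comes before the underscore. A small sketch contrasting the two forms, assuming GNU sed; the function names are hypothetical.

```shell
#!/bin/sh
# Paul R's command: removes only the first "_letters" run on each line.
first_only() { sed -E 's/_[a-zA-Z]+//'; }

# Same substitution with the g flag: removes every "_letters" run.
every_match() { sed -E 's/_[a-zA-Z]+//g'; }

printf 'foo_bar_baz 1\n' | first_only    # foo_baz 1
printf 'foo_bar_baz 1\n' | every_match   # foo 1
printf 'sweet_25 2 0 4\n' | first_only   # sweet_25 2 0 4 (_2 starts with a digit)
```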

Ed Morton · Accepted Answer · 2022-03-19 16:27:49Z

$ awk 'NR~/^[23]$/{sub(/_[^ ]+/,"")} 1' file
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
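This answer selects lines by position (NR is the current line number), not by content, which is one of the readings listed in Ed Morton's comment on the question. A small sketch running the same command, wrapped in a hypothetical function, on reordered sample lines shows the effect:

```shell
#!/bin/sh
# Ed Morton's command: on lines 2 and 3 only, delete "_" up to the next space.
by_line_number() { awk 'NR~/^[23]$/{sub(/_[^ ]+/,"")} 1' "$@"; }

# Reordered input: now sweet_25 and moat_2 sit on lines 2 and 3.
printf 'guy_guy 2 4 6\nsweet_25 2 0 4\nmoat_2 0 1 0\n' | by_line_number
# guy_guy 2 4 6
# sweet 2 0 4
# moat 0 1 0
```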

dawg · Accepted Answer · 2022-03-19 17:01:35Z

I would do:

awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1' file

Prints:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
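One thing to keep in mind with this answer (and with RavinderSingh13's, which also assigns to $1): modifying a field makes awk rebuild the whole line with OFS, a single space by default, so tab-separated input comes out space-separated on the changed lines. If your file is tab-separated, setting FS and OFS to a tab is one option. A sketch, with hypothetical function names:

```shell
#!/bin/sh
# dawg's command as posted: edited lines are rejoined with single spaces.
as_posted() { awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1' "$@"; }

# Variant for tab-separated files: split on tabs and rejoin with tabs.
tab_separated() { awk -F'\t' -v OFS='\t' '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1' "$@"; }

printf 'guy_guy\t2\t4\t6\n' | as_posted       # guy 2 4 6 (tabs became spaces)
printf 'guy_guy\t2\t4\t6\n' | tab_separated   # guy, 2, 4, 6 separated by tabs
```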
