
I have the following example lines in a file:

sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0

I want to process the file and have the following output:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0

Notice that the required change happened on lines 2 and 3: on lines where this pattern occurs, an underscore that follows text, together with the text after it, is removed.

I have not succeeded with the following:

sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g' file.txt >out.txt

Any bash or awk advice would be welcome. Thanks!

So is your criterion to make the change on lines 2 and 3, on all but the first and last lines, on all lines that start with g, on lines that have the same string on either side of the _, on lines that have only alphabetic characters after the _, or something else? All the answers so far make different guesses, so please edit your question to state your requirements; don't assume we can guess what you want from sample input/output and code that doesn't do what you want. Commented Mar 19, 2022 at 16:26

5 Answers


If you want to replace the whole word after the underscore, you have to repeat the character class one or more times using [a-zA-Z]+ and use \1 in the replacement, because sed does not treat $1 as a backreference.

sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g' file.txt >out.txt
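To see why both changes matter, here is a minimal sketch comparing the original attempt with the corrected command on one sample line. It assumes GNU sed and feeds the line with printf instead of reading file.txt:

```shell
#!/bin/sh
# Original attempt: [a-zA-Z] matches a single letter, and sed copies $1 into
# the replacement as literal text, so only "y_g" is replaced, by the string "$1".
printf 'guy_guy 2 4 6\n' | sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g'
# prints: gu$1uy 2 4 6

# Corrected: [a-zA-Z]+ consumes the whole word after the underscore, and \1
# puts back the single letter captured just before it.
printf 'guy_guy 2 4 6\n' | sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g'
# prints: guy 2 4 6
```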

If the words should be the same before and after the underscore, you can use a repeating capture group with a backreference.

If you only want to do this at the start of the string, you can prepend ^ to the pattern and omit the /g at the end of the sed command.

sed -E 's/([a-zA-Z]+)(_\1)+/\1/g' file.txt >out.txt

The pattern matches:

  • ([a-zA-Z]+) Capture group 1, match one or more characters in the range a-zA-Z
  • (_\1)+ Capture group 2, match _ followed by the same text captured by group 1, repeated one or more times

The file out.txt will contain:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
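A minimal sketch of the anchored variant described above (prepend ^, drop /g). It assumes GNU sed, which accepts backreferences such as \1 in -E patterns as an extension to POSIX ERE. The third input line is a made-up example, not from the question, showing that (_\1)+ also collapses a suffix repeated more than once:

```shell
#!/bin/sh
# ^ restricts the match to the start of the line (the first field); without
# /g, sed makes at most one substitution per line.
printf '%s\n' 'sweet_25 2 0 4' 'guy_guy 2 4 6' 'guy_guy_guy 1 2 3' |
  sed -E 's/^([a-zA-Z]+)(_\1)+/\1/'
# prints:
# sweet_25 2 0 4
# guy 2 4 6
# guy 1 2 3
```

Note that the pattern does not require a space after the repeated text, so a hypothetical first field such as guy_guyx would become guyx.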

With your shown samples, please try the following awk code.

awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1' Input_file

Explanation: awk's split function splits the 1st field into an array named arr, using _ as the delimiter. If the 1st element of arr is EQUAL to the 2nd element, only the 1st element is saved back into the first field ($1). The trailing 1 then prints every line, edited or not.
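To make the condition concrete, this sketch prints what split returns, the first two array elements, and the result of the equality test for two of the sample lines. The print statement is only for illustration and is not part of the answer:

```shell
#!/bin/sh
# split() returns the number of pieces; arr[1] and arr[2] hold the text
# before and after the first underscore. (arr[1] == arr[2]) prints 1 or 0.
printf '%s\n' 'guy_guy 2 4 6' 'sweet_25 2 0 4' |
  awk '{ n = split($1, arr, "_"); print n, arr[1], arr[2], (arr[1] == arr[2]) }'
# prints:
# 2 guy guy 1
# 2 sweet 25 0
```

Because the answer assigns to $1, awk rebuilds the edited line using the output field separator (a single space by default), so runs of spaces or tabs between fields are collapsed on the lines that change.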


You can do it more simply, like this:

sed -E 's/_[a-zA-Z]+//' file.txt >out.txt

This just replaces an underscore followed by one or more alphabetic characters with nothing.

$ awk 'NR~/^[23]$/{sub(/_[^ ]+/,"")} 1' file
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
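This answer selects lines by record number rather than by content: NR~/^[23]$/ matches the line number, treated as a string, against a regex that accepts only 2 or 3. A sketch of the same filter written as an arithmetic test, using the question's sample lines:

```shell
#!/bin/sh
# NR == 2 || NR == 3 selects the same records as NR~/^[23]$/; sub() then
# deletes the first "_" and the non-space characters after it.
printf '%s\n' 'sweet_25 2 0 4' 'guy_guy 2 4 6' 'ging_ging 0 0 3' 'moat_2 0 1 0' |
  awk 'NR == 2 || NR == 3 { sub(/_[^ ]+/, "") } 1'
# prints:
# sweet_25 2 0 4
# guy 2 4 6
# ging 0 0 3
# moat_2 0 1 0
```

Either form depends only on line positions, so it fits this particular file; the other answers test the content of the first field instead.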


I would do:

awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1' file

Prints:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
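As the comment under the question points out, the answers implement different rules, and all of them produce the same output for the sample file. A sketch that runs each content-based command on a made-up line whose two parts differ (foo_bar, not from the question) makes the difference visible. It assumes GNU sed and a POSIX awk; the line-number answer is left out because it depends only on position:

```shell
#!/bin/sh
# Hypothetical test line: the words on either side of "_" differ.
line='foo_bar 1 2 3'

# Letter, "_", letters: drop the "_" and the letters after it
printf '%s\n' "$line" | sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g'
# Same word repeated after "_": keep one copy
printf '%s\n' "$line" | sed -E 's/([a-zA-Z]+)(_\1)+/\1/g'
# First field splits into equal parts: keep the first part
printf '%s\n' "$line" | awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1'
# "_" followed by letters: delete them
printf '%s\n' "$line" | sed -E 's/_[a-zA-Z]+//'
# First field has a letter on both sides of "_": cut from "_" onward
printf '%s\n' "$line" | awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1'
# prints, in order:
# foo 1 2 3
# foo_bar 1 2 3
# foo_bar 1 2 3
# foo 1 2 3
# foo 1 2 3
```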
