How to remove underscore character with awk

Question

I have a file as below:

This is an _PLUTO_
This is _PINEAPPLE_
This is _ORANGE_
This is _RICE_

I'm using the code below to produce the output:

awk '{ print "Country: "  $NF }'  report.txt

Output:

Country: _PLUTO_
Country: _PINEAPPLE_
Country: _ORANGE_
Country: _RICE_

How do I remove all the underscores so that my output looks like this:

Country: PLUTO
Country: PINEAPPLE
Country: ORANGE
Country: RICE

substr or gsub - see gnu.org/software/gawk/manual/gawk.html#String-Functions
– steeldriver, Commented Jan 3, 2019 at 4:10
With that specific input, awk -F_ '{print "Country: " $2}' would also work.
– Stéphane Chazelas, Commented Jan 3, 2019 at 21:05

filbranden · Accepted Answer · 2019-01-03 04:25:43Z

8

You can use this snippet:

$ awk '{ gsub("_", "", $NF); print "Country: " $NF }' report.txt
Country: PLUTO
Country: PINEAPPLE
Country: ORANGE
Country: RICE

Note that gsub() will perform the modification in place, so it will store the result of the substitution back to $NF, in your case.
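
If the original field is still needed afterwards, one option is to copy it into a variable first and run gsub() on the copy. A minimal sketch, assuming a scratch file created with mktemp and a variable named country (both are illustrative and not part of the question):

```shell
# Two sample lines from the question, written to a temporary file.
report=$(mktemp)
printf '%s\n' 'This is an _PLUTO_' 'This is _PINEAPPLE_' > "$report"

# Copy the last field into "country" (an illustrative name) so that
# gsub() edits the copy; $NF and $0 stay exactly as they were read.
result=$(awk '{
  country = $NF
  gsub(/_/, "", country)
  print "Country: " country " (from " $NF ")"
}' "$report")

printf '%s\n' "$result"
# Country: PLUTO (from _PLUTO_)
# Country: PINEAPPLE (from _PINEAPPLE_)

rm -f "$report"
```

Because $NF is never assigned to, awk does not rebuild $0, so the original spacing of the line is preserved if you print it later.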

If you're using GNU awk, you can use gensub() instead, which is slightly simpler:

$ gawk '{ print "Country: " gensub("_", "", "g", $NF) }' report.txt
Country: PLUTO
Country: PINEAPPLE
Country: ORANGE
Country: RICE

See GNU awk documentation for gsub() and gensub() for more details.

answered Jan 3, 2019 at 4:25


awk '{gsub("_", "", $0); print}' report.txt works as well. When print is called with no arguments, it prints the whole record, i.e. $0. Also, if you are using Solaris by any chance, you need to use nawk for gsub to be available. On Red Hat Linux 6.x, nawk is a link to gawk, which also supports gsub.
– Larry, Commented Jan 3, 2019 at 21:35

αғsнιη · Answer · 2019-01-03 05:01:15Z

1

Try:

awk -F_ '{ print "Country: " $(NF-1) }' infile

Alternatively, you could use sed:

sed -r 's/[^_]*_([^_]*)_.*/Country: \1/' infile

[^_]*_ matches everything up to and including the first _.
([^_]*)_ matches everything after that up to the next _, and .* matches the rest of the line; only the (...) part is kept, as a captured group.
\1 is the back-reference to the ([^_]*) captured group.
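
To see the capture group at work, the same expression can be run on two of the sample lines. This sketch uses -E rather than -r, since both GNU and BSD sed accept it; the printf pipeline simply stands in for infile:

```shell
# Feed two of the question's sample lines to sed on standard input.
result=$(printf '%s\n' 'This is an _PLUTO_' 'This is _ORANGE_' |
  sed -E 's/[^_]*_([^_]*)_.*/Country: \1/')

printf '%s\n' "$result"
# Country: PLUTO
# Country: ORANGE
```

Note that a line containing fewer than two underscores does not match the pattern, so sed prints it unchanged.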

edited Jan 3, 2019 at 5:01

answered Jan 3, 2019 at 4:53


Kusalananda · Answer · 2019-01-03 16:25:20Z

1

Using sed instead:

$ sed -E 's/^This is (an? )?/Country: /; s/\<_//; s/_\>//' file
Country: PLUTO
Country: PINEAPPLE
Country: ORANGE
Country: RICE

This applies three substitutions:

Replaces the text This is, optionally followed by a or an, with Country:.
Removes _ at the start of a word.
Removes _ at the end of a word.

The last two substitutions allow for data of the form

This is a _big_blue_ball_

which would be transformed into

Country: big_blue_ball

and not

Country: big blue ball

An awk alternative that just ignores the first part of each line and trims the first and last characters off of the last whitespace-delimited field:

awk '{ printf("Country: %s\n", substr($NF, 2, length($NF)-2)) }' file
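
substr() drops the first and last characters whatever they are. A variant that removes them only when they are underscores could look like the sketch below; the variable name f and the two sample lines are illustrative:

```shell
# Strip at most one leading and one trailing underscore from the last field,
# leaving any underscores inside the field untouched.
result=$(printf '%s\n' 'This is a _big_blue_ball_' 'This is _RICE_' |
  awk '{
    f = $NF
    sub(/^_/, "", f)   # drop one leading underscore, if present
    sub(/_$/, "", f)   # drop one trailing underscore, if present
    print "Country: " f
  }')

printf '%s\n' "$result"
# Country: big_blue_ball
# Country: RICE
```

This uses only sub() with anchored patterns, so it should work in any POSIX awk, not just gawk.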

answered Jan 3, 2019 at 16:25


With sed, you can also simply use sed 's/_//g' report.txt to delete all underscores. If you want to change the file itself, you can do an in-place replacement: sed -i 's/_//g' report.txt
– Larry, Commented Jan 3, 2019 at 20:58
@Larry Sure, but the point I was making is that one may only want to delete the flanking underscores, and the rest of that field could contain underscores that should be preserved.
– Kusalananda ♦, Commented Jan 3, 2019 at 21:07
Indeed, that can be quite useful. If you want to restrict the regular expression so that it does not touch other lines, it is also a good idea to first filter for lines of the required format (containing "This is a/an", etc.). Kudos.
– Larry, Commented Jan 3, 2019 at 21:14


Praveen Kumar BS · Answer · 2019-01-03 18:23:11Z

0

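Using Python:

#!/usr/bin/env python3
import re

# Match from the first underscore to the end of the line.
pattern = re.compile(r'_.*')

with open('file.txt') as infile:
    for line in infile:
        match = pattern.search(line)
        if match:
            # "_PLUTO_".split('_') gives ['', 'PLUTO', '']; take the middle part.
            print("Country:", match.group().split('_')[-2])

Output: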

Country: PLUTO
Country: PINEAPPLE
Country: ORANGE
Country: RICE

answered Jan 3, 2019 at 18:23
