How do you remove dot character from string without calling sed or awk again?

Question

I have a file called hostlist.txt that contains text like this:

host1.mydomain.com
host2.mydomain.com
anotherhost
www.mydomain.com
login.mydomain.com
somehost
host3.mydomain.com

I have the following small script:

#!/usr/local/bin/bash

while read host; do
        dig +search @ns1.mydomain.com $host ALL \
        | sed -n '/;; ANSWER SECTION:/{n;p;}';
done <hostlist.txt \
        | gawk '{print $1","$NF}' >fqdn-ip.csv

Which outputs to fqdn-ip.csv:

host1.mydomain.com.,10.0.0.1
host2.mydomain.com.,10.0.0.2
anotherhost.internal.mydomain.com.,10.0.0.11
www.mydomain.com.,10.0.0.10
login.mydomain.com.,10.0.0.12
somehost.internal.mydomain.com.,10.0.0.13
host3.mydomain.com.,10.0.0.3

My question is how do I remove the . just before the comma without invoking sed or gawk again? Is there a step I can perform in the existing sed or gawk calls that will strip the dot?

hostlist.txt will contain 1000s of hosts so I want my script to be fast and efficient.

@RogerLipscombe because some of the hosts in my hostlist.txt are just hostnames, not FQDNs so I'm using +search to resolve them.
– Linoob, Commented May 26, 2016 at 17:43

John1024 · Accepted Answer · 2016-05-26 21:03:32Z

The sed command, the awk command, and the removal of the trailing period can all be combined into a single awk command:

while read -r host; do dig +search "$host" ALL; done <hostlist.txt | awk 'f{sub(/.$/,"",$1); print $1", "$NF; f=0} /ANSWER SECTION/{f=1}'

Or, as spread out over multiple lines:

while read -r host
do
    dig +search "$host" ALL
done <hostlist.txt | awk 'f{sub(/.$/,"",$1); print $1", "$NF; f=0} /ANSWER SECTION/{f=1}'

Because the awk command follows the done statement, only one awk process is invoked for the whole loop. Although efficiency may not matter here, this is more efficient than starting a new sed or awk process on every iteration.

Example

With this test file:

$ cat hostlist.txt 
www.google.com
fd-fp3.wg1.b.yahoo.com

The command produces:

$ while read -r host; do dig +search "$host" ALL; done <hostlist.txt | awk 'f{sub(/.$/,"",$1); print $1", "$NF; f=0} /ANSWER SECTION/{f=1}'
www.google.com, 216.58.193.196
fd-fp3.wg1.b.yahoo.com, 206.190.36.45

How it works

awk implicitly reads its input one record (line) at a time. This awk script uses a single variable, f, which signals whether the previous line was an answer section header or not.

f{sub(/.$/,"",$1); print $1", "$NF; f=0}

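If the previous line was an answer section header, then f will be true and the commands in curly braces are executed. The first removes the last character of the first field, which on a dig answer line is normally the trailing period (the unescaped . in /.$/ matches any character; /\.$/ would match only a literal period). The second prints the first field, followed by a comma and a space, followed by the last field. The third statement resets f to zero (false).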

In other words, f here functions as a logical condition. The commands in curly braces are executed if f is nonzero (which, in awk, means 'true').

/ANSWER SECTION/{f=1}

If the current line contains the string ANSWER SECTION, then the variable f is set to 1 (true).

Here, /ANSWER SECTION/ serves as a logical condition. It evaluates to true if the current line matches the regular expression ANSWER SECTION. If it does, then the command in curly braces is executed.
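To try the flag logic without querying a name server, here is a minimal sketch that feeds the same awk program some canned text laid out like dig output. The function name strip_answers and the sample record are made up for illustration; real dig output contains more sections than shown.

```shell
# Hypothetical helper (not part of the original answer): wraps the awk
# program so it can be fed any dig-style text on stdin.
strip_answers() {
    awk 'f{sub(/.$/,"",$1); print $1", "$NF; f=0} /ANSWER SECTION/{f=1}'
}

# Canned sample shaped like dig output (illustrative values, no real query).
sample=';; QUESTION SECTION:
;host1.mydomain.com. IN A

;; ANSWER SECTION:
host1.mydomain.com. 3600 IN A 10.0.0.1

;; Query time: 1 msec'

# Only the line right after the ANSWER SECTION header is printed.
printf '%s\n' "$sample" | strip_answers
```

With this input, only the record following the header is printed, with its trailing period removed. Note that f is reset after one line, so only the first record of each answer section is reported.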

Thank you @John1024! I didn't know that awk didn't need to be within the loop (I thought that it would only act on the last line if it was outside). Is f an arbitrary variable or is f{} an explicit part of awk's functionality?
– Linoob, Commented May 26, 2016 at 17:53
You're welcome. f is an arbitrary variable. You can actually put complex logical conditions before the {}. f is just a very simple logical condition: it is true if nonzero, false if zero.
– John1024, Commented May 26, 2016 at 17:59
@Linoob Note that in the second command, /ANSWER SECTION/ plays the role of logical condition, analogous to the role f played in the first command. I have updated the answer to discuss this.
– John1024, Commented May 26, 2016 at 21:04

cas · Answer · 2016-05-26 05:09:26Z

dig can read in a file containing a list of hostnames and process them one by one. You can also tell dig to suppress all output except the answer section.

This should give you the output you want:

dig -f hostlist.txt +noall +answer +search | 
    awk '{sub(/\.$/,"",$1); print $1","$5}'

awk's sub() function is used to strip the literal period . from the end of the first field. Then awk prints fields 1 and 5 separated by a comma.
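As a quick check of the field handling, the awk step can be run on canned lines in the "name TTL class type data" layout that dig prints with +noall +answer. This is only a sketch: answers_to_csv is a hypothetical wrapper name and the records are invented.

```shell
# Hypothetical wrapper around the awk program from this answer.
answers_to_csv() {
    awk '{sub(/\.$/,"",$1); print $1","$5}'
}

# Made-up answer lines; field 1 is the name, field 5 is the address.
printf '%s\n' \
    'host1.mydomain.com. 3600 IN A 10.0.0.1' \
    'anotherhost.internal.mydomain.com. 3600 IN A 10.0.0.11' |
    answers_to_csv
```

Because the period is escaped in /\.$/, a first field that does not end in a period is left unchanged rather than losing its last character.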

NOTE: entries in hostlist.txt that do not resolve are completely discarded - they do not appear on stdout OR stderr.

(Tested on Linux and FreeBSD)

edited May 26, 2016 at 5:09

answered May 26, 2016 at 5:03

DopeGhoti · Answer · 2016-05-25 22:32:17Z

Change your invocation of gawk to the following:

| gawk '{print substr($1,1,length($1)-1)","$NF}' >fqdn-ip.csv
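A small sketch of what the substr() call does, using a canned line instead of live dig output. The wrapper name trim_last_char_csv is made up, and plain awk is used here on the assumption that it behaves like gawk for substr() and length().

```shell
# Hypothetical wrapper: prints field 1 minus its last character, a comma,
# then the last field.
trim_last_char_csv() {
    awk '{print substr($1,1,length($1)-1)","$NF}'
}

# Made-up record in dig's answer layout.
printf '%s\n' 'www.mydomain.com. 300 IN A 10.0.0.10' | trim_last_char_csv
```

substr() drops the last character unconditionally, so this relies on every name in the first field ending with a period, which is how dig normally prints them.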
