grep partial ip number from the file

Question

I have to sort IP addresses into classes so that I can block an entire class in my firewall. It works fine for a /24 class, but not so well for a /16 class. I have a list of IPs in a text file, which I'd like to sort:

for IPBL in `cat /tmp/IPs`; do 
  CT=`grep -c ${IPBL%.[0-9]*} /tmp/IPs`
    if [ "$CT" -gt "10" ]; then 
      echo "$IPBL  ${IPBL%.[0-9]*}.0/24  $CT" >>/tmp/spam.lst
    fi
done
cat /tmp/spam.lst |sort -n

This works fine and prints all IPs that have more than 10 matches in their class C network.

for IPBL in `cat /tmp/IPs`; do 
  CT=`grep -c ${IPBL%.[0-9]*.[0-9]*} /tmp/IPs`
    if [ "$CT" -gt "10" ]; then 
      echo "$IPBL  ${IPBL%.[0-9]*.[0-9]*}.0.0/16  $CT" >>/tmp/spam.lst
    fi
done
cat /tmp/spam.lst |sort -n

This version gets most of the matches right, but some prefixes match too much. For example:

ip 8.6.X.X would match 18.6.X.X, 168.6.X.X etc.

I tried placing ^${IPBL%.[0-9]*.[0-9]*} in the grep to match from the beginning of the line, but that didn't help. Neither did

grep -E -c "${IPBL%.[0-9]{1,3}.[0-9]{1,3}}" /tmp/IPs

which was meant to match a specific number of digits within each part of the IP, nor did placing ^ in front of that string.

What is the most efficient way to pull out exact matches?
The file /tmp/IPs is big, and the class B grep matches all of the following IP addresses. The correct result would be a single match (line 2).

#IPBL=8.6.144.6
#grep ${IPBL%.[0-9]*.[0-9]*} /tmp/IPs
5.188.62.76
8.6.144.6
39.48.63.128
49.178.61.44
68.61.98.98
73.121.228.65
78.128.60.44
81.68.68.194
86.185.248.61
103.129.178.69
108.61.115.213
108.61.199.100
138.68.224.206
138.68.235.36
142.4.218.69
148.63.196.97
148.64.121.254
148.66.129.250
148.66.130.114
149.202.8.66
173.228.198.65
174.251.128.60
176.78.65.246
176.9.208.67
178.128.68.121
178.62.67.41
178.63.146.46
212.48.66.224

Unfortunately, the requirements are not yet quite clear to me. You seem to specify the undesired output you currently get, but without knowing the original /tmp/IPs it is difficult to understand. Please specify (an excerpt of) the original /tmp/IPs along with the desired output.
– AdminBee, Commented Aug 25, 2021 at 9:20
@AdminBee True, it is the list of all IPs that match, whereas only one (line 2) should. It should be clear that running the grep on this list matches all of them instead of just one.
– DenisZ, Commented Aug 25, 2021 at 9:46
The problem is that, in my opinion, your question needs to be rewritten to make its meaning clearer. I rewrote my answer because I think I completely misunderstood you the first time. Please revise your post to make sure the intended output becomes clear.
– AdminBee, Commented Aug 25, 2021 at 9:48

AdminBee · Accepted Answer · 2021-08-25 10:53:03Z

If I understand you correctly, you want to parse a list of IPs and identify which class B or C network they belong to. If any such network appears more than 10 times, you want to print the IP along with the network it belongs to in the notation

A.B.C.D   A.B.0.0/16  n

or

A.B.C.D   A.B.C.0/24  n

respectively into an output file spam.lst, where n is the actual occurrence count of the respective subnet.
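
As a small illustration of the two notations, both network strings can be derived from an address with plain shell parameter expansion. This is only a sketch: network_of is a hypothetical helper name that does not appear in the original script, and it assumes well-formed dotted-quad input.

```shell
# Hypothetical helper (not part of the original script): print the network
# that IPv4 address $1 falls into when the first $2 octets are kept.
network_of() {
    ip=$1
    octets=$2
    case $octets in
        2) printf '%s.0.0/16\n' "${ip%.*.*}" ;;  # strip the last two octets
        3) printf '%s.0/24\n' "${ip%.*}" ;;      # strip the last octet
        *) return 1 ;;                           # other sizes are not handled
    esac
}

network_of 8.6.144.6 2   # prints 8.6.0.0/16
network_of 8.6.144.6 3   # prints 8.6.144.0/24
```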

I propose the following awk program for the task (let's call it sort.awk):

#!/bin/awk -f

BEGIN{
    FS=OFS="."
}

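NF==4{
    if (FNR==NR) {
        # First pass: truncate to the first cl octets and count each prefix.
        NF=cl
        count[$0]++
        next
    }
    # Second pass: find the counted prefix this address starts with.
    for (n in count) {
        # Append "." so that prefix 8.6 does not also match 8.61.x.x.
        if (index($0,n ".")==1) {
            if (count[n]<=th) next
            printf "%s %s",$0,n
            for (i=cl;i<4;i++) printf ".0"
            printf "/%d %d\n",8*cl,count[n]
        }
    }
}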

You would call it as follows:

awk -v cl=2 -v th=1 -f sort.awk ips.txt ips.txt > spam.lst

Note that the input file is processed two times, and hence appears two times as argument to awk!

The program works as follows:

You specify the network size as awk variable cl, i.e. the number of leading octets to keep: 2 for a class B (/16) network, or 3 for a class C (/24) network.
You specify the occurrence threshold as awk variable th; a subnet is reported only if it occurs more than th times.
The program sets the input and output separators to . to split input lines at the . into fields.
The script only considers lines containing exactly 4 fields (minimum sanity check for IPs).
In the first pass (FNR, the per-file line counter, is equal to NR, the global line counter), we register the subnets encountered. For every line, the field number is cut to the value in cl to truncate it to the class B or C network "base address". Then, a counter for this (newly regenerated) base address in the array count is increased, and processing skips to the next line.
In the second pass, we iterate over all indices of count (i.e. all subnets registered in the first pass) to see if the IP on the current line starts with that subnet address. If the associated count is larger than the threshold, we output the current IP address, then the base address, padded to the right with .0 and with the netmask in CIDR notation appended, and finally the occurrence count.
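
To see the first pass in isolation, the truncate-and-count step can be run on a few addresses. This is a minimal sketch, not part of sort.awk: subnet_counts is a hypothetical wrapper name, and it assumes an awk such as gawk or mawk that rebuilds the record when NF is assigned.

```shell
# Hypothetical helper: count how often each prefix of $1 octets occurs
# among the addresses read from standard input.
subnet_counts() {
    awk -v cl="$1" '
        BEGIN { FS = OFS = "." }
        NF == 4 {
            NF = cl          # drop trailing octets; $0 is rebuilt using OFS
            count[$0]++      # $0 is now the prefix, e.g. 108.61
        }
        END { for (n in count) printf "%s %d\n", n, count[n] }
    ' | sort                 # awk array order is unspecified, so sort it
}

printf '%s\n' 108.61.115.213 108.61.199.100 8.6.144.6 | subnet_counts 2
```

With the three sample addresses, the prefix 108.61 is counted twice and 8.6 once.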

The output for cl=2, th=1 and the example IP list you showed would look like

108.61.115.213 108.61.0.0/16 2
108.61.199.100 108.61.0.0/16 2
138.68.224.206 138.68.0.0/16 2
138.68.235.36 138.68.0.0/16 2
148.66.129.250 148.66.0.0/16 2
148.66.130.114 148.66.0.0/16 2

The original proposition was meant to integrate into the existing script and looked as follows:

awk -v cl=2 -v nw="8.6.0.0" -F'.' 'BEGIN{split(nw,ref,/\./)} NF==4{for (i=1;i<=cl;i++) {if ($i!=ref[i]) next} printf "%s %s/%d\n",$0,nw,8*cl}' ips.txt

Here, we would parse the list of IPs to check whether they fall into the same network as a given network base address, specified via awk variable nw.

In the beginning, the reference network base IP is split by fields into an array ref.
For each line encountered, the program first checks if it contains 4 fields (minimum sanity check for an IP). If so, it compares the first cl fields of both the current line and the reference IP. If any of them doesn't match, the line is skipped and processing proceeds to the next line. If all relevant fields matched, the IP is printed, followed by the network in CIDR notation.
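
The field-by-field comparison can be tried on the addresses that caused the false positives in the question. The sketch below wraps the same awk logic in a shell function; the name in_network and the sample addresses are illustrative assumptions.

```shell
# Hypothetical wrapper: from the addresses on standard input, list those whose
# first $2 octets equal the first $2 octets of network base address $1.
in_network() {
    awk -v nw="$1" -v cl="$2" -F'.' '
        BEGIN { split(nw, ref, /\./) }     # reference octets: ref[1]..ref[4]
        NF == 4 {
            for (i = 1; i <= cl; i++)
                if ($i != ref[i]) next     # an octet differs: skip this line
            printf "%s %s/%d\n", $0, nw, 8 * cl
        }'
}

printf '%s\n' 8.6.144.6 18.6.1.1 168.6.2.2 8.61.3.3 | in_network 8.6.0.0 2
```

Because whole octets are compared rather than substrings, only 8.6.144.6 is printed for this input.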

DenisZ · Accepted Answer · 2021-08-25 10:30:28Z

As per the original question, I managed to update the grep statement to get the wanted result. For anyone interested in a bash-only solution, the updated part of the code looks like this:

for IPBL in `cat /tmp/IPs`; do 
  CT=`grep -c "^${IPBL%.[0-9]*.[0-9]*}\." /tmp/IPs`
    if [ "$CT" -gt "10" ]; then 
      echo "$IPBL  ${IPBL%.[0-9]*.[0-9]*}.0.0/16  $CT" >>/tmp/spam.lst
    fi
done
cat /tmp/spam.lst |sort -n

What changed is the argument to grep:
^ anchors the match at the beginning of the line, and
\. adds a literal dot after the second number of the IP address, thus giving an exact match for that particular class B range:

"^${IPBL%.[0-9]*.[0-9]*}\."

Now the IP 8.6.144.6 has only one match in the IPs file and is therefore not shown in the output. A class B match looks like this:

3.8.35.118  3.8.0.0/16  12
3.8.36.119  3.8.0.0/16  12
3.8.36.121  3.8.0.0/16  12
3.8.37.124  3.8.0.0/16  12
3.8.37.125  3.8.0.0/16  12
3.8.37.126  3.8.0.0/16  12
3.8.37.94  3.8.0.0/16  12
3.8.37.96  3.8.0.0/16  12
3.8.37.97  3.8.0.0/16  12
3.8.37.97  3.8.0.0/16  12
3.8.37.98  3.8.0.0/16  12
3.8.37.98  3.8.0.0/16  12
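
The effect of each anchor can be checked on a small sample. One caveat goes beyond the answer above: the dots inside the prefix itself are still regex wildcards, so a prefix such as 1.2 can also match 102.x.x.x. The sketch below escapes them with sed; the sample addresses and variable names are assumptions made for illustration.

```shell
# Sample list chosen to expose each kind of false positive.
ips='1.2.3.4
102.5.6.7
11.2.3.4
1.25.6.7'

ip=1.2.3.4
prefix=${ip%.*.*}                                     # -> 1.2
escaped=$(printf '%s' "$prefix" | sed 's/\./\\./g')   # -> 1\.2

# Bare prefix: matches anywhere in the line, dots match any character.
unanchored=$(printf '%s\n' "$ips" | grep -c "$prefix")
# Anchored as in the answer: line start plus a trailing literal dot.
anchored=$(printf '%s\n' "$ips" | grep -c "^$prefix\.")
# Same anchors, with the dots inside the prefix escaped as well.
exact=$(printf '%s\n' "$ips" | grep -c "^$escaped\.")

echo "unanchored=$unanchored anchored=$anchored exact=$exact"
# prints: unanchored=4 anchored=2 exact=1
```

The anchored pattern still counts 102.5.6.7 because the unescaped dot in 1.2 matches the 0; only the escaped form counts 1.2.3.4 alone.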
