If I understand you correctly, you want to parse a list of IPs and identify which class B or C network they belong to. If any such network appears more than 10 times, you want to print the IP along with the network it belongs to in the notation
A.B.C.D A.B.0.0/16 n
A.B.C.D A.B.C.0/24 n
respectively into an output file spam.lst, where n is the actual occurrence count of the respective subnet.
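#!/bin/awk -f

BEGIN {
    FS = OFS = "."
}

NF == 4 {
    # First pass: count how often each network prefix occurs.
    if (FNR == NR) {
        NF = cl
        count[$0]++
        next
    }

    # Second pass: report IPs whose network exceeds the threshold.
    for (n in count) {
        # Match the prefix plus a trailing "." so that 10.1 does not match 10.11.x.x.
        if (index($0, n FS) == 1) {
            if (count[n] <= th) next
            printf "%s %s", $0, n
            for (i = cl; i < 4; i++) printf ".0"
            printf "/%d %d\n", 8 * cl, count[n]
        }
    }
}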
awk -v cl=2 -v th=1 -f sort.awk ips.txt ips.txt > spam.lst
Note that the input file is processed twice, and hence appears twice as an argument to awk!
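The two-pass idiom on its own can be sketched as follows. This is only an illustration, not part of the script: the temporary file, its contents and the variable names are made up for the demo. While the first copy of the file is read, FNR equals NR; once awk moves on to the second copy, FNR restarts at 1 while NR keeps counting.

```shell
# Hypothetical demo file; its contents are invented for illustration.
tmp=$(mktemp)
printf '%s\n' a b c > "$tmp"

# First pass (FNR==NR): count the lines.
# Second pass: print each line together with its position and the total.
two_pass=$(awk 'FNR==NR { total++; next } { print $0, FNR "/" total }' "$tmp" "$tmp")
printf '%s\n' "$two_pass"
rm -f "$tmp"
```

One caveat with this idiom: if the first file is empty, FNR==NR also holds for the whole second file, so the two passes can no longer be told apart.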
The program works as follows:
- You specify the network class as the awk variable cl: either 2 for a class B network, or 3 for a class C network.
- You specify the occurrence threshold as the awk variable th. A subnet is reported once it occurs more than th times.
- The program sets the input and output field separators to "." so that input lines are split into fields at each dot.
- The script only considers lines containing exactly 4 fields (a minimal sanity check for IPs).
- In the first pass (FNR, the per-file line counter, is equal to NR, the global line counter), we register the subnets encountered. For every line, the field number is cut to the value in cl to truncate it to the class B or C network "base address". Then, a counter for this (newly regenerated) base address in the array count is increased, and processing skips to the next line.
- In the second pass, we iterate over all indices of the array count (i.e. all subnets registered in the first pass) to see if the IP on the current line starts with that subnet address. If the associated count is larger than the threshold, we output the current IP address, then the base address, padded to the right with ".0" and with the netmask in CIDR notation appended, and finally the occurrence count.
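The truncation and padding steps can be tried in isolation with a small sketch like the one below. The helper name truncate_ip is hypothetical and exists only for this demo. It relies on awk rebuilding $0 with OFS when NF is lowered, which gawk and mawk do; some older awk implementations may not, so treat this as an assumption about your awk.

```shell
# truncate_ip is a hypothetical helper used only in this sketch.
# $1: IP address, $2: number of leading octets to keep (2 or 3)
truncate_ip() {
    printf '%s\n' "$1" | awk -v cl="$2" '
        BEGIN { FS = OFS = "." }
        NF == 4 {
            NF = cl                      # drop trailing fields; $0 is rebuilt with OFS
            net = $0
            for (i = cl; i < 4; i++)     # pad back to four octets
                net = net ".0"
            printf "%s/%d\n", net, 8 * cl
        }'
}

truncate_ip 108.61.115.213 2   # prints 108.61.0.0/16
truncate_ip 108.61.115.213 3   # prints 108.61.115.0/24
```

Lines that do not have exactly four dot-separated fields produce no output, mirroring the NF==4 check in the script.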
108.61.115.213 108.61.0.0/16 2
108.61.199.100 108.61.0.0/16 2
138.68.224.206 138.68.0.0/16 2
138.68.235.36 138.68.0.0/16 2
148.66.129.250 148.66.0.0/16 2
148.66.130.114 148.66.0.0/16 2
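As a possible variation (not the script above, just a sketch), the second pass can look up the current line's own subnet directly instead of looping over every entry in count. The input below is an assumption: it is reconstructed from the output lines shown above, plus one invented address (192.0.2.1) whose subnet occurs only once and should therefore be filtered out.

```shell
# Hypothetical sample input, reconstructed from the output above plus one
# address whose /16 subnet occurs only once.
ips=$(mktemp)
printf '%s\n' 108.61.115.213 108.61.199.100 138.68.224.206 \
              138.68.235.36 148.66.129.250 148.66.130.114 192.0.2.1 > "$ips"

result=$(awk -v cl=2 -v th=1 '
    BEGIN { FS = OFS = "." }
    NF != 4 { next }                      # minimal sanity check
    {
        ip = $0                           # remember the full address
        NF = cl                           # truncate to the network part
        key = $0
    }
    FNR == NR { count[key]++; next }      # first pass: count subnets
    count[key] > th {                     # second pass: direct lookup
        net = key
        for (i = cl; i < 4; i++) net = net ".0"
        printf "%s %s/%d %d\n", ip, net, 8 * cl, count[key]
    }' "$ips" "$ips")

printf '%s\n' "$result"
rm -f "$ips"
```

With this input the sketch prints the same six lines as above, in input order. Because each line does a single array lookup, the second pass no longer depends on the number of distinct subnets.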