Revisions to grep partial ip number from the file

Adapt for the actual output the OP wants ...

edited Aug 25, 2021 at 10:53

If I understand you correctly, you want to parse a list of IPs and identify which class B or C network they belong to. If any such network appears more than 10 times, you want to print the IP along with the network it belongs to in the notation

A.B.C.D   A.B.0.0/16  n

A.B.C.D   A.B.C.0/24  n

respectively into an output file spam.lst, where n is the actual occurrence count of the respective subnet.

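#!/bin/awk -f

BEGIN{
    FS=OFS="."
}

NF==4{
    # first pass: count how often each subnet occurs
    if (FNR==NR) {
        NF=cl
        count[$0]++
        next
    }
    # second pass: look up the subnet this IP belongs to
    for (n in count) {
        # append "." so that e.g. 10.1 does not match 10.12.3.4
        if (index($0,n ".")==1) {
            if (count[n]<=th) next
            printf "%s %s",$0,n
            for (i=cl;i<4;i++) printf ".0"
            printf "/%d %d\n",8*cl,count[n]
        }
    }
}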

awk -v cl=2 -v th=1 -f sort.awk ips.txt ips.txt > spam.lst

Note that the input file is processed two times, and hence appears two times as argument to awk!

The program works as follows:

You specify the network size as awk variable cl: either 2 for a class B network (/16), or 3 for a class C network (/24).
You specify the threshold as awk variable th: a subnet is reported only if its occurrence count exceeds this value.
The program sets the input and output field separators to . so that input lines are split into fields at each dot.
The script only considers lines containing exactly 4 fields (minimum sanity check for IPs).

In the first pass (FNR, the per-file line counter, is equal to NR, the global line counter), we register the subnets encountered. For every line, the field number is cut to the value in cl to truncate it to the class B or C network "base address". Then, a counter for this (newly regenerated) base address in the array count is increased, and processing skips to the next line.
In the second pass, we iterate over all indices of the array count (i.e. all subnets registered in the first pass) to see if the IP on the current line starts with that subnet address. If the associated count is larger than the threshold, we output the current IP address, then the base address, padded to the right with .0 and with the netmask in CIDR notation appended, and finally the occurrence count.

108.61.115.213 108.61.0.0/16 2
108.61.199.100 108.61.0.0/16 2
138.68.224.206 138.68.0.0/16 2
138.68.235.36 138.68.0.0/16 2
148.66.129.250 148.66.0.0/16 2
148.66.130.114 148.66.0.0/16 2
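
The truncation step (cutting the field number to cl) can be tried in isolation. The following is only a sketch: the function name subnet_base is hypothetical, and it assumes an awk that rebuilds the record with OFS when NF is assigned, as gawk and mawk do.

```shell
# Hypothetical helper: print the first "cl" dot-separated fields of an IP.
# Assumes the awk in use rebuilds $0 with OFS when NF is assigned
# (gawk and mawk behave this way).
subnet_base() {
    printf '%s\n' "$1" |
        awk -v cl="$2" 'BEGIN { FS = OFS = "." } NF == 4 { NF = cl; print }'
}

subnet_base 108.61.115.213 2
subnet_base 108.61.115.213 3
```

With gawk or mawk, the two calls should print 108.61 and 108.61.115, i.e. the keys under which the counters are stored for cl=2 and cl=3.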

If I understand you correctly, you want to parse a list of IPs and identify which class B or C network they belong to. If any such network appears more than 10 times, you want to print that in the notation

X.Y.0.0/16

X.Y.Z.0/24

respectively into an output file spam.lst.

#!/bin/awk -f

BEGIN{
    FS=OFS="."
}

NF==4{
    NF=cl
    count[$0]++
}

END {
    for (n in count) {
        if (count[n]>th) {
            printf "%s",n
            for (i=cl;i<4;i++) {printf ".0"}
            printf "/%d\n",8*cl
        }
    }
}

awk -v cl=2 -v th=1 -f sort.awk ips.txt > spam.lst

The program works as follows:

You specify the network size as awk variable cl: either 2 for a class B network (/16), or 3 for a class C network (/24).
You specify the threshold as awk variable th: a subnet is reported only if its occurrence count exceeds this value.
The program sets the input and output separators to . to split input lines at the . into fields.
Whenever a line contains exactly 4 fields (minimum sanity check for IPs), that number is cut to the value in cl to truncate it to the class B or C network "base address". Then, a counter for this (newly regenerated) base address in the array count is increased.
At end-of-file, we iterate over all indices of the array count (i.e. all unique base addresses that have occurred so far). If the associated count is larger than the threshold, we output the base address, padded to the right with .0 and with the netmask in CIDR notation appended.

108.61.0.0/16
138.68.0.0/16
148.66.0.0/16
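
To try this single-pass version without a separate script file, the program can be wrapped in a shell function. This is a sketch under assumptions: the function names are hypothetical, the sample data reuses the six addresses from the example output of the later revision plus one unrelated address (192.0.2.1) chosen for illustration, and the result is piped through sort because awk does not guarantee any order for "for (n in count)".

```shell
# Sample input: six addresses taken from the example output, plus one
# unrelated address (192.0.2.1) that is an assumption for illustration.
sample_ips() {
    printf '%s\n' \
        108.61.115.213 108.61.199.100 \
        138.68.224.206 138.68.235.36 \
        148.66.129.250 148.66.130.114 \
        192.0.2.1
}

# Inline copy of the single-pass program; reads IPs from standard input.
# $1 = cl (2 or 3), $2 = th (count that must be exceeded).
list_subnets() {
    awk -v cl="$1" -v th="$2" '
        BEGIN { FS = OFS = "." }
        NF == 4 { NF = cl; count[$0]++ }      # count each truncated address
        END {
            for (n in count) {
                if (count[n] > th) {
                    printf "%s", n
                    for (i = cl; i < 4; i++) printf ".0"
                    printf "/%d\n", 8 * cl
                }
            }
        }' | sort                              # stable order for display
}

sample_ips | list_subnets 2 1
```

With cl=2 and th=1, the three /16 networks that occur twice are listed, while 192.0.2.1 occurs only once and is dropped. With cl=3, every /24 in this sample occurs once, so nothing is printed.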

If I understand you correctly, you want to parse a list of IPs and identify which class B or C network they belong to. If any such network appears more than 10 times, you want to print the IP along with the network it belongs to in the notation

A.B.C.D   A.B.0.0/16  n

A.B.C.D   A.B.C.0/24  n

respectively into an output file spam.lst, where n is the actual occurrence count of the respective subnet.

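#!/bin/awk -f

BEGIN{
    FS=OFS="."
}

NF==4{
    # first pass: count how often each subnet occurs
    if (FNR==NR) {
        NF=cl
        count[$0]++
        next
    }
    # second pass: look up the subnet this IP belongs to
    for (n in count) {
        # append "." so that e.g. 10.1 does not match 10.12.3.4
        if (index($0,n ".")==1) {
            if (count[n]<=th) next
            printf "%s %s",$0,n
            for (i=cl;i<4;i++) printf ".0"
            printf "/%d %d\n",8*cl,count[n]
        }
    }
}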

awk -v cl=2 -v th=1 -f sort.awk ips.txt ips.txt > spam.lst

Note that the input file is processed two times, and hence appears two times as argument to awk!

The program works as follows:

You specify the network size as awk variable cl: either 2 for a class B network (/16), or 3 for a class C network (/24).
You specify the threshold as awk variable th: a subnet is reported only if its occurrence count exceeds this value.
The program sets the input and output field separators to . so that input lines are split into fields at each dot.
The script only considers lines containing exactly 4 fields (minimum sanity check for IPs).

In the first pass (FNR, the per-file line counter, is equal to NR, the global line counter), we register the subnets encountered. For every line, the field number is cut to the value in cl to truncate it to the class B or C network "base address". Then, a counter for this (newly regenerated) base address in the array count is increased, and processing skips to the next line.
In the second pass, we iterate over all indices of count (i.e. all subnets registered in the first pass) to see if the IP on the current line starts with that subnet address. If the associated count is larger than the threshold, we output the current IP address, then the base address, padded to the right with .0 and with the netmask in CIDR notation appended, and finally the occurrence count.

108.61.115.213 108.61.0.0/16 2
108.61.199.100 108.61.0.0/16 2
138.68.224.206 138.68.0.0/16 2
138.68.235.36 138.68.0.0/16 2
148.66.129.250 148.66.0.0/16 2
148.66.130.114 148.66.0.0/16 2
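
The two-pass idea can be tried end to end with a temporary file, since the same file has to be named twice. This is a sketch only: the function name tag_ips is hypothetical and 192.0.2.1 is an assumed extra address. The sketch compares against the subnet prefix followed by a dot, a small precaution so that a prefix such as 10.1 is not mistaken for the start of 10.12.3.4.

```shell
# Hypothetical wrapper around the two-pass program.
# $1 = cl (2 or 3), $2 = th (count that must be exceeded), $3 = input file.
tag_ips() {
    awk -v cl="$1" -v th="$2" '
        BEGIN { FS = OFS = "." }
        NF == 4 {
            if (FNR == NR) {                  # first pass: count subnets
                NF = cl
                count[$0]++
                next
            }
            for (n in count) {                # second pass: find the subnet
                if (index($0, n ".") == 1) {  # prefix plus dot must match
                    if (count[n] <= th) next
                    printf "%s %s", $0, n
                    for (i = cl; i < 4; i++) printf ".0"
                    printf "/%d %d\n", 8 * cl, count[n]
                }
            }
        }' "$3" "$3"                          # same file, read twice
}

tmp=$(mktemp)
printf '%s\n' 108.61.115.213 108.61.199.100 192.0.2.1 > "$tmp"
tag_ips 2 1 "$tmp"
rm -f "$tmp"
```

Because the second pass walks the file in its original order, the output keeps the input order: the two 108.61.x.x addresses are printed with 108.61.0.0/16 and the count 2, and 192.0.2.1 is skipped because its subnet occurs only once.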

Amend one-line

edited Aug 25, 2021 at 10:32

AdminBee

awk -v cl=2 -v nw="8.6.0.0" -F'.' 'BEGIN{split(nw,ref,/\./)} NF==4{for (i=1;i<=cl;i++) {if ($i!=ref[i]) next} printf "%s %s/%d\n",$0,nw,8*cl}' ips.txt

In the beginning, the reference network base IP is split by fields into an array ref.
For each line encountered, the program first checks if it contains 4 fields (minimum sanity check for an IP). If so, it compares the first cl fields of both the current line and the reference IP. If any of them doesn't match, the line is skipped and processing proceeds to the next line. If all relevant fields matched, the IP is printed, followed by the network in CIDR notation.

awk -v cl=2 -v nw="8.6.0.0" -F'.' 'BEGIN{split(nw,ref,/\./)} NF==4{for (i=1;i<=cl;i++) {if ($i!=ref[i]) next} print}' ips.txt

In the beginning, the reference network base IP is split by fields into an array ref.
For each line encountered, the program first checks if it contains 4 fields (minimum sanity check for an IP). If so, it compares the first cl fields of both the current line and the reference IP. If any of them doesn't match, the line is skipped and processing proceeds to the next line. If all relevant fields matched, the line is printed.

awk -v cl=2 -v nw="8.6.0.0" -F'.' 'BEGIN{split(nw,ref,/\./)} NF==4{for (i=1;i<=cl;i++) {if ($i!=ref[i]) next} printf "%s %s/%d\n",$0,nw,8*cl}' ips.txt

In the beginning, the reference network base IP is split by fields into an array ref.
For each line encountered, the program first checks if it contains 4 fields (minimum sanity check for an IP). If so, it compares the first cl fields of both the current line and the reference IP. If any of them doesn't match, the line is skipped and processing proceeds to the next line. If all relevant fields matched, the IP is printed, followed by the network in CIDR notation.
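
The field-by-field comparison can be checked quickly on a few addresses fed through standard input. This is a sketch: the function name match_network and the three sample addresses are assumptions made for illustration; the awk program itself is the one-liner, only spread over several lines.

```shell
# Hypothetical wrapper around the one-liner; reads IPs from standard input.
# $1 = cl (number of leading fields to compare), $2 = nw (network base address).
match_network() {
    awk -v cl="$1" -v nw="$2" -F'.' '
        BEGIN { split(nw, ref, /\./) }        # reference fields: ref[1]..ref[4]
        NF == 4 {
            for (i = 1; i <= cl; i++) {
                if ($i != ref[i]) next        # any mismatch: skip this line
            }
            printf "%s %s/%d\n", $0, nw, 8 * cl
        }'
}

printf '%s\n' 8.6.1.2 8.7.1.2 8.6.200.9 | match_network 2 8.6.0.0
```

Of the three sample addresses, 8.6.1.2 and 8.6.200.9 share the first two fields with 8.6.0.0 and are printed with 8.6.0.0/16 appended, while 8.7.1.2 differs in the second field and is skipped.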

Include original proposed solution by request of OP

edited Aug 25, 2021 at 9:55

AdminBee

The original proposition was meant to integrate into the existing script and looked as follows:

awk -v cl=2 -v nw="8.6.0.0" -F'.' 'BEGIN{split(nw,ref,/\./)} NF==4{for (i=1;i<=cl;i++) {if ($i!=ref[i]) next} print}' ips.txt

Here, we would parse the list of IPs to check whether they fall into the same network as a given network base address, specified via awk variable nw.

In the beginning, the reference network base IP is split by fields into an array ref.

For each line encountered, the program first checks if it contains 4 fields (minimum sanity check for an IP). If so, it compares the first cl fields of both the current line and the reference IP. If any of them doesn't match, the line is skipped and processing proceeds to the next line. If all relevant fields matched, the line is printed.

Complete rewrite after trying to better understand the OPs request

edited Aug 25, 2021 at 9:45

AdminBee

answered Aug 25, 2021 at 9:12

AdminBee