0

To expand on my previous question, I have a list of IP ranges in the format:

Long description:20.1.1.0-20.1.1.14
Another description:5.5.5.0-5.5.5.100
Yet another description:20.1.1.0-20.1.1.40

There are no duplicates, but I'd like to delete any overlapping IP ranges.

For example, in the above example, the first line should be deleted as its range is already included in the 3rd line.

NOTE: I need to keep the whole line (description included), not just the IP ranges.

6
  • Do the ranges have to be completely covered by another to be deleted; what if there is a partial overlap? Commented Jul 5, 2018 at 16:38
  • I think this will help with this -stackoverflow.com/questions/16986879/… Commented Jul 5, 2018 at 16:49
  • @JeffSchaller partial overlaps containing IPs not covered elsewhere should be kept. Commented Jul 5, 2018 at 17:05
  • 1
    What distro are you on? For Debian/Ubuntu - askubuntu.com/questions/878188/…. Commented Jul 5, 2018 at 17:11
  • 2
    I'd just do this in python - stackoverflow.com/questions/17206679/…. The ipaddr library has a method called overlaps. Commented Jul 5, 2018 at 17:45

1 Answer 1

1

If you are okay with the input lines being reordered, I have a relatively simple solution using GNU Awk and the "sort" command. The basic idea is to convert the IP addresses to single numbers instead of dotted pairs, which makes it very easy to compare them, and to use the -k flag of sort which allows specifying that it should only sort specific fields.

For compactness, this also uses the GNU awk feature of coprocesses, which makes it very easy to process data before and after using sort:

EDIT: The sort commandline in the original version of this answer was slightly wrong: sort -k2,3r actually treats fields 2 and 3 as a single key, to be sorted in reverse order. sort -k2,2n -k3,3rn will do the necessary thing of first sorting by field 2 and using (reversed) field 3 as a tiebreaker:

# Run as: gawk -F: -f <thisfile.awk> <input file>
BEGIN {
  # Define the sort command that we will be using later as a variable
  # Sort by
  #   - the 1st ip, smallest-to-largest
  #   - the 2nd ip, largest-to-smallest
  sort="sort -n -t: -k2,2n -k3,3nr";
}

# For every line:
{
  # Store the individual components of the addresses into 'ips'
  match($2, /([[:digit:]]+).([[:digit:]]+).([[:digit:]]+).([[:digit:]]+)\
-([[:digit:]]+).([[:digit:]]+).([[:digit:]]+).([[:digit:]]+)/, ips);
  # Add the components together to get the IPs as a single number.
  # The print also uses : as the delimiter between the 2 IPS for simplicity
  print $1":"ips[4]+256*(ips[3]+256*(ips[2]+256*ips[1])) \
          ":"ips[8]+256*(ips[7]+256*(ips[6]+256*ips[5])) \
    |& sort
}

# After sending all lines to sort in the appropriate format
END {
  # Close sort's input stream, so that we can read its output
  close(sort, "to");
  # Keep track of the upper end of the previous range
  prevHigh=0;
  # Read & field-split all lines from sort's output
  while((sort |& getline) > 0) {
     # One range is contained in another if its low address is >= the
     # other's (guaranteed by the sort command) and its high address is <=
     # the other's. So, we should print this record when its high address is >
     # prevHigh:
    if ($3 > prevHigh) {
      print $1":"int($2/(256*256*256))%256"."int($2/(256*256))%256"." \
                 int($2/256)%256"."$2%256 \
              "-"int($3/(256*256*256))%256"."int($3/(256*256))%256"." \
                 int($3/256)%256"."$3%256 \
      # This is now the previous range
      prevHigh = $3
    }
  }
}
4
  • Thanks. However it seems to work partially, some ranges are missing: pastebin.com/raw/jdWWeGMg Commented Jul 6, 2018 at 13:35
  • Oh, sorry! Let me take a look at it. Commented Jul 6, 2018 at 14:32
  • @MarkRoi Please see the updated answer above---the sort command has been changed, which I think fixes your issue. Commented Jul 6, 2018 at 14:56
  • Excellent. Just tested on the real data (about 10K lines) and compared the results, worked flawlessly and quickly. Thank you so much! Commented Jul 6, 2018 at 15:39

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.