Skip to main content
18 events
when toggle format what by license comment
Jun 15, 2017 at 11:20 comment added Matthieu M. I think you can simplify the build-up of your tree by pre-sorting the CIDRs by mask; if you start putting the largest subnets first, then you never have to remove subnets when adding a larger one.
Jun 15, 2017 at 11:10 comment added Matthieu M. Naively, I would have used a more basic structure: I'm thinking about a simple list of 33 (frozen) sets, where the index n corresponds to the number of bits in the mask (/n part of the CIDR). Stashing the CIDRs in the sets is trivial enough (check out the /n part, put into appropriate set), and verifying the inclusion is also simple: start from index 32, going toward 0, mask the IP and check if it's in the set. I'm not sure it would be faster per se, but I would be more confident in the correctness.
Jun 14, 2017 at 23:15 comment added TemporalWolf Unless I'm misreading your code, your log n is 32, not n. But fair enough, you can eat the constant in big-O notation.
Jun 14, 2017 at 21:51 comment added rolfl @TemporalWolf - that's not strictly accurate. The reality is that the n in the log n is a constant, 32 (the number of bits in an IP address), not the number of CIDRs, so, being a constant the complexity is just \$O(m)\$
Jun 14, 2017 at 20:42 comment added TemporalWolf Nitpick for the explanation: A binary search tree reduces complexity to O(log n), so the end complexity would be O(m log n).
Jun 14, 2017 at 19:35 comment added rolfl Just FYI, I crunched some numbers, and it's procesing about 75,000 records per second now, or 190MB/s, which I feel is in the right sort of ball-park for your system. SSD should be a bit faster, but there are some overheads, and I expect, that if you had a concurrent IO/processing system now (computing matches in one thread while blocking on IO in another), that you could reduce the time a bit.... but it's probably not worth it. There's still room for improvement, but it will be hard work....
Jun 14, 2017 at 18:31 history edited rolfl CC BY-SA 3.0
sum the IP parts in to a single integer
S Jun 14, 2017 at 18:27 history suggested enedil CC BY-SA 3.0
code is now more Pythonic
Jun 14, 2017 at 18:15 review Suggested edits
S Jun 14, 2017 at 18:27
Jun 14, 2017 at 15:01 comment added JMK09 I've run it on the complete set of 20,344,457 rows: real 4m27.661s user 3m55.171s sys 0m26.793s That's even faster than my initial attempt of 100,000 records :)
Jun 14, 2017 at 14:58 comment added rolfl I am pleased it's working for you. Make sure you test the results carefully, and adjust the code to use best practices and other library functions in Python (library functions will likely help you with portability from IP4 to IP6 and other benefits).
Jun 14, 2017 at 14:56 comment added JMK09 WOW, many thanks. It is insanely fast, 100,000 rows now take only 1.3 seconds instead of 7 minutes: real 0m1.350s user 0m1.174s sys 0m0.146s
Jun 14, 2017 at 14:54 vote accept JMK09
Jun 14, 2017 at 14:31 comment added rolfl @JMK09 - See the edit I made to include a custom lookup for IP addresses in CIDR ranges using a much faster preprocessed algorithm. I have not tested the performance when compared to your existing code, obviously
Jun 14, 2017 at 14:29 history edited rolfl CC BY-SA 3.0
Add code example for tree.
Jun 14, 2017 at 13:01 comment added JMK09 Thanks for your extensive answer @rolfl, much appreciated! Can you point me into some direction which packages/code to use? After reading your answer I've tried implementing Pytricia but couldn't get it to work. Would you recommend for example PySubnetTree?
Jun 14, 2017 at 11:51 history edited rolfl CC BY-SA 3.0
typos
Jun 14, 2017 at 11:15 history answered rolfl CC BY-SA 3.0