I have the text file below and need some help parsing the IP addresses out of it.

The text file is of the form:

abc 10.1.1.1/32   aabbcc
def 11.2.0.0/16   eeffgg
efg 0.0.0.0/0   ddeeff

In other words, a bunch of IP networks exist as part of a log file. The output should be provided as below:

10.1.1.1/32
11.2.0.0/16
0.0.0.0/0

I have the code below, but it does not output the required information:

file = open(filename, 'r')
for eachline in file.readlines():
    ip_regex = re.findall(r'(?:\d{1,3}\.){3}\d{1,3}', eachline)
    print ip_regex
  • Try to describe what each line of code does and you will find the error. See the re documentation too. Commented Oct 14, 2014 at 21:08
  • Well, you didn't include anything in your regex to match the /32 or similar at the end, so of course it's only going to match the 10.1.1.1 or similar. Commented Oct 14, 2014 at 21:10
  • re.findall(r"\d+\.\d+\.\d+\.\d+/\d+", file.read()). You should also use with to open your files. Commented Oct 14, 2014 at 21:15
  • As a side note, there is no reason to use readlines() there. file is already an iterable of lines. All you're doing is wastefully forcing Python to read and parse the entire file in memory before you can use it. Commented Oct 14, 2014 at 21:16
  • As another side note, those aren't IP addresses, those are IP networks, which contain an address and a bitmask. In fact, your existing code is already finding the IP addresses that are part of those networks… Commented Oct 14, 2014 at 21:20

2 Answers


First, your regex doesn't even attempt to capture anything but four dotted numbers, so of course it's not going to match anything else, like a /32 on the end. If you just add, e.g., /\d{1,2} to the end, it'll fix that:

(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}

Debuggex Demo
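
As a quick check, here is a minimal sketch that runs the corrected pattern against the three sample lines from the question. The names CIDR_RE, sample_lines and networks are made up for this illustration, and the input is an in-memory list rather than the real log file.

```python
import re

# Four dotted groups of 1-3 digits, then a slash and a 1-2 digit prefix length.
CIDR_RE = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}')

# Sample lines copied from the question (stand-in for the real file).
sample_lines = [
    'abc 10.1.1.1/32   aabbcc',
    'def 11.2.0.0/16   eeffgg',
    'efg 0.0.0.0/0   ddeeff',
]

networks = []
for line in sample_lines:
    # findall returns every non-overlapping match on the line as a string.
    networks.extend(CIDR_RE.findall(line))

for network in networks:
    print(network)
```

This prints the three networks from the question, one per line. Note that the pattern only checks the shape of the text, so something like 999.999.999.999/99 would also match.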


However, if you don't understand regular expressions well enough to understand that, you probably shouldn't be using a regex as a piece of "magic" that you'll never be able to debug or extend. It's a bit more verbose with str methods like split or find, but maybe easier to understand for a novice:

for line in file:
    for part in line.split():
        try:
            address, network = part.split('/')
            a, b, c, d = address.split('.')
        except ValueError:
            pass # not in the right format
        else:
            # do something with part, or address and network, or whatever
            print(part)

As a side note, depending on what you're actually doing with these things, you might want to use the ipaddress module (or the backport on PyPI for 2.6-3.2) rather than string parsing:

>>> import ipaddress
>>> s = '10.1.1.1/32'
>>> a = ipaddress.ip_network(s)

You can combine that with either of the above:

for line in file:
    for part in line.split():
        try:
            a = ipaddress.ip_network(part)
        except ValueError:
            pass # not the right format
        else:
            # do something with a and its nifty methods
            print(a)
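
For a self-contained version of the loop above, here is a rough sketch assuming Python 3.3+ (where ipaddress is in the standard library) and using the question's sample lines in place of the file. The names sample_lines and networks are invented for the example.

```python
import ipaddress

# Sample lines copied from the question (stand-in for the real file).
sample_lines = [
    'abc 10.1.1.1/32   aabbcc',
    'def 11.2.0.0/16   eeffgg',
    'efg 0.0.0.0/0   ddeeff',
]

networks = []
for line in sample_lines:
    for part in line.split():
        try:
            # Raises ValueError for tokens such as 'abc' or 'aabbcc'.
            networks.append(ipaddress.ip_network(part))
        except ValueError:
            pass  # not a network, skip it

for network in networks:
    # Each object exposes the parsed pieces instead of raw text.
    print(network, network.network_address, network.prefixlen)
```

One thing to keep in mind: ip_network also accepts a bare address such as 10.1.1.1 and treats it as a /32, and by default it raises ValueError when host bits are set (for example 10.1.1.1/16), so such tokens would be silently skipped here.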

3 Comments

This website, Debuggex, that @abernert linked to is the best website for regex I have ever seen.
@TehTris: Yeah, I do love it. But notice that once they're out of beta, they're apparently going to start charging for non-JS regexes. They already started charging for the convert-to-plain-English feature (which they then disabled…). Very clever; I'm not sure I could go back to… whatever I used to use, which I can't even remember anymore. :)
ipaddress does not work for addresses like this "010.200.074.104". To parse this, it is better to use a one-liner like this: ".".join([str(int(x)) for x in ipv4_str.split(".")])
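
The one-liner from the last comment could be wrapped up roughly as follows. This is only a sketch: strip_leading_zeros is a hypothetical helper name, and it assumes the zero-padded octets are meant as decimal (some tools read a leading zero as octal).

```python
import ipaddress

def strip_leading_zeros(ipv4_str):
    """Rewrite each octet without zero padding, e.g. '074' becomes '74'."""
    # int() parses the padded text as decimal; str() writes it back unpadded.
    return '.'.join(str(int(octet)) for octet in ipv4_str.split('.'))

normalized = strip_leading_zeros('010.200.074.104')
address = ipaddress.ip_address(normalized)
print(normalized)
```

Here normalized is '10.200.74.104', which ipaddress then parses without complaint.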

In this particular case, a regex might be overkill; you could use split:

with open(filename) as f:
    ipList = [line.split()[1] for line in f]

This should produce a list of strings, one IP network per line, provided the network is always the second whitespace-separated field.
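
If the log might contain blank or short lines, line.split()[1] would raise IndexError. A slightly more defensive sketch, with the made-up helper name second_column and an in-memory list standing in for the file, could look like this:

```python
def second_column(lines):
    """Collect the second whitespace-separated field of each line that has one."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or one-field lines
            result.append(fields[1])
    return result

# Sample lines from the question, plus a blank and a short line for illustration.
sample_lines = [
    'abc 10.1.1.1/32   aabbcc',
    '',
    'def 11.2.0.0/16   eeffgg',
    'short',
    'efg 0.0.0.0/0   ddeeff',
]

ip_list = second_column(sample_lines)
print(ip_list)
```

This still trusts the column position and does not check that the field really looks like a network.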
