I am working with a SIEM and need to parse IP addresses out of relatively large files. The files don't have consistent fields, so "cut" is not an option. I am using a modified Python script that first removes every character except a-z, A-Z, 0-9 and the period "." so that the file can be parsed properly. The problem is that this does not work with my SIEM files. If a line in the text file looks like "192.168.1.2!@#$!@%@$", it is fine: the script drops all of the characters I do not need and writes just the IP to a new file. But if a line looks like "192.168.168.168@#$% this is a test", no IP is extracted after the first stage of removing abnormal characters. Please help, I have no idea why it does this. Here is my code:

    #!/usr/bin/python
    import re
    import sys

    unmodded = raw_input("Please enter the file to parse. Example: /home/aaron/ipcheck: ")
    string = open(unmodded).read()
    new_str = re.sub('[^a-zA-Z0-9.\n\.]', ' ', string)
    open('modifiedipcheck.txt', 'w').write(new_str)

    try:
        file = open('modifiedipcheck.txt', "r")
        ips = []
        for text in file.readlines():
            text = text.rstrip()
            regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
            if regex is not None and regex not in ips:
                ips.append(regex)
         for ip in ips:
            outfile = open("checkips", "a")
            combine = "".join(ip)
            if combine is not '':
                print "IP: %s" % (combine)
                outfile.write(combine)
                outfile.write("\n")
     finally:
            file.close()
            outfile.close()

Anyone have any ideas? Thanks a lot in advance.

  • Also, you may want to consider using with open('bla', 'r') as file instead of the try/finally clause. Commented Feb 28, 2013 at 9:14
  • And the second for is indented one space too far. It won't matter in this case, but you should consider indenting it properly regardless, to prevent annoying bugs when expanding that code. Commented Feb 28, 2013 at 9:15
  • Hey, sorry, it's only indented incorrectly here from the copy and paste; in my editor it's good :) Commented Feb 28, 2013 at 9:21
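
To illustrate the with open suggestion from the first comment, here is a minimal sketch of the cleaning stage only, written for Python 3 (the question's script is Python 2). The function name clean_file and the temporary demo files are made up for this example; the character class is the one from the question with the duplicated dot removed. Reading line by line instead of calling read() is my own choice for large files, not something the question requires.

```python
import os
import re
import tempfile

def clean_file(src_path, dst_path):
    """Copy src_path to dst_path, replacing unwanted characters with spaces."""
    # Both files are closed automatically when the with block ends,
    # even if an exception is raised, so no try/finally is needed.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(re.sub(r'[^a-zA-Z0-9.\n]', ' ', line))

# Hypothetical demo files in a temporary directory.
demo_dir = tempfile.mkdtemp()
src_path = os.path.join(demo_dir, "ipcheck")
dst_path = os.path.join(demo_dir, "modifiedipcheck.txt")

with open(src_path, "w") as f:
    f.write("192.168.1.2!@#$!@%@$\n")

clean_file(src_path, dst_path)

with open(dst_path) as f:
    cleaned = f.read()

print(repr(cleaned))
```

The junk characters become trailing spaces, which is why the original script still needs rstrip() before matching.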

2 Answers

Your regex ends with $, which indicates that it expects the line to end at that point. If you remove that, it should work fine:

    regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)

You can also simplify the regex itself further:

    regex = re.findall(r'(?:\d{1,3}\.){3}\d{1,3}', text)
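
As a quick sanity check that the two unanchored patterns behave the same, here is a small sketch. The sample lines are made up, modeled on the inputs in the question after the junk characters have been turned into spaces.

```python
import re

# The pattern from the question without the trailing $, and the shorter form.
verbose = r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})'
compact = r'(?:\d{1,3}\.){3}\d{1,3}'

samples = [
    "192.168.168.168      this is a test",
    "192.168.1.1       this is a test 192.168.23.23",
    "no address on this line",
]

results = []
for line in samples:
    found = re.findall(compact, line)
    # Both patterns use only non-capturing groups, so findall returns
    # the full matched strings and the results can be compared directly.
    assert found == re.findall(verbose, line)
    results.append(found)
    print(found)
```

Note that neither pattern checks that each number is in the range 0 to 255, so something like 999.999.999.999 would also match. If that matters for your data, you would need an extra validation step.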

3 Comments

That works, but after that it still stops after the first IP. So if I have "192.168.1.1!@#!@# this is a test 192.168.23.23" it will only output the first IP and ignore the second. I'm sorry, it's late, so I am likely missing the little things. Appreciate the help so far though.
NVM, it is working correctly for the most part; it is just stringing everything together without spaces in the output file.
In case you didn't fix it yet, you can use "\n".join(ip) instead of "".join(ip) so it introduces line breaks between the IPs. Then you can just write the string into a file without any further considerations.
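
For what it's worth, the reason the output runs together is that re.findall returns a list with every match on the line, so "".join(ip) glues all the addresses from one line into a single string. Also, findall returns an empty list rather than None when nothing matches, so the is not None check never filters anything. A minimal sketch of one way to handle this, flattening the matches and skipping duplicates (extract_ips and the sample lines are made up for illustration):

```python
import re

IP_PATTERN = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}')

def extract_ips(lines):
    """Return each distinct IP-like string, in order of first appearance."""
    seen = []
    for line in lines:
        # findall yields one string per match, so iterate over the matches
        # instead of appending the whole list.
        for ip in IP_PATTERN.findall(line):
            if ip not in seen:
                seen.append(ip)
    return seen

sample = [
    "192.168.1.1!@#!@# this is a test 192.168.23.23",
    "no address here",
    "again 192.168.1.1",
]

ips = extract_ips(sample)
output = "\n".join(ips)  # one address per line
print(output)
```

For very large files a set would make the duplicate check faster than scanning a list, at the cost of a little extra bookkeeping to keep the original order.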

Here is what I think is happening. You have a pattern that looks for garbage characters and replaces them with a space. When you have an IP address followed by nothing but garbage, the garbage is turned to spaces, and then when you strip the string the spaces are gone, leaving nothing but the address you want to match.

Your pattern ends in a $ so it is anchored to the end of the line, so when the address is the last thing on the line, it matches.

When the line also contains "this is a test", those non-garbage characters are left alone and strip doesn't remove them, so the IP address is no longer at the end of the line and the $ prevents it from matching.
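
The explanation above can be traced step by step with the two sample inputs from the question. This is only a sketch: the helper name clean is made up, and the pattern is the simplified form with the $ kept on purpose.

```python
import re

def clean(line):
    # Same idea as the question's first stage: replace everything that is
    # not a letter, digit, dot or newline with a space, then strip the end.
    return re.sub(r'[^a-zA-Z0-9.\n]', ' ', line).rstrip()

# Anchored to the end of the line, as in the question.
anchored = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}$')

only_garbage = clean("192.168.1.2!@#$!@%@$")
with_words = clean("192.168.168.168@#$% this is a test")

print(repr(only_garbage), anchored.findall(only_garbage))
print(repr(with_words), anchored.findall(with_words))
```

In the first case rstrip() removes the spaces left behind by the garbage, so the address ends the line and matches. In the second case the line still ends with "test", so the anchored pattern finds nothing.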

1 Comment

Hey, yes this was the issue. Removal of the $ character fixed it except it still stops after pulling the first IP.
