I am working with a SIEM and need to parse IP addresses out of relatively large files. The files don't have consistent fields, so "cut" is not an option. I am using a modified Python script to remove all characters except a-z, A-Z, 0-9 and the period "." so that the file can be parsed properly.

The problem is that this does not work with my SIEM files. If a line in the text file looks like "192.168.1.2!@#$!@%@$", it is fine: the script drops all of the characters I do not need and outputs just the IP to a new file. But if a line looks like "192.168.168.168@#$% this is a test", the script leaves it alone after the first stage of removing abnormal characters. Please help, I have no idea why it does this. Here is my code:
#!/usr/bin/python
import re
import sys
unmodded = raw_input("Please enter the file to parse. Example: /home/aaron/ipcheck: ")
string = open(unmodded).read()
new_str = re.sub('[^a-zA-Z0-9.\n\.]', ' ', string)
open('modifiedipcheck.txt', 'w').write(new_str)
try:
    file = open('modifiedipcheck.txt', "r")
    ips = []
    for text in file.readlines():
        text = text.rstrip()
        regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
        if regex is not None and regex not in ips:
            ips.append(regex)
    for ip in ips:
        outfile = open("checkips", "a")
        combine = "".join(ip)
        if combine is not '':
            print "IP: %s" % (combine)
            outfile.write(combine)
            outfile.write("\n")
finally:
    file.close()
    outfile.close()
Anyone have any ideas? Thanks a lot in advance.
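The behaviour described is consistent with the trailing `$` in the pattern. After `rstrip()`, the first sample line ends with the IP, while the second ends with "test", so an end-anchored pattern cannot match it. Here is a minimal sketch in Python 3 that compares an end-anchored pattern modelled on the one in the question with an unanchored variant. The helper name `find_ips` and the word-boundary pattern are assumptions made for illustration, not part of the original script.

```python
import re

# End-anchored pattern: the trailing $ only lets an IP match at the very end of the line.
ANCHORED = re.compile(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$')
# Assumed alternative: no anchor, with word boundaries so the IP can sit anywhere in the line.
UNANCHORED = re.compile(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')

def find_ips(pattern, line):
    """Hypothetical helper: blank out unwanted characters, strip trailing space, then search."""
    cleaned = re.sub(r'[^a-zA-Z0-9.\n]', ' ', line).rstrip()
    return pattern.findall(cleaned)

print(find_ips(ANCHORED, "192.168.1.2!@#$!@%@$"))                  # IP is at end of line
print(find_ips(ANCHORED, "192.168.168.168@#$% this is a test"))    # text follows the IP
print(find_ips(UNANCHORED, "192.168.168.168@#$% this is a test"))  # anchor removed
```

Note that `\d{1,3}` also accepts out-of-range values such as 999, so this only finds IP-like strings and does not validate them.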
You could use `with open('bla', 'r') as file` instead of the try/finally clause.

The `for` is indented one space too far. It won't matter in this case, but you should consider indenting it properly regardless, to prevent annoying bugs when expanding that code.
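As a rough illustration of the `with` suggestion, here is a sketch in Python 3 of the read-and-extract step. The function name `extract_ips`, its parameters, and the unanchored pattern are assumptions for the example, not the asker's original code.

```python
import re

# Assumed pattern: matches IP-like strings anywhere in a line (no range validation).
IP_PATTERN = re.compile(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')

def extract_ips(in_path, out_path):
    """Append each distinct IP-like string found in in_path to out_path, one per line."""
    seen = []
    # Both files are closed automatically when the with block exits,
    # even if an exception is raised, so no try/finally is needed.
    with open(in_path, 'r') as infile, open(out_path, 'a') as outfile:
        for line in infile:
            for ip in IP_PATTERN.findall(line):
                if ip not in seen:
                    seen.append(ip)
                    outfile.write(ip + "\n")
    return seen
```

Iterating over the file object directly reads one line at a time, which may help with relatively large files compared with `read()` or `readlines()`.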