If grep already does what you want, then just use it with the subprocess module:
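import subprocess

# grep exits with 0 on a match, 1 on no match and >1 on an error.
# subprocess.call returns that status; check_call would raise on non-zero.
rv = subprocess.call(['grep', first, filename])
if rv == 0:
    rv = subprocess.call(['grep', second, filename])
    if rv == 0:
        print('Found %s' % second)
    elif rv == 1:
        print('%s not found' % second)
    else:
        print('Error looking for %s' % second)
elif rv == 1:
    print('%s not found' % first)
else:
    print('Error looking for %s' % first)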
Edit: (in response to comments)
Remember that the most efficient code is the code you don't have to write, since programmer time is much more expensive than computer time. Also keep in mind that grep has been well optimized. One of its big wins is that it avoids splitting the input into lines, which your Python solution does twice.
At the very least you should keep the list of lines and re-use it.
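As a minimal sketch of that re-use (the function name `find_both` is hypothetical, and the code assumes Python 3 and a file small enough to hold in memory), the file is split into lines once and the same list is scanned for both strings:

```python
def find_both(filename, first, second):
    """Report whether each of two strings occurs in the file, reading it once."""
    with open(filename) as f:
        lines = f.readlines()  # split into lines a single time
    # Both searches walk the same in-memory list instead of re-reading the file.
    found_first = any(first in line for line in lines)
    found_second = any(second in line for line in lines)
    return found_first, found_second
```

This still pays the cost of line splitting, just once instead of twice.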
But if you want to do it in Python, there are some things that you can copy from GNU grep:
- Use page-aligned, page-sized blocks of memory
- Avoid splitting the data into lines
- Use mmap
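The first two points can be approximated in plain Python. The sketch below is an illustration rather than a port of grep: `find_in_blocks` and its `block_size` default are hypothetical, Python 3 is assumed, and Python gives no control over buffer alignment, so only the block size is a multiple of the page size. It reads fixed-size blocks, never splits on newlines, and carries over the last `len(needle) - 1` bytes so a match straddling two blocks is not missed:

```python
import mmap

def find_in_blocks(path, needle, block_size=16 * mmap.PAGESIZE):
    """Return the byte offset of the first occurrence of needle, or -1."""
    if not needle:
        return 0
    overlap = len(needle) - 1
    with open(path, 'rb') as f:
        tail = b''   # bytes carried over from the previous block
        offset = 0   # file offset at which `tail + block` starts
        while True:
            block = f.read(block_size)
            if not block:
                return -1
            data = tail + block
            pos = data.find(needle)
            if pos != -1:
                return offset + pos
            # Keep just enough bytes to catch a match across the boundary.
            keep = data[-overlap:] if overlap else b''
            offset += len(data) - len(keep)
            tail = keep
```

For example, `find_in_blocks('procmail.log', b'abracadabra')` would scan the file without ever building a list of lines.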
Edit 2:
As Spacedman correctly mentions, there is no substitute for testing. So let's do that.
I ran some tests on a 40 MB test file, searching for the nonexistent string abracadabra. First, using BSD grep:
> time grep abracadabra procmail.log
0.242u 0.015s 0:00.25 100.0% 58+179k 0+0io 0pf+0w
> time grep abracadabra procmail.log
0.192u 0.016s 0:00.20 100.0% 57+176k 0+0io 0pf+0w
> time grep abracadabra procmail.log
0.184u 0.023s 0:00.20 100.0% 59+183k 0+0io 0pf+0w
> time grep abracadabra procmail.log
0.199u 0.007s 0:00.20 95.0% 60+186k 0+0io 0pf+0w
> time grep abracadabra procmail.log
0.184u 0.023s 0:00.20 100.0% 59+183k 0+0io 0pf+0w
> time grep abracadabra procmail.log
0.184u 0.024s 0:00.20 100.0% 57+176k 0+0io 0pf+0w
Then with the following program:
import mmap

with open('procmail.log', 'r+b') as p:
    mm = mmap.mmap(p.fileno(), 0)  # length 0 maps the whole file
    rv = mm.find(b'abracadabra')  # byte offset of the first match, or -1
print(rv)
This gave:
> time python foo.py
-1
0.139u 0.024s 0:00.16 93.7% 1701+549k 0+1io 0pf+0w
> time python foo.py
-1
0.094u 0.039s 0:00.13 92.3% 1807+583k 0+1io 0pf+0w
> time python foo.py
-1
0.109u 0.023s 0:00.13 92.3% 1807+583k 0+1io 0pf+0w
> time python foo.py
-1
0.117u 0.015s 0:00.13 92.3% 1807+583k 0+1io 0pf+0w
> time python foo.py
-1
0.125u 0.007s 0:00.13 92.3% 1807+583k 0+1io 0pf+0w
So:
- on my machine,
- searching for a simple string,
using mmap in Python is slightly faster than calling BSD grep.
Keep in mind that the find method of mmap objects (at least in Python 2.7) searches only for literal strings, not regular expressions. Results may also differ with file size, amount of RAM, operating system, et cetera.
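If a regular expression is needed, one possible workaround is to pass the mapping to the standard re module, which accepts bytes-like objects. This is a sketch under assumptions, not something that was timed above: it assumes Python 3, a bytes pattern, and a non-empty file, and the name `regex_search_mmap` is hypothetical.

```python
import mmap
import re

def regex_search_mmap(path, pattern):
    """Return the offset of the first regex match in the file, or -1."""
    with open(path, 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            match = re.search(pattern, mm)  # pattern must be bytes
            start = match.start() if match else -1
            del match  # drop the match object before closing the map
        finally:
            mm.close()
    return start
```

Whether this keeps the speed advantage over grep would have to be measured separately.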
'P QRST' while also looking for abracadabra isn't an arduous task.
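A rough sketch of searching for two strings with a single mapping might look like this. The helper `find_strings` is hypothetical, and the code assumes Python 3, bytes search strings, and a non-empty file:

```python
import mmap

def find_strings(path, needles):
    """Map the file once and return {needle: offset or -1} for each needle."""
    with open(path, 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # Each find() scans the same mapping; the file is opened only once.
            return {needle: mm.find(needle) for needle in needles}
        finally:
            mm.close()
```

Calling `find_strings('procmail.log', [b'P QRST', b'abracadabra'])` would return one offset per string, with -1 for any string that does not occur.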