You're not utilizing the full power of modern computers, that have multiple central processing units! This is by far the best optimization you can have here, since this is CPU bound. Note: for I/O bound operations multithreading (using the threading module) is suitable.
So let's see how python makes it easy to do so using multiprocessing module (read comments):
import hashlib
# you're sampling a string so you need sample, not 'choice'
from random import sample
import multiprocessing
# use a thread to synchronize writing to file
import threading
# open up to 4 processes per cpu
processes_per_cpu = 4
processes = processes_per_cpu * multiprocessing.cpu_count()
print "will use %d processes" % processes
longitud = 8
valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
# check on smaller ranges to compare before trying your range... :-)
RANGE = 200000
def enc(string):
m = hashlib.md5()
m.update(string.encode('utf-8'))
return m.hexdigest()
# we synchronize the results to be written using a queue shared by processes
q = multiprocessing.Manager().Queue()
# this is the single point where results are written to the file
# the file is opened ONCE (you open it on every iteration, that's bad)
def write_results():
with open('datos.txt', 'w') as f:
while True:
msg = q.get()
if msg == 'close':
break;
else:
f.write(msg)
# this is the function each process uses to calculate a single result
def calc_one(i):
s = ''.join(sample(valores, longitud))
md = enc(s)
q.put("%s %s\n" % (s, md))
# we start a process pool of workers to spread work and not rely on
# a single cpu
pool = multiprocessing.Pool(processes=processes)
# this is the thread that will write the results coming from
# other processes using the queue, so it's execution target is write_results
t = threading.Thread(target=write_results)
t.start()
# we use 'map_async' to not block ourselves, this is redundant here,
# but it's best practice to use this when you don't HAVE to block ('pool.map')
pool.map_async(calc_one, xrange(RANGE))
# wait for completion
pool.close()
pool.join()
# tell result-writing thread to stop
q.put('close')
t.join()
There are probably more optimizations to be done in this code, but a major optimization for any cpu-bound task like you present, is using multiprocessing.
note: A trivial optimization of file writes would be to aggregate some results from the queue and write them together (if you have many cpus that exceed the single writing thread's speed)
note 2: Since OP was looking to go over combinations/permutations of stuff, it should be noted that there is a module for doing just that, and it's called itertools.
i = 1assignment superfluous?