I was able to shave 20% off the running time in outoftime's test by avoiding the indexed lookups in (ary[i], ary[j], ary[k]) and iterating over the values directly instead:
def combinations_3(ary):
    enumerated = list(enumerate(ary))
    for i, x in enumerated[:-2]:
        for j, y in enumerated[i+1:-1]:
            for z in ary[j+1:]:
                yield (x, y, z)
Precomputing the slice of the innermost loop for each j takes another 10 % off:
def combinations_3(ary):
    enumerated = [(i, value, ary[i+1:]) for i, value in enumerate(ary)]
    for i, x, _ in enumerated[:-2]:
        for _, y, tail in enumerated[i+1:-1]:
            for z in tail:
                yield (x, y, z)
It is worth noting that the first version requires O(n) additional space for the list slices, and the second version ups the requirement to O(n^2), where n is the size of ary.
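As a sanity check (my own sketch, not part of outoftime's test), both variants should yield the same triples in the same order as itertools.combinations(ary, 3). The names combinations_3_slices and combinations_3_precomputed are made up here only so that both versions can live in one file, and the check assumes ary is a list or another sliceable sequence:

```python
from itertools import combinations


def combinations_3_slices(ary):
    # First variant: slice the innermost range on every (i, j) pair.
    enumerated = list(enumerate(ary))
    for i, x in enumerated[:-2]:
        for j, y in enumerated[i + 1:-1]:
            for z in ary[j + 1:]:
                yield (x, y, z)


def combinations_3_precomputed(ary):
    # Second variant: store each element together with its tail slice up front.
    enumerated = [(i, value, ary[i + 1:]) for i, value in enumerate(ary)]
    for i, x, _ in enumerated[:-2]:
        for _, y, tail in enumerated[i + 1:-1]:
            for z in tail:
                yield (x, y, z)


if __name__ == "__main__":
    # Compare against the standard library for small inputs, including
    # lists too short to contain any triple.
    for n in range(7):
        ary = list(range(n))
        expected = list(combinations(ary, 3))
        assert list(combinations_3_slices(ary)) == expected
        assert list(combinations_3_precomputed(ary)) == expected
    print("both versions match itertools.combinations")
```

This only checks correctness. To compare speed, the same two functions can be dropped into whatever timing harness you already use.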