Belated addition: a few days after the fact, an idea hit me out of the blue while I was having a smoke. I'm reporting it here because it speeds things up by a factor of 20 at the SPOJ limit of 50000 (and by much more for larger inputs). The idea is based on the observation that roughly half of the primes p <= n satisfy
(n / 2) < p <= n
and hence
1 <= (n / p) < 2
which is just a coy way of saying that, with integer (floor) division,
n // p = 1
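Since p**2 > n for these primes (p > n/2 and p >= 2 give p * p > n), the exponent of p in n!, which by Legendre's formula is n // p + n // p**2 + ..., is just n // p = 1. Each of them therefore contributes a factor (1 + 1) to the divisor count. Knowing that there are k primes that satisfy this condition (bisect_right() to the rescue), I can compute the contribution of all these primes in one fell swoop as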
(1 + 1)**k
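A minimal check of this reasoning for a small n, assuming a plain trial-division prime list (the helpers `primes_upto` and `legendre_exponent` are hypothetical, not part of the original code; `math.isqrt` and `math.prod` need Python 3.8+). It confirms that every prime in (n/2, n] occurs exactly once in n!, so the grouped factor 2**k equals the product of the individual (e + 1) factors.

```python
import bisect
import math

def primes_upto(n):
    """Sorted list of primes <= n by trial division (fine for small illustrative n)."""
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, math.isqrt(p) + 1))]

def legendre_exponent(n, p):
    """Exponent of the prime p in n!: n//p + n//p**2 + n//p**3 + ..."""
    e, q = 0, n
    while q >= p:
        q //= p
        e += q
    return e

n = 30
primes = primes_upto(n)
hi = bisect.bisect_right(primes, n)       # number of primes <= n
lo = bisect.bisect_right(primes, n // 2)  # number of primes <= n/2
k = hi - lo                               # primes in (n/2, n]: 17, 19, 23, 29

# each of those primes has exponent 1 in n!, i.e. a divisor-count factor of 1 + 1
assert all(legendre_exponent(n, p) == 1 for p in primes[lo:hi])
grouped = (1 + 1) ** k
individual = math.prod(legendre_exponent(n, p) + 1 for p in primes[lo:hi])
print(k, grouped, individual)  # 4 16 16
```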
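This readily extends to other small quotients in the remaining range of primes: the k primes with n // p = e contribute (e + 1)**k in one go. The fun only stops when the primes get so small that their squares appear among the factors 1..n of n! (i.e. p <= sqrt(n)), because then the exponent of p picks up further terms n // p**2, n // p**3, ...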
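The generalization can be sketched as follows, assuming a sorted prime list. The function `quotient_groups` is hypothetical and is organized by the quotient e = n // p rather than by m = e + 1 as in v6b(), so it illustrates the idea rather than transcribing that code. Primes with n // p == e are exactly those in (n // (e + 1), n // e], so each group size is the difference of two bisect_right() positions, and only primes with p * p > n are grouped.

```python
import bisect
import math
from collections import Counter

def primes_upto(n):
    """Sorted list of primes <= n by trial division (fine for small illustrative n)."""
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, math.isqrt(p) + 1))]

def quotient_groups(n, primes):
    """Map e -> number of primes p with p*p > n and n // p == e.

    Primes with n // p == e are exactly those in (n // (e + 1), n // e].
    """
    r = math.isqrt(n)  # p > r  <=>  p*p > n
    groups = {}
    e = 1
    while n // e > r:  # the interval for quotient e still reaches above sqrt(n)
        hi = bisect.bisect_right(primes, n // e)
        lo = bisect.bisect_right(primes, max(n // (e + 1), r))
        if hi > lo:
            groups[e] = hi - lo
        e += 1
    return groups

n = 30
primes = primes_upto(n)
groups = quotient_groups(n, primes)
print(groups)  # {1: 4, 2: 2, 4: 1}

# contribution of all primes above sqrt(n) to the divisor count of n!
print(math.prod((e + 1) ** k for e, k in groups.items()))  # 720

# cross-check against a direct count of the quotients
assert groups == dict(Counter(n // p for p in primes if p * p > n))
```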
Here's how you explain this to your computer in the pythonic manner:
```python
import bisect
import math

MOD = 1000000007

# `primes` is assumed to be a global, sorted list of all primes up to the largest n

def v6b(n):
    """number of divisors of n! (SPOJ DIVFACT)"""
    divisor_count = 1
    hi = bisect.bisect_right(primes, n)
    for m in range(2, int(math.sqrt(n))):
        # primes p in (n // m, n // (m - 1)] have n // p == m - 1,
        # so each of them contributes a factor m
        lo = bisect.bisect_right(primes, n // m, 0, hi)
        k = hi - lo
        hi = lo
        divisor_count = (divisor_count * m**k) % MOD
        if k < 2:
            break
    # the rest is algorithm v5b (Legendre's formula per prime) but without
    # regular modulo trimming since growth is contained
    for current_prime in primes[0:hi]:
        q = n
        e = 0
        while q >= current_prime:
            q //= current_prime
            e += q
        divisor_count *= e + 1
    return divisor_count % MOD
```
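v6b() relies on a global, sorted list `primes` that is not shown. A minimal way to build it, assuming a plain sieve of Eratosthenes up to the SPOJ limit of 50000, is sketched below, together with a hypothetical ungrouped reference function (the per-prime loop from the second half of v6b(), applied to every prime and reduced modulo at each step) that v6b() can be checked against. The names `sieve`, `LIMIT` and `divisors_of_factorial_reference` are illustrative, not from the original.

```python
import math

MOD = 1000000007
LIMIT = 50000  # SPOJ DIVFACT input limit

def sieve(limit):
    """Sieve of Eratosthenes: sorted list of all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # cross off i*i, i*i + i, ... (smaller multiples were handled by smaller primes)
            is_prime[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]

def divisors_of_factorial_reference(n, primes):
    """Ungrouped d(n!) mod MOD: Legendre's formula for every prime <= n."""
    count = 1
    for p in primes:
        if p > n:
            break
        e, q = 0, n
        while q >= p:
            q //= p
            e += q
        count = count * (e + 1) % MOD
    return count

primes = sieve(LIMIT)
print(divisors_of_factorial_reference(10, primes))  # 270
```

With `primes` built this way, `v6b(n) == divisors_of_factorial_reference(n, primes)` can be asserted over a range of n as a quick regression check before timing.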
Here are the timings for the usual 500 worst-case inputs at each input size n, for the earlier version (here called 'v5b') and for this new version; the checksums are identical for both. I went beyond the SPOJ limit of 50000 to give some idea of the growth behaviour.
```
n (500 inputs)    v5b      v6b     cksum (same for both)
           500    0.016    0.000   230711749546
          1000    0.016    0.015   256088767231
          5000    0.078    0.016   248659270107
         10000    0.156    0.031   255839576411
         50000    1.281    0.063   252317442043
        100000    3.313    0.078   246710058695
        500000   44.376    0.266   244700225412
```
Beyond n = 100000 or thereabouts the code gets faster with a hand-coded powering function (via repeated squaring), but the built-in exponentiation operator is faster for lower limits, and hence on SPOJ. In C++ and C# things were faster when bailing out of the first loop at k < 12, but in Python k < 2 worked best.
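For reference, a hand-coded powering function of the kind mentioned above can be sketched as right-to-left binary exponentiation (repeated squaring) with a modulo at every step. The name `pow_mod` is hypothetical and this is not the author's actual function; Python's built-in three-argument `pow(base, exp, mod)` computes the same value and serves as a check here. No timing claims are implied.

```python
def pow_mod(base, exp, mod):
    """base**exp % mod by repeated squaring, reducing modulo at every step."""
    result = 1 % mod
    base %= mod
    while exp > 0:
        if exp & 1:                    # lowest remaining bit of the exponent is set
            result = result * base % mod
        base = base * base % mod       # square for the next bit
        exp >>= 1
    return result

print(pow_mod(2, 10, 1000000007))  # 1024
assert pow_mod(3, 12345, 1000000007) == pow(3, 12345, 1000000007)
```

In v6b() the factor m**k would then be computed as pow_mod(m, k, 1000000007) instead.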
Note: the time for 500 test cases at the SPOJ limit of 50000 is virtually the same as for the C# version running under Mono on my box, which is no mean feat for an interpreted language...