Things should be fine if you perform strength reduction on n / num**exp. That not only makes things faster, it also gets rid of the messiness inherent in Python's exponentiation operator (so that you don't have to find out how it actually works under the hood).
In the kth iteration of the inner loop you have a quotient q_k that is equal to n / p**k (i.e. n / num**exp in your code).
To obtain n / p**(k+1) in the next iteration, all you have to do is q_k / p because that makes it (n / p**k) / p, which is equal to n / p**(k+1) as desired. In other words, instead of computing the power of the prime anew and then performing a division, simply divide the old quotient by the prime again.
Here's the complete loop, coded in a more pedestrian language:
static ushort[] primes_up_to_LIMIT = U16.SmallPrimesUpTo(50000).ToArray();
...
long divisor_count = 1;
foreach (int prime in primes)
{
if (prime > n)
break;
int e = 0;
for (int q = n; q >= prime; )
e += q /= prime;
divisor_count = (divisor_count * (e + 1)) % 1000000007;
}
return (int)divisor_count;
It isn't very fast - it clocked 0.12 s on SPOJ - but it should show the principle well enough and it can serve as the basis of a more performant solution.
Note: detailed investigation shows that for once, the lackluster performance of my C# submission on SPOJ is not due to usual problems inherent in .Net (e.g. poor input/output performance) but rather to some strange problem with Mono. On my laptop a worst-case input file clocks 13.7 ms under .Net and 136.4 ms under Mono.
Here is an attempt at rending the loop in Python (renaming some things for clarity):
divisor_count = 1
for current_prime in primes:
if current_prime > n:
break
q = n
e = 0
while q >= current_prime:
q /= current_prime
e += q
divisor_count *= e + 1
print divisor_count % 1000000007
One thing worth looking into is the underlying data type of the divisor count (pr in the original code). This can grow quickly, far beyond anything representable in a plain integer, and hence it will go big integer internally. This makes the multiplication pr *= s + 1 more expensive, and the statement has a cost multiplier of max(T) * pi(max(N)) = 500 * 5133.
Reducing the division count modulo 1000000007 after every update by changing
divisor_count *= e + 1
to
divisor_count = (divisor_count * (e + 1)) % 1000000007
improved the time for a worst-case input file from 1.92 s to 1.39 s on my box, so it's something worth keeping in mind. For comparison: the original code clocked 2.64 s for this file.