Don't bother parallelizing just yet. Right now, you're taking no advantage of numpy's vectorization; you may as well be using a Python list (or maybe array.array) for all the benefit numpy is giving you.
Actually use the vectorization features, and the runtime should drop by roughly two orders of magnitude:
import numpy as np
A = np.array([1, 2, 3, 4, ..., 100000])  # placeholder; if these really are the values you want, np.arange(1, 100000+1) builds the array much faster
A = (A * 6) ** (1 / 3)
# If the result should be truncated back to int64 rather than left as doubles, cast back at the end
A = A.astype(np.int64)
(A * 6) ** (1 / 3) does the same work as the for loop did, but much faster (you could match the original code more closely with A = (A * 2 + A * 4) ** (1/3), but multiplying by 2 and 4 separately and then adding is pointless when you can just multiply by 6 directly). The final (optional, depending on intent) line reproduces the exact behavior of the original loop by truncating the float results back to the original integer dtype.
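As a quick illustration of why the cast matters (a minimal sketch; the small input range is arbitrary), raising an integer array to a float power promotes the result to float64:

import numpy as np

A = np.arange(1, 10 + 1)       # integer input, as in the question
B = (A * 6) ** (1 / 3)         # a float exponent promotes the result to float64
print(B.dtype)                 # float64
print(B.astype(np.int64))      # truncates toward zero, matching assignment into an int array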
Comparing performance with IPython's %%timeit magic for a microbenchmark:
In [2]: %%timeit
   ...: A = np.arange(1, 100000+1)
   ...: for i in range(len(A)):
   ...:     A[i] = (A[i]*2 + A[i]*4) ** (1/3)
   ...:
427 ms ± 6.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %%timeit
   ...: A = np.arange(1, 100000+1)
   ...: A = (A * 6) ** (1/3)
   ...:
2.72 ms ± 51 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The vectorized code takes about 0.6% of the time taken by the naive loop; merely parallelizing the naive loop would never come close to achieving that sort of speedup. Adding the .astype(np.int64) cast only increases runtime by about 6%, still a trivial fraction of what the original for loop required.
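If you'd rather reproduce the comparison outside IPython, the standard library's timeit module works too; a rough sketch (the repetition counts here are arbitrary):

import timeit

setup = "import numpy as np"
loop_stmt = """
A = np.arange(1, 100000+1)
for i in range(len(A)):
    A[i] = (A[i]*2 + A[i]*4) ** (1/3)
"""
vec_stmt = """
A = np.arange(1, 100000+1)
A = (A * 6) ** (1/3)
"""

print(timeit.timeit(loop_stmt, setup=setup, number=10) / 10)      # seconds per loop pass
print(timeit.timeit(vec_stmt, setup=setup, number=1000) / 1000)   # seconds per vectorized pass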