I have written a numerical integration library in Python from scratch, using Newton-Cotes formulas and integer arithmetic, as a self-imposed programming challenge. I wrote all the code completely by myself.
I implemented all the code using formulas found here and there.
Now what is a Newton-Cotes formula? It is something like this:
$$\int_a^b f(x)dx \approx \frac{A}{B}\frac{b - a}{n}\left(W_0f(a) + W_1f\left(a+\frac{b-a}{n}\right) + W_2f\left(a+\frac{2(b-a)}{n}\right)+W_3f\left(a+\frac{3(b-a)}{n}\right)+W_4f\left(a+\frac{4(b-a)}{n}\right)+\dots+W_4f\left(a+\frac{(n-4)(b-a)}{n}\right)+W_3f\left(a+\frac{(n-3)(b-a)}{n}\right)+W_2f\left(a+\frac{(n-2)(b-a)}{n}\right)+W_1f\left(a+\frac{(n-1)(b-a)}{n}\right)+W_0f(b)\right)$$
The notation looks complicated, but it is actually quite simple. \$b - a\$ is the length of the interval. A and B are a pair of integers, and n is a positive integer. We split the interval [a, b] into n subintervals of equal length \$\frac{b-a}{n}\$. With n subintervals there are n + 1 boundary points. Each \$W_i\$ is an integer weight. The formula tells us to evaluate the function at these boundary points, multiply each function value by the corresponding weight, add the products together, and then multiply the sum by the two fractions at the front. The result is approximately the integral.
That is all there is to it, really. Importantly, the weights are symmetric around the middle (\$\frac{a + b}{2}\$): each weight applies to exactly two positions, except the middle one. The middle position is a boundary point only if n is even, and in that case no other point shares its weight.
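To make the formula concrete, here is a minimal sketch that applies it with the standard library's `fractions.Fraction` instead of the hand-rolled integer pairs described below. The helper name `newton_cotes` is hypothetical and only for illustration. The weights are those of Boole's rule (n = 4), which also appears in the formula table in the code.

```python
from fractions import Fraction


def newton_cotes(f, a, b, scale, weights):
    """Closed Newton-Cotes rule: scale * h * sum(W_i * f(a + i*h))."""
    n = len(weights) - 1  # number of subintervals
    h = (b - a) / n  # length of each subinterval
    total = sum(w * f(a + i * h) for i, w in enumerate(weights))
    return scale * h * total


# Boole's rule (n = 4): A/B = 2/45, weights symmetric about the middle.
BOOLE_SCALE = Fraction(2, 45)
BOOLE_WEIGHTS = [7, 32, 12, 32, 7]

approx = newton_cotes(
    lambda x: x**4, Fraction(0), Fraction(1), BOOLE_SCALE, BOOLE_WEIGHTS
)
print(approx)  # 1/5, the exact value of the integral of x^4 over [0, 1]
```

Boole's rule integrates polynomials of degree up to five exactly, which is why the result above has no error. For other integrands the result is only an approximation.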
I represent all these numbers as fractions, each stored as a pair of integers. How do I add fractions? It is just basic algebra: the universal formula that works in all cases without needing conditionals is \$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}\$, and that is what I use.
And the difference is \$\frac{c}{d} - \frac{a}{b} = \frac{bc - ad}{bd}\$. What is 0.5 of the difference? \$\frac{1}{2}\left(\frac{c}{d} - \frac{a}{b}\right) = \frac{bc - ad}{2bd}\$. What is twice the difference? \$2\left(\frac{c}{d} - \frac{a}{b}\right) = \frac{2(bc - ad)}{bd}\$. What is thrice the difference? \$3\left(\frac{c}{d} - \frac{a}{b}\right) = \frac{3(bc - ad)}{bd}\$. What is \$0.\overline{3}\$ of the difference? \$\frac{1}{3}\left(\frac{c}{d} - \frac{a}{b}\right) = \frac{bc - ad}{3bd}\$...
You get the idea. To multiply a fraction by an integer, multiply the numerator by the integer; to divide a fraction by an integer, multiply the denominator by the integer. Notice that the numbers (bc - ad) and bd don't change, so we don't need to recalculate them: we store them in variables and reuse them to calculate the multiples.
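The fraction arithmetic above can be sketched in a few lines. The function names `add` and `scaled_difference` are hypothetical and do not appear in my library; `fractions.Fraction` is used only to check the unreduced results.

```python
from fractions import Fraction


def add(a, b, c, d):
    """Return a/b + c/d as an unreduced (numerator, denominator) pair."""
    return a * d + b * c, b * d


def scaled_difference(a, b, c, d, mul=1, div=1):
    """Return (mul/div) * (c/d - a/b) as an unreduced pair."""
    num = b * c - a * d  # computed once, reused for every multiple
    den = b * d
    return mul * num, div * den


print(add(1, 2, 1, 3))  # (5, 6)
print(scaled_difference(1, 3, 1, 2, mul=3))  # (3, 6), i.e. 3 * (1/2 - 1/3) = 1/2
print(scaled_difference(1, 3, 1, 2, div=2))  # (1, 12), i.e. half of 1/6
```

No GCD is taken anywhere, so the pairs are generally not in lowest terms.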
Represent \$a\$ as \$\frac{a_n}{a_d}\$ and \$b\$ as \$\frac{b_n}{b_d}\$. The interior point that lies exactly \$\frac{x}{y}\$ of the way from a to b, in other words \$a + \frac{x(b - a)}{y}\$, is then \$\frac{ya_nb_d + x(b_na_d - a_nb_d)}{ya_db_d}\$.
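As a quick check of the interior-point formula, here is a small sketch. The function name `interior_point` is hypothetical; the names `term_1`, `term_2` and `term_3` mirror the ones used in the generated functions.

```python
from fractions import Fraction


def interior_point(a_n, a_d, b_n, b_d, x, y):
    """Return a + (x/y) * (b - a) as an unreduced (numerator, denominator) pair."""
    term_1 = a_n * b_d  # numerator of a over the common denominator
    term_2 = b_n * a_d - term_1  # numerator of b - a over the common denominator
    term_3 = a_d * b_d  # common denominator
    return y * term_1 + x * term_2, y * term_3


# Two fifths of the way from 1/3 to 3/4.
num, den = interior_point(1, 3, 3, 4, 2, 5)
print(num, den)  # 30 60, which equals 1/2
```

The three terms depend only on a and b, so they can be computed once and reused for every interior point.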
Because I don't use GCD (which would slow down the code) and just naively cross-multiply, the numbers explode quickly, which also slows down the code. So I at least make the interior positions \$\frac{x}{y}\$ reduced fractions. The numerators and denominators of the reduced fractions repeat, so I store the precomputed multiples in lists and use indexing to pair each one with the corresponding weight.
I have done a number of optimizations. Because multiplication is slower than addition and some multiples are sums of previous multiples, the code combines two such previous multiples by addition to get the current multiple (but never more than two, because one multiplication beats a chain of additions and index lookups). The code also detects arithmetic progressions automatically, and it uses x << n.bit_length() - 1 instead of x * n if n is a power of two (n & -n == n), except for doubling (a left shift by one), where I use x + x because that beats the shift.
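The power-of-two trick can be sketched as follows. The helper names `is_power_of_two` and `times` are hypothetical. The `n > 0` guard is my addition for this sketch, since `n & -n == n` alone is also true for zero.

```python
def is_power_of_two(n: int) -> bool:
    """n & -n isolates the lowest set bit; it equals n only for powers of two."""
    return n > 0 and n & -n == n


def times(x: int, n: int) -> int:
    """Multiply x by a positive integer n using the cheapest operator."""
    if n == 2:
        return x + x  # doubling by addition
    if is_power_of_two(n):
        return x << (n.bit_length() - 1)  # shift by log2(n)
    return x * n


print(times(5, 2), times(5, 8), times(5, 7))  # 10 40 35
```

In my library the choice of operator is made once, when the source of each integration function is generated, not at every call as in this sketch.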
Oh, did I mention that I wrote functions that generate the integration functions from other functions and from function fragments stored in string templates, using str.format? Yes indeed: I wrote code that programmatically generates Python functions from formulas stored as dicts, and I use exec(compile(source, "<dynamic>", "exec"), NAMESPACE) to create them.
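Here is a much smaller sketch of the same code-generation technique. The names `TEMPLATE` and `make_scaler` are made up for this illustration and are not part of my library; the sketch assumes `factor` is a positive integer.

```python
TEMPLATE = """def {name}(x: int) -> int:
    return x {op}
"""


def make_scaler(name: str, factor: int, namespace: dict):
    """Generate, compile and return a function that multiplies by factor."""
    # Pick the operator once, at generation time (factor assumed positive).
    if factor & -factor == factor:
        op = f"<< {factor.bit_length() - 1}"
    else:
        op = f"* {factor}"
    source = TEMPLATE.format(name=name, op=op)
    exec(compile(source, "<dynamic>", "exec"), namespace)
    return namespace[name], source


namespace = {}
times_8, src_8 = make_scaler("times_8", 8, namespace)
times_7, src_7 = make_scaler("times_7", 7, namespace)
print(src_8)
print(times_8(5), times_7(5))  # 40 35
```

The generated source is kept alongside the function so it can be inspected later, which is also what FUNCTION_SOURCES does in the full code.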
And I have implemented composite integration (the function is called integrate_by_parts in my code, not to be confused with the integration-by-parts technique from calculus). It is simple: split the interval into subintervals of equal length, integrate each one, and then sum. I chose powers of two as the count of subintervals. I use the term level for the base-2 logarithm of that count: the count of subintervals is 1 << level, so for example if level is 5 there are \$2^5=32\$ subintervals.
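The idea of splitting and summing can be sketched with `fractions.Fraction`. The names `simpson_38` and `composite` are hypothetical, and this is not how integrate_by_parts is implemented (that function works on integer pairs and precomputes the boundary points).

```python
from fractions import Fraction


def simpson_38(f, a, b):
    """Simpson's 3/8 rule (the 3-step formula) on [a, b]."""
    h = (b - a) / 3
    return Fraction(3, 8) * h * (f(a) + 3 * f(a + h) + 3 * f(a + 2 * h) + f(b))


def composite(f, a, b, level, rule):
    """Split [a, b] into 1 << level equal subintervals and sum the rule over them."""
    count = 1 << level
    width = (b - a) / count
    return sum(rule(f, a + i * width, a + (i + 1) * width) for i in range(count))


def arctan_derivative(x):
    return 1 / (1 + x * x)


coarse = composite(arctan_derivative, Fraction(0), Fraction(1), 0, simpson_38)
fine = composite(arctan_derivative, Fraction(0), Fraction(1), 3, simpson_38)
print(float(coarse), float(fine))  # both approximate pi/4
```

Each extra level doubles the number of subintervals and therefore roughly doubles the number of integrand evaluations.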
Now what are the dyadic fractions?
$$\left\{ \begin{aligned} \frac{0}{1}, \ \frac{1}{64}, \ \frac{1}{32}, \ \frac{3}{64}, \ \frac{1}{16}, \ \frac{5}{64}, \ \frac{3}{32}, \ \frac{7}{64}, \ \frac{1}{8}, \ \frac{9}{64}, \ \frac{5}{32}, \ \frac{11}{64}, \ \frac{3}{16} \\ \frac{13}{64}, \ \frac{7}{32}, \ \frac{15}{64}, \ \frac{1}{4}, \ \frac{17}{64}, \ \frac{9}{32}, \ \frac{19}{64}, \ \frac{5}{16}, \ \frac{21}{64}, \ \frac{11}{32}, \ \frac{23}{64}, \ \frac{3}{8}, \ \frac{25}{64} \\ \frac{13}{32}, \ \frac{27}{64}, \ \frac{7}{16}, \ \frac{29}{64}, \ \frac{15}{32}, \ \frac{31}{64}, \ \frac{1}{2}, \ \frac{33}{64}, \ \frac{17}{32}, \ \frac{35}{64}, \ \frac{9}{16}, \ \frac{37}{64}, \ \frac{19}{32} \\ \frac{39}{64}, \ \frac{5}{8}, \ \frac{41}{64}, \ \frac{21}{32}, \ \frac{43}{64}, \ \frac{11}{16}, \ \frac{45}{64}, \ \frac{23}{32}, \ \frac{47}{64}, \ \frac{3}{4}, \ \frac{49}{64}, \ \frac{25}{32}, \ \frac{51}{64} \\ \frac{13}{16}, \ \frac{53}{64}, \ \frac{27}{32}, \ \frac{55}{64}, \ \frac{7}{8}, \ \frac{57}{64}, \ \frac{29}{32}, \ \frac{59}{64}, \ \frac{15}{16}, \ \frac{61}{64}, \ \frac{31}{32}, \ \frac{63}{64}, \ \frac{1}{1} \\ \end{aligned} \right\} $$
These are all the reduced fractions between 0 and 1 with \$2^{-6}\$ spacing. Notice that all the numerators are odd except 0, because a fraction with a power-of-two denominator is reduced exactly when its numerator is coprime to two. For 2 there are only two possible remainders: 0 and 1. If the remainder is 0 the number is a multiple of 2; if the remainder is 1 it is odd (coprime to 2). Thus the odd numbers are those of the form 2x+1 (or equivalently 2x-1). Notice that 64 appears in the denominator every other term, 32 appears once every four terms, 16 once every eight terms, and so on. These periods are exactly the powers of two, the first appearance of each denominator is also at a power-of-two index, and the numerators for each denominator are of course the odd numbers below it. Each halving of the denominator corresponds to a doubling of the step size, and every denominator first appears at the index equal to half its step size.
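The pattern described above is enough to build the whole list of reduced fractions without ever calling gcd. A minimal sketch follows; the function name `dyadic_points` is hypothetical, and `math.gcd` is used only to verify the result.

```python
from fractions import Fraction
from math import gcd


def dyadic_points(level: int) -> list:
    """Reduced fractions i / 2**level for i = 0 .. 2**level, built without gcd."""
    total = 1 << level
    points = [(0, 1)] * (total + 1)
    points[total] = (1, 1)
    den, start, step = total, 1, 2
    while start < total:
        # Denominator den first appears at index start and recurs every step,
        # paired with the odd numerators 1, 3, 5, ... below den.
        for odd, index in zip(range(1, den, 2), range(start, total, step)):
            points[index] = (odd, den)
        den >>= 1
        start <<= 1
        step <<= 1
    return points


points = dyadic_points(6)
print(points[:5])  # [(0, 1), (1, 64), (1, 32), (3, 64), (1, 16)]
```

integrate_by_parts uses the same indexing scheme, but on the scaled numerators and denominators of the actual interval instead of on [0, 1].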
And here is my code:
from itertools import islice
from math import ceil, gcd
from gmpy2 import atan2, get_context, mpfr, log10
NEWTON_COTES_FORMULAS = [
{
"steps": 3,
"scale": (3, 8),
"extrema": 1,
"middle": {3: [(1, 3), (2, 3)]},
},
{
"steps": 4,
"scale": (2, 45),
"extrema": 7,
"middle": {32: [(1, 4), (3, 4)], 12: [(1, 2)]},
},
{
"steps": 5,
"scale": (5, 288),
"extrema": 19,
"middle": {75: [(1, 5), (4, 5)], 50: [(2, 5), (3, 5)]},
},
{
"steps": 6,
"scale": (1, 140),
"extrema": 41,
"middle": {216: [(1, 6), (5, 6)], 27: [(1, 3), (2, 3)], 272: [(1, 2)]},
},
{
"steps": 7,
"scale": (7, 17280),
"extrema": 751,
"middle": {
3577: [(1, 7), (6, 7)],
1323: [(2, 7), (5, 7)],
2989: [(3, 7), (4, 7)],
},
},
{
"steps": 8,
"scale": (4, 14175),
"extrema": 989,
"middle": {
5888: [(1, 8), (7, 8)],
-928: [(1, 4), (3, 4)],
10496: [(3, 8), (5, 8)],
-4540: [(1, 2)],
},
},
{
"steps": 9,
"scale": (9, 89600),
"extrema": 2857,
"middle": {
15741: [(1, 9), (8, 9)],
1080: [(2, 9), (7, 9)],
19344: [(1, 3), (2, 3)],
5778: [(4, 9), (5, 9)],
},
},
{
"steps": 10,
"scale": (5, 299376),
"extrema": 16067,
"middle": {
106300: [(1, 10), (9, 10)],
-48525: [(1, 5), (4, 5)],
272400: [(3, 10), (7, 10)],
-260550: [(2, 5), (3, 5)],
427368: [(1, 2)],
},
},
]
def generate_indices(formula: dict) -> dict:
num_mults = set()
den_mults = set()
middle = formula["middle"]
for points in middle.values():
num, den = points[0]
num_mults.add(num)
den_mults.add(den)
if len(points) > 1:
num, den = points[1]
num_mults.add(num)
den_mults.add(den)
num_mults = sorted(num_mults)
den_mults = sorted(den_mults)
num_indices = {e: i for i, e in enumerate(num_mults)}
if len(den_mults) == 1 and den_mults[0] > 2:
return {
"num_mults": num_mults,
"den_mults": den_mults,
"indices": [
(weight, num_indices[a], num_indices[b])
for weight, ((a, _), (b, _)) in middle.items()
],
"middle": None,
}
den_indices = {e: i for i, e in enumerate(den_mults)}
indices = [
(weight, [(num_indices[num], den_indices[den]) for num, den in points])
for weight, points in middle.items()
]
weight = None
if den_mults[0] == 2:
weight, _ = indices.pop(-1)
return {
"num_mults": num_mults,
"den_mults": den_mults,
"indices": indices,
"middle": weight,
}
def assemble_strategy(mults: list[int]) -> dict:
strategy = {"add": [], "mul": []}
if (e := mults[0]) == 1:
diffs = [b - a for a, b in zip(mults, mults[1:])]
if len(set(diffs)) == 1:
return {"type": "range", "step": diffs[0], "increments": len(diffs)}
else:
strategy["mul"].append((0, e))
l = len(mults)
for i, e in enumerate(mults[1:], start=1):
found = False
for j, f in enumerate(mults[:i]):
for k, g in enumerate(mults[j:i], start=j):
if f + g == e:
strategy["add"].append((i, j, k))
found = True
break
if found:
break
else:
strategy["mul"].append((i, e))
return {"type": "general", "length": len(mults), "strategy": strategy}
FUNCTION_TEMPLATE = """def {func_name}(
low_num: int, low_den: int, high_num: int, high_den: int, integrand: callable
) -> tuple[int, int]:
term_1 = low_num * high_den
term_2 = high_num * low_den - term_1
term_3 = low_den * high_den
{assemble}
result_num, result_den = integrand(low_num, low_den)
term_num, term_den = integrand(high_num, high_den)
result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
result_num {extrema_weight}
indices = {indices_dict}["{func_name}"]{loop_body}{extra_iteration}
return result_num * term_2 {num_scale}, result_den * {last_den} {den_scale}
"""
FUNCTION_MAINLOOP = """
for weight, ((x1, y), (x2, _)) in indices:
term_num, term_den = integrand((start := mults_1[y]) + mults_2[x1], (den := mults_3[y]))
next_num, next_den = integrand(start + mults_2[x2], den)
term_num, term_den = term_num * next_den + next_num * term_den, term_den * next_den
term_num *= weight
result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
"""
FUNCTION_ALTLOOP = """
for weight, x1, x2 in indices:
term_num, term_den = integrand(term_4 + mults_2[x1], term_5)
next_num, next_den = integrand(term_4 + mults_2[x2], term_5)
term_num, term_den = term_num * next_den + next_num * term_den, term_den * next_den
term_num *= weight
result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
"""
FUNCTION_LAST_ITERATION = """
term_num, term_den = integrand(mults_1[0] + term_2, mults_3[0])
term_num {weight}
result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
"""
def mul_or_lshift(n: int) -> str:
    return f"* {n}" if n & -n != n else f"<< {n.bit_length() - 1}"
def imul_or_ilshift(n: int) -> str:
    return f"*= {n}" if n & -n != n else f"<<= {n.bit_length() - 1}"
def assemble_numerator_multiples(strategy: dict) -> list:
if strategy["type"] == "range":
return [
(
" step = term_2"
if (step := strategy["step"]) == 1
else (
" step = term_2 + term_2"
if step == 2
else f" step = term_2 {mul_or_lshift(step)}"
)
),
" number = term_2",
f" mults_2 = [term_2] * {strategy['increments']+1}\n for i in range(1, {strategy['increments']+1}):\n mults_2[i] = number = number + step\n",
]
result = [f" mults_2 = [term_2] * {strategy['length']}"]
strats = strategy["strategy"]
for a, b in strats["mul"]:
result.append(f" mults_2[{a}] {imul_or_ilshift(b)}")
for i, j, k in strats["add"]:
if j != k:
result.append(
f" mults_2[{i}] = "
+ (f"mults_2[{j}]" if j else "term_2")
+ " + "
+ (f"mults_2[{k}]" if k else "term_2")
)
else:
if not j:
result.append(f" mults_2[{i}] += term_2")
else:
result.append(f" mults_2[{i}] = (val := mults_2[{j}]) + val")
return result
def assemble_denominator_multiples(strategy: dict) -> list:
strats = strategy["strategy"]
if strategy["length"] == 1:
mult = strats["mul"][0][1]
return [
f" term_4 = term_1 {(op := mul_or_lshift(mult))}\n term_5 = term_3 {op}"
]
result = [
f" mults_1 = [term_1] * {(length := strategy['length'])}",
f" mults_3 = [term_3] * {length}",
]
for a, b in strats["mul"]:
result.append(
f" mults_1[{a}] {(op := imul_or_ilshift(b))}\n mults_3[{a}] {op}"
)
for i, j, k in strats["add"]:
if j != k:
result.extend(
(
f" mults_1[{i}] = mults_1[{j}] + mults_1[{k}]",
f" mults_3[{i}] = mults_3[{j}] + mults_3[{k}]",
)
)
else:
result.extend(
(
f" mults_1[{i}] = (val := mults_1[{j}]) + val",
f" mults_3[{i}] = (val := mults_3[{j}]) + val",
)
)
return result
NAMESPACE = {"NEWTON_COTES_MIDDLE_INDICES": {}, "ROMBERG_WEIGHTS": {}}
def generate_source_Newton(level: int) -> str:
assert isinstance(level, int) and 0 <= level <= 7
fname = f"Newton_Cotes_integrate_{level+3}"
formula = NEWTON_COTES_FORMULAS[level]
data = generate_indices(formula)
NAMESPACE["NEWTON_COTES_MIDDLE_INDICES"][fname] = data["indices"]
den_mults = data["den_mults"]
num_scale, den_scale = formula["scale"]
extrema = formula["extrema"]
l = len(den_mults)
return FUNCTION_TEMPLATE.format(
func_name=fname,
assemble="\n".join(
assemble_numerator_multiples(assemble_strategy(data["num_mults"]))
+ assemble_denominator_multiples(assemble_strategy(den_mults))
),
extrema_weight=imul_or_ilshift(extrema),
indices_dict="NEWTON_COTES_MIDDLE_INDICES",
loop_body=FUNCTION_MAINLOOP if l > 1 else FUNCTION_ALTLOOP,
extra_iteration=(
FUNCTION_LAST_ITERATION.format(weight=imul_or_ilshift(middle))
if (middle := data["middle"])
else ""
),
num_scale=mul_or_lshift(num_scale),
den_scale=mul_or_lshift(den_scale),
last_den="term_5" if l == 1 else "mults_3[-1]",
)
FUNCTION_SOURCES = {}
for i in range(8):
FUNCTION_SOURCES[f"Newton_Cotes_integrate_{i+3}"] = source = generate_source_Newton(
i
)
exec(compile(source, "<dynamic>", "exec"), NAMESPACE)
def integrate_by_parts(
    low_num: int,
    low_den: int,
    high_num: int,
    high_den: int,
    level: int,
    integrand: callable,
    integrator: callable,
    reduce: bool = False,
) -> tuple[int, int]:
    # low = start / den and high - low = step_num / den over a common denominator
    start = low_num * high_den
    step_num, den = high_num * low_den - start, low_den * high_den
    total = 1 << level
    points = [(low_num, low_den)] * (total + 1)
    points[-1] = (high_num, high_den)
    level_up = level + 1
    # starts[i] = start * 2**i, built by repeated doubling
    multiple = start
    starts = [start] * level_up
    for i in range(1, level_up):
        starts[i] = multiple = multiple + multiple
    # denominators[i] = den * 2**i
    multiple = den
    denominators = [den] * level_up
    for i in range(1, level_up):
        denominators[i] = multiple = multiple + multiple
    # odds[i] = (2 * i + 1) * step_num, the odd multiples of the step
    step = step_num + step_num
    multiple = step_num
    lim = total >> 1
    odds = [step_num] * lim
    for i in range(1, lim):
        odds[i] = multiple = multiple + step
    power = 1
    powers = [1] * level_up
    for i in range(1, level_up):
        powers[i] = power = power + power
    # fill in the interior points, one dyadic denominator at a time
    for index, start, step in zip(range(level, 0, -1), powers, powers[1:]):
        initial = starts[index]
        denominator = denominators[index]
        for i, odd in zip(range(start, total, step), odds):
            points[i] = (initial + odd, denominator)
        del odds[lim:]
        lim >>= 1
    # integrate each subinterval and accumulate the unreduced sum
    result_num, result_den = 0, 1
    for (num1, den1), (num2, den2) in zip(points, points[1:]):
        term_num, term_den = integrator(num1, den1, num2, den2, integrand)
        result_num, result_den = (
            result_num * term_den + term_num * result_den,
            result_den * term_den,
        )
    if reduce:
        common = gcd(result_num, result_den)
        result_num //= common
        result_den //= common
    return result_num, result_den
def arctan_derivative(x: int, y: int) -> tuple[int, int]:
    return y * y, x * x + y * y
def arctan_Newton(x: int, y: int, order: int) -> tuple[int, int]:
    assert 3 <= order <= 10
    return NAMESPACE[f"Newton_Cotes_integrate_{order}"](0, 1, x, y, arctan_derivative)
def arctan_parts(x: int, y: int, order: int, level: int) -> tuple[int, int]:
    assert 3 <= order <= 10
    return integrate_by_parts(0, 1, x, y, level, arctan_derivative, NAMESPACE[f"Newton_Cotes_integrate_{order}"])
def get_accuracy_fraction_levels(
steps: int, max_level: int, func: callable, digits: int = 100
):
get_context().precision = int(ceil(digits / log10(2)))
levels = [[] for _ in range(max_level)]
max_prec = mpfr(0.1) ** digits
for i in range(1, steps + 1):
atan = atan2(i, steps)
for j, k in zip(range(max_level), range(1, max_level + 1)):
num, den = func(i, steps, k)
levels[j].append(
(i / steps, int(-log10(abs(mpfr(num) / den - atan) or max_prec)))
)
return levels
def get_accuracy_fraction(steps: int, func: callable, digits: int = 100):
get_context().precision = int(ceil(digits / log10(2)))
result = []
max_prec = mpfr(0.1) ** digits
for i in range(1, steps + 1):
num, den = func(i, steps)
result.append(
(
i / steps,
int(
-log10(
abs(mpfr(num) / den - atan2(i, steps)) or max_prec
)
),
)
)
return result
TESTS = [
("arctan_Newton_3", lambda x, y: arctan_Newton(x, y, 3), get_accuracy_fraction, 0),
("arctan_Newton_4", lambda x, y: arctan_Newton(x, y, 4), get_accuracy_fraction, 0),
("arctan_Newton_5", lambda x, y: arctan_Newton(x, y, 5), get_accuracy_fraction, 0),
("arctan_Newton_6", lambda x, y: arctan_Newton(x, y, 6), get_accuracy_fraction, 0),
("arctan_Newton_7", lambda x, y: arctan_Newton(x, y, 7), get_accuracy_fraction, 0),
("arctan_Newton_8", lambda x, y: arctan_Newton(x, y, 8), get_accuracy_fraction, 0),
("arctan_Newton_9", lambda x, y: arctan_Newton(x, y, 9), get_accuracy_fraction, 0),
("arctan_Newton_10", lambda x, y: arctan_Newton(x, y, 10), get_accuracy_fraction, 0),
("arctan_Newton_3_parts", lambda x, y, level: arctan_parts(x, y, 3, level), get_accuracy_fraction_levels, 1),
("arctan_Newton_4_parts", lambda x, y, level: arctan_parts(x, y, 4, level), get_accuracy_fraction_levels, 1),
("arctan_Newton_5_parts", lambda x, y, level: arctan_parts(x, y, 5, level), get_accuracy_fraction_levels, 1),
("arctan_Newton_6_parts", lambda x, y, level: arctan_parts(x, y, 6, level), get_accuracy_fraction_levels, 1),
("arctan_Newton_7_parts", lambda x, y, level: arctan_parts(x, y, 7, level), get_accuracy_fraction_levels, 1),
("arctan_Newton_8_parts", lambda x, y, level: arctan_parts(x, y, 8, level), get_accuracy_fraction_levels, 1),
("arctan_Newton_9_parts", lambda x, y, level: arctan_parts(x, y, 9, level), get_accuracy_fraction_levels, 1),
("arctan_Newton_10_parts", lambda x, y, level: arctan_parts(x, y, 10, level), get_accuracy_fraction_levels, 1),
]
def get_data(steps: int, max_level: int, digits: int = 100) -> dict:
result = {}
for func_name, func, test_func, nested in TESTS:
if nested:
for i, line in enumerate(
test_func(steps, max_level, func, digits), start=1
):
result[f"{func_name}_level_{i}"] = line
else:
result[func_name] = test_func(steps, func, digits)
return result
data = get_data(512, 4)
mean_digits = []
weighted_mean_digits = []
harmonic_mean_digits = []
quadratic_mean_digits = []
for k, v in data.items():
s1 = s2 = s3 = s4 = 0
for a, b in v:
s1 += b
s2 += float(a * b)
s3 += 1 / b
s4 += b * b
mean_digits.append((k, s1 / 512))
weighted_mean_digits.append((k, s2 / 256.5))
harmonic_mean_digits.append((k, 512 / s3))
quadratic_mean_digits.append((k, (s4 / 512) ** 0.5))
mean_digits.sort(key=lambda x: -x[1])
weighted_mean_digits.sort(key=lambda x: -x[1])
harmonic_mean_digits.sort(key=lambda x: -x[1])
quadratic_mean_digits.sort(key=lambda x: -x[1])
Test results:
In [569]: mean_digits
Out[569]:
[('arctan_Newton_10_parts_level_4', 28.48828125),
('arctan_Newton_10_parts_level_3', 24.76953125),
('arctan_Newton_9_parts_level_4', 24.080078125),
('arctan_Newton_8_parts_level_4', 23.916015625),
('arctan_Newton_10_parts_level_2', 21.115234375),
('arctan_Newton_9_parts_level_3', 21.0703125),
('arctan_Newton_8_parts_level_3', 20.904296875),
('arctan_Newton_7_parts_level_4', 19.55859375),
('arctan_Newton_6_parts_level_4', 19.33203125),
('arctan_Newton_9_parts_level_2', 18.09375),
('arctan_Newton_8_parts_level_2', 17.876953125),
('arctan_Newton_7_parts_level_3', 17.189453125),
('arctan_Newton_6_parts_level_3', 16.943359375),
('arctan_Newton_10_parts_level_1', 16.78125),
('arctan_Newton_5_parts_level_4', 14.904296875),
('arctan_Newton_7_parts_level_2', 14.71484375),
('arctan_Newton_4_parts_level_4', 14.634765625),
('arctan_Newton_6_parts_level_2', 14.537109375),
('arctan_Newton_9_parts_level_1', 14.501953125),
('arctan_Newton_8_parts_level_1', 14.310546875),
('arctan_Newton_5_parts_level_3', 13.087890625),
('arctan_Newton_10', 12.9296875),
('arctan_Newton_4_parts_level_3', 12.859375),
('arctan_Newton_7_parts_level_1', 11.978515625),
('arctan_Newton_6_parts_level_1', 11.765625),
('arctan_Newton_5_parts_level_2', 11.30078125),
('arctan_Newton_9', 11.1171875),
('arctan_Newton_4_parts_level_2', 11.017578125),
('arctan_Newton_8', 10.921875),
('arctan_Newton_3_parts_level_4', 10.080078125),
('arctan_Newton_5_parts_level_1', 9.408203125),
('arctan_Newton_4_parts_level_1', 9.21484375),
('arctan_Newton_7', 9.146484375),
('arctan_Newton_3_parts_level_3', 8.95703125),
('arctan_Newton_6', 8.8984375),
('arctan_Newton_3_parts_level_2', 7.849609375),
('arctan_Newton_5', 7.146484375),
('arctan_Newton_4', 6.89453125),
('arctan_Newton_3_parts_level_1', 6.525390625),
('arctan_Newton_3', 5.12890625)]
In [570]: weighted_mean_digits
Out[570]:
[('arctan_Newton_10_parts_level_4', 26.315827546296298),
('arctan_Newton_10_parts_level_3', 22.52648330896686),
('arctan_Newton_9_parts_level_4', 22.208082054093566),
('arctan_Newton_8_parts_level_4', 22.02068865740741),
('arctan_Newton_9_parts_level_3', 19.19475663986355),
('arctan_Newton_8_parts_level_3', 19.005581444931774),
('arctan_Newton_10_parts_level_2', 18.84232608430799),
('arctan_Newton_7_parts_level_4', 18.04267939814815),
('arctan_Newton_6_parts_level_4', 17.859786184210527),
('arctan_Newton_9_parts_level_2', 16.256304824561404),
('arctan_Newton_8_parts_level_2', 15.976813779239766),
('arctan_Newton_7_parts_level_3', 15.764642726608187),
('arctan_Newton_6_parts_level_3', 15.479806286549708),
('arctan_Newton_10_parts_level_1', 14.021739461500974),
('arctan_Newton_5_parts_level_4', 13.780381944444445),
('arctan_Newton_4_parts_level_4', 13.467790570175438),
('arctan_Newton_7_parts_level_2', 13.217249939083821),
('arctan_Newton_6_parts_level_2', 13.041895102339181),
('arctan_Newton_9_parts_level_1', 12.285788255360623),
('arctan_Newton_8_parts_level_1', 12.069208394249513),
('arctan_Newton_5_parts_level_3', 11.930532711988304),
('arctan_Newton_4_parts_level_3', 11.747609039961013),
('arctan_Newton_10', 10.326228983918128),
('arctan_Newton_7_parts_level_1', 10.248583698830409),
('arctan_Newton_5_parts_level_2', 10.12481725146199),
('arctan_Newton_6_parts_level_1', 10.03785940545809),
('arctan_Newton_4_parts_level_2', 9.86879416423002),
('arctan_Newton_3_parts_level_4', 9.374687804580896),
('arctan_Newton_9', 8.884533382066277),
('arctan_Newton_8', 8.683418615984406),
('arctan_Newton_3_parts_level_3', 8.288697002923977),
('arctan_Newton_5_parts_level_1', 8.201670626218323),
('arctan_Newton_4_parts_level_1', 8.01286854288499),
('arctan_Newton_7', 7.296113547758285),
('arctan_Newton_3_parts_level_2', 7.2269356115984404),
('arctan_Newton_6', 7.005284478557505),
('arctan_Newton_3_parts_level_1', 5.846316094054581),
('arctan_Newton_5', 5.707038864522417),
('arctan_Newton_4', 5.460823282163743),
('arctan_Newton_3', 4.260561342592593)]
In [571]: harmonic_mean_digits
Out[571]:
[('arctan_Newton_10_parts_level_4', 27.912879755249037),
('arctan_Newton_10_parts_level_3', 24.09431747720454),
('arctan_Newton_9_parts_level_4', 23.583977377457146),
('arctan_Newton_8_parts_level_4', 23.408833956121036),
('arctan_Newton_9_parts_level_3', 20.515885834955156),
('arctan_Newton_10_parts_level_2', 20.338294181698156),
('arctan_Newton_8_parts_level_3', 20.337453467612107),
('arctan_Newton_7_parts_level_4', 19.14562028930588),
('arctan_Newton_6_parts_level_4', 18.930002404771447),
('arctan_Newton_9_parts_level_2', 17.4847216228196),
('arctan_Newton_8_parts_level_2', 17.235013212180075),
('arctan_Newton_7_parts_level_3', 16.76722370067298),
('arctan_Newton_6_parts_level_3', 16.498706891294084),
('arctan_Newton_10_parts_level_1', 15.490448318126456),
('arctan_Newton_5_parts_level_4', 14.598071093338412),
('arctan_Newton_4_parts_level_4', 14.309834091107298),
('arctan_Newton_7_parts_level_2', 14.200631282318305),
('arctan_Newton_6_parts_level_2', 14.022124097712098),
('arctan_Newton_9_parts_level_1', 13.525594679107845),
('arctan_Newton_8_parts_level_1', 13.298943101402248),
('arctan_Newton_5_parts_level_3', 12.73590900616713),
('arctan_Newton_4_parts_level_3', 12.52041187737681),
('arctan_Newton_10', 11.497757810835521),
('arctan_Newton_7_parts_level_1', 11.241866690148782),
('arctan_Newton_6_parts_level_1', 11.020276873329582),
('arctan_Newton_5_parts_level_2', 10.901605925457924),
('arctan_Newton_4_parts_level_2', 10.618486665999201),
('arctan_Newton_9', 9.886216476025368),
('arctan_Newton_3_parts_level_4', 9.859903269736689),
('arctan_Newton_8', 9.669425121009134),
('arctan_Newton_5_parts_level_1', 8.925145374305796),
('arctan_Newton_3_parts_level_3', 8.73038506685022),
('arctan_Newton_4_parts_level_1', 8.725059769525904),
('arctan_Newton_7', 8.116640179524895),
('arctan_Newton_6', 7.79453569163586),
('arctan_Newton_3_parts_level_2', 7.622464686259581),
('arctan_Newton_5', 6.3172335781739895),
('arctan_Newton_3_parts_level_1', 6.210840442923198),
('arctan_Newton_4', 6.0507292154913035),
('arctan_Newton_3', 4.650421800394735)]
In [572]: quadratic_mean_digits
Out[572]:
[('arctan_Newton_10_parts_level_4', 28.85937077016753),
('arctan_Newton_10_parts_level_3', 25.21547763775257),
('arctan_Newton_9_parts_level_4', 24.398586344806947),
('arctan_Newton_8_parts_level_4', 24.241340270805985),
('arctan_Newton_10_parts_level_2', 21.639400913033615),
('arctan_Newton_9_parts_level_3', 21.434402108526378),
('arctan_Newton_8_parts_level_3', 21.27640822307656),
('arctan_Newton_7_parts_level_4', 19.82324235588114),
('arctan_Newton_6_parts_level_4', 19.593426034004363),
('arctan_Newton_9_parts_level_2', 18.509288208896635),
('arctan_Newton_8_parts_level_2', 18.308606668040035),
('arctan_Newton_10_parts_level_1', 17.61225159512548),
('arctan_Newton_7_parts_level_3', 17.471684681936082),
('arctan_Newton_6_parts_level_3', 17.23748413704854),
('arctan_Newton_9_parts_level_1', 15.14042343777082),
('arctan_Newton_5_parts_level_4', 15.105294498122174),
('arctan_Newton_7_parts_level_2', 15.062110990827282),
('arctan_Newton_8_parts_level_1', 14.965911787291812),
('arctan_Newton_6_parts_level_2', 14.884516388683913),
('arctan_Newton_4_parts_level_4', 14.845098622946228),
('arctan_Newton_10', 13.959763608313716),
('arctan_Newton_5_parts_level_3', 13.320933361630482),
('arctan_Newton_4_parts_level_3', 13.088341663480518),
('arctan_Newton_7_parts_level_1', 12.467379310624988),
('arctan_Newton_6_parts_level_1', 12.260359140335163),
('arctan_Newton_9', 11.984690755292771),
('arctan_Newton_8', 11.808531079266379),
('arctan_Newton_5_parts_level_2', 11.566891058102),
('arctan_Newton_4_parts_level_2', 11.292195175208406),
('arctan_Newton_3_parts_level_4', 10.228158894688722),
('arctan_Newton_7', 9.855498940946623),
('arctan_Newton_5_parts_level_1', 9.738875865570934),
('arctan_Newton_6', 9.643650760992955),
('arctan_Newton_4_parts_level_1', 9.54921462739214),
('arctan_Newton_3_parts_level_3', 9.116005841375927),
('arctan_Newton_3_parts_level_2', 8.015000584840902),
('arctan_Newton_5', 7.689913239107447),
('arctan_Newton_4', 7.448521245858133),
('arctan_Newton_3_parts_level_1', 6.751301957770812),
('arctan_Newton_3', 5.456504146429287)]
In [573]: %timeit arctan_parts(123, 456, 4, 5)
303 μs ± 11.4 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [574]: %timeit arctan_parts(123, 456, 10, 5)
1.08 ms ± 9.17 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [575]: %timeit arctan_parts(123, 456, 10, 4)
330 μs ± 13 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [576]: %timeit arctan_Newton(123, 456, 10)
7.08 μs ± 207 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [577]: %timeit arctan_Newton(123, 456, 4)
3.65 μs ± 118 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Increasing the order (the number of points evaluated by the integration function, minus one) and increasing the level (log base 2 of the number of subintervals) both improve accuracy. In particular, increasing the number of subintervals can make any quadrature accurate, but it also dramatically increases the execution time, as expected. Using higher-order quadrature rules increases the accuracy modestly without incurring large costs. Also, for every integration scheme the number of accurate digits decreases monotonically as the argument increases, which I haven't shown.
How can I improve their convergence rate and also reduce execution time simultaneously?
Update
Let me explain my choice of operators. I chose the operators based on efficiency (shortest execution time), not on whether they look pretty. The choice was made after empirical testing. If n is some int, n + n is evaluated faster than n * 2, n * 3 is evaluated faster than n + n + n, n << 1 is faster than n * 2, and n << 2 is faster than n * 4, which in turn is faster than n + n + n + n.
In [5]: n = 9460536207068016
In [6]: %timeit n + n
48.6 ns ± 0.206 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [7]: %timeit n * 2
68.4 ns ± 0.159 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [8]: %timeit n + n + n
80.4 ns ± 0.23 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [9]: %timeit n * 3
69.7 ns ± 0.95 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [10]: %timeit n + n + n + n
111 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [11]: %timeit n * 4
68.2 ns ± 0.0599 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [12]: %timeit n << 2
57.8 ns ± 0.243 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [13]: %timeit n << 1
57.8 ns ± 0.15 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [14]: %timeit n * 8
69.6 ns ± 0.348 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [15]: %timeit n << 3
59.9 ns ± 0.655 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Update 1
I chose my code structure purely to maximize performance (shortest execution time); everything else is disposable and should be sacrificed for even small gains in speed. All my choices in this program are based on empirical evidence.
In [41]: %%timeit
    ...: lst = [1] * 1025
    ...: m = 1
    ...: for i in range(1, 1025):
    ...:     lst[i] = m = m + m
93.3 μs ± 1.28 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [42]: %%timeit
...: m = 1
...: lst = [1, *((m := m + m) for _ in range(1024))]
129 μs ± 1.25 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [43]: %%timeit
...: m = 1
...: lst = [1, *[(m := m + m) for _ in range(1024)]]
100 μs ± 465 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [44]: %%timeit
...: m = 1
...: lst = [(m := m + m) for _ in range(1024)]
...: lst.insert(0, 1)
94.9 μs ± 975 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [45]: %%timeit
...: m = 1
...: lst = [(m := m * 2) for _ in range(1024)]
...: lst.insert(0, 1)
132 μs ± 1.55 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [46]: %timeit [1 << i for i in range(1025)]
94.5 μs ± 643 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [47]: %%timeit
    ...: lst = [1] * 1025
    ...: m = 1
    ...: for i in range(1, 1025):
    ...:     lst[i] = m = m + m
94 μs ± 919 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [48]: %%timeit
...: m = 1
...: [1] + [(m := m * 2) for _ in range(1024)]
139 μs ± 1.21 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [49]: %%timeit
...: m = 1
...: [1] + [(m := m + m) for _ in range(1024)]
101 μs ± 611 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
The cells above show eight different ways to generate a list containing the first 1025 powers of two (1 to \$2^{1024}\$); In [47] is a rerun of In [41]. The measured execution times fluctuate, but the performance ranking generally doesn't change. The left-shift method and the addition with index assignment method both seem to be the fastest, but for larger exponents I expect the shift method to fall behind, because each 1 << i is built from scratch instead of from the previous result, which is a lot of extra work.
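To make the comparison reproducible outside IPython, here is a minimal sketch that wraps each variant in a function, checks that they all build the same list, and times them with the standard timeit module. The function names and the repeat counts are placeholders chosen for illustration, and wrapping the code in functions changes the absolute timings compared with the cells above.

```python
import timeit

N = 1024  # highest exponent; every list holds N + 1 powers of two


def index_assign():
    lst = [1] * (N + 1)
    m = 1
    for i in range(1, N + 1):
        lst[i] = m = m + m
    return lst


def unpack_generator():
    m = 1
    return [1, *((m := m + m) for _ in range(N))]


def unpack_list():
    m = 1
    return [1, *[(m := m + m) for _ in range(N)]]


def insert_add():
    m = 1
    lst = [(m := m + m) for _ in range(N)]
    lst.insert(0, 1)
    return lst


def insert_mul():
    m = 1
    lst = [(m := m * 2) for _ in range(N)]
    lst.insert(0, 1)
    return lst


def shift():
    return [1 << i for i in range(N + 1)]


def concat_mul():
    m = 1
    return [1] + [(m := m * 2) for _ in range(N)]


def concat_add():
    m = 1
    return [1] + [(m := m + m) for _ in range(N)]


METHODS = [index_assign, unpack_generator, unpack_list, insert_add,
           insert_mul, shift, concat_mul, concat_add]

# Every variant must produce exactly the same list of powers of two.
EXPECTED = [2 ** i for i in range(N + 1)]
assert all(method() == EXPECTED for method in METHODS)


def report(number=200, repeat=3):
    """Print the best average time per call for each variant."""
    for method in METHODS:
        best = min(timeit.repeat(method, number=number, repeat=repeat))
        print(f"{method.__name__:>18}: {best / number * 1e6:8.1f} µs")


if __name__ == "__main__":
    report()
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; the ranking matters more than the absolute numbers.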
I will explain each in detail.
In the first method, a list of length 1025 is created, initially filled with 1s. Then, starting at index 1, each element is overwritten by index assignment: the code doubles the running multiple m and assigns the result both back to m and to the list at the given index. There is a single list throughout and no list.append calls, so the length of the list never changes and its buffer is never reallocated. Writing to it is just walking a contiguous array of pointers, therefore it is the fastest loop scheme.
The second method uses the same logic and operator, except that it uses a generator. It doubles m at each iteration, reassigns the result to m and yields it at the same time, and then unpacks the generator into a list whose first item is 1. It first has to construct a generator, which is a Python object with all the associated overhead and no known length. The code then has to create a list and exhaust the generator, extending the list as items arrive, which can require repeated reallocation because the final size is not known in advance. Therefore it is slow.
The third method is almost identical to the second, except that it uses a list comprehension instead of a generator. Generally a list comprehension is faster than the equivalent generator (changing [] to () without changing anything else), because it avoids the overhead of suspending and resuming a generator object for every item. It still has a copying problem like the second: it has to create two lists and bulk-copy one into the other.
The fourth method creates a list using a list comprehension and inserts 1 at position 0. It creates only one list, but it has to move all 1024 references down by one position to free the first slot and then assign it. It has to copy all these references one by one, and it has to move the whole buffer if there isn't enough room to store 1025 references because the memory at start + 1024 is occupied by some other allocation. It also changes the length of the list, which may trigger a reallocation.
The fifth method is almost identical to the fourth, except that it uses a multiplication instead of an addition. Schoolbook multiplication is usually quoted as \$O(n^2)\$, but that only holds when n is the number of digits. In terms of the value N it is \$O((\lfloor \log_{base} N \rfloor + 1)^2)\$, where base is the maximum digit plus one and \$\lfloor \log_{base} N \rfloor + 1\$ is the number of digits of N in its base base numeral expansion, because every digit of the left operand has to be multiplied with every digit of the right operand. For CPython the base is \$2^{30} = 1073741824\$ on typical 64-bit builds, so each digit fits inside a uint32_t. For large integers CPython uses Karatsuba multiplication, which multiplies two n-digit numbers in \$O(n^{\log_2 3})\$ time but costs more for small inputs; it does not trigger in this example, because the right operand 2 is a single digit. The Number Theoretic Transform, which takes roughly \$O(n \log n)\$ time for n-digit multiplication, is used by the decimal module for very large numbers rather than by int, and it has a high overhead for small inputs.
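The digit count \$\lfloor \log_{base} N \rfloor + 1\$ can be computed without logarithms from the bit length. The sketch below reads the digit width from sys.int_info.bits_per_digit; the helper name digit_count is made up for illustration, and the explicit 30 in the usage lines assumes the common 64-bit build.

```python
import sys


def digit_count(n, bits_per_digit=sys.int_info.bits_per_digit):
    """Number of base-2**bits_per_digit digits needed for a positive int n."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Ceiling division of the bit length by the digit width.
    return -(-n.bit_length() // bits_per_digit)


# With 30-bit digits, 2**1024 has 1025 bits and therefore 35 digits.
assert digit_count(2 ** 1024, 30) == 35
assert digit_count(2 ** 30 - 1, 30) == 1
assert digit_count(2 ** 30, 30) == 2
```

So the largest number in the benchmark is only a few dozen internal digits long, which is why per-operation overhead matters so much in these timings.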
Now what is the time complexity of addition? Adding two n-digit integers can be done in n iterations, so it is \$O(n)\$, where n is the number of digits, or \$\lfloor \log_{base} N \rfloor + 1\$ for input N. Multiplying an n-digit integer by the single digit 2 is also linear in n, so the gap between n + n and n * 2 in the measurements comes from constant overhead rather than from asymptotic complexity. Of course, for adding n to itself m times, n * (m + 1) with m > 1 is faster than chained addition, because one multiplication replaces m separate additions and their call overhead.
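To illustrate why addition is linear in the number of digits, here is a sketch of schoolbook addition on lists of base \$2^{30}\$ digits, least significant digit first. This is not CPython's actual implementation, only a model of the same idea; all names are hypothetical.

```python
BITS = 30
MASK = (1 << BITS) - 1


def to_digits(n):
    """Split a non-negative int into base-2**30 digits, lowest digit first."""
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= BITS
    return digits or [0]


def from_digits(digits):
    """Inverse of to_digits."""
    n = 0
    for d in reversed(digits):
        n = (n << BITS) | d
    return n


def add_digits(a, b):
    """Add two digit lists in one pass, carrying at most 1 per digit."""
    if len(a) < len(b):
        a, b = b, a  # make a the longer operand
    result = []
    carry = 0
    for i, da in enumerate(a):
        total = da + (b[i] if i < len(b) else 0) + carry
        result.append(total & MASK)
        carry = total >> BITS
    if carry:
        result.append(carry)  # the sum grew by one digit
    return result


x = 2 ** 1024 - 1
assert from_digits(add_digits(to_digits(x), to_digits(x))) == x + x
```

The loop body runs once per digit of the longer operand, which is the \$O(n)\$ bound in the paragraph above.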
The sixth method is almost as fast as the first, but its timings are unreliable. It uses a list comprehension to generate a single list containing 1<<0, 1<<1, 1<<2, 1<<3, 1<<4, 1<<5... 1<<1024. For an n-digit number a left shift is \$O(n)\$ in time: it propagates the carried bits from the lowest digit to the highest digit (plus one extra digit if the highest bit is set) in a single loop, so it is faster than general multiplication for multiplying by powers of 2. For some reason n << 1 is slower than n + n. The sixth method does a lot of redundant work, because each intermediate result could be obtained from the previous result by a single operation, so in principle it should be slower than the first.
The eighth and ninth cells generate a list using a list comprehension and concatenate it with another list. Each creates two lists, then creates a third list and copies the contents of the first two into it, which makes them slower than their insert-based counterparts (the fifth and fourth methods).
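The single-loop carry propagation of a left shift by one can be modelled on a list of base \$2^{30}\$ digits, lowest digit first. This is only an illustrative sketch with made-up names, not the code CPython runs.

```python
BITS = 30
MASK = (1 << BITS) - 1


def shift_left_one(digits):
    """Shift a digit list left by one bit in a single pass."""
    result = []
    carry = 0
    for d in digits:
        # The top bit of each digit becomes the bottom bit of the next one.
        result.append(((d << 1) | carry) & MASK)
        carry = d >> (BITS - 1)
    if carry:
        result.append(carry)  # the highest bit was set: one extra digit
    return result


def to_int(digits):
    """Rebuild the integer value from its digit list."""
    return sum(d << (BITS * i) for i, d in enumerate(digits))


# A full digit overflows into a second digit when doubled.
assert shift_left_one([MASK]) == [MASK - 1, 1]
assert to_int(shift_left_one([MASK])) == 2 * MASK
```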
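The difference between concatenation and insertion can be checked directly: + always returns a new list object, while insert mutates the existing one in place. The function names below are placeholders for illustration.

```python
def build_by_concat(n=1024):
    m = 1
    head = [1]
    tail = [(m := m + m) for _ in range(n)]
    combined = head + tail  # allocates a third list and copies both operands
    return head, tail, combined


def build_by_insert(n=1024):
    m = 1
    lst = [(m := m + m) for _ in range(n)]
    same = lst
    lst.insert(0, 1)  # shifts the existing references, no new list object
    return lst, same


head, tail, combined = build_by_concat()
assert combined is not head and combined is not tail
assert len(combined) == len(head) + len(tail)

lst, same = build_by_insert()
assert lst is same
assert lst == combined
```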