6
\$\begingroup\$

I have written a Numerical integration library in Python from scratch using Newton-Cotes formulas and integer arithmetic, as a self-imposed programming challenge. I wrote all the code completely by myself.

I implemented all the code using formulas found here and there.

Now what is a Newton-Cotes formula? It is something like this:

$$\int_a^b f(x)dx \approx \frac{A}{B}\frac{b - a}{n}(W_0f(a) + W_1f(a+\frac{b-a}{n}) + W_2f(a+\frac{2(b-a)}{n})+W_3f(a+\frac{3(b-a)}{n}))+W_4f(a+\frac{4(b-a)}{n})+...+W_4f(a+\frac{(n-4)(b-a)}{n})+W_3f(a+\frac{(n-3)(b-a)}{n})+W_2f(a+\frac{(n-2)(b-a)}{n})+W_1f(a+\frac{(n-1)(b-a)}{n})+W_0f(b))$$

The notation seems complicated, but actually it is quite simple. b-a is the difference between them, obviously, and it is also the length of the interval. A and B are just a pair integers, n is some integer. Here we split the interval [a, b] into n evenly spaced subintervals, \$\frac{b-a}{n}\$ is the length of the subintervals. With n intervals there are n + 1 boundary points. \$W_i\$ is just some integer, the formula just tells us to evaluate the function at these boundary points, and multiply the function value with the corresponding weight, add them together, and then multiply the sum with the two fractions at the front. And the result is approximately the integral.

That is all there is to it, really. Importantly the weights are symmetric around the middle (\$\frac{a + b}{2}\$), each weight will apply to exactly two positions except the middle position, the middle position is a boundary point if n is even and no other point shares its weight.

I represent all these numbers as fractions, stored as a pair of integers. How do I add fractions? It is just basic algebra, the universal formula that works for all cases without needing conditionals is \$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}\$, and that is what I use.

And the difference is \$\frac{c}{d} - \frac{a}{b} = \frac{bc - ad}{bd}\$, what is 0.5 of the difference? \$\frac{c}{d} - \frac{a}{b} = \frac{bc - ad}{2bd}\$. What is twice of the difference? \$\frac{c}{d} - \frac{a}{b} = \frac{2(bc - ad)}{bd}\$. What is thrice of the difference? \$\frac{c}{d} - \frac{a}{b} = \frac{3(bc - ad)}{bd}\$ What is \$0.\overline{3}\$ of the difference? \$\frac{c}{d} - \frac{a}{b} = \frac{bc - ad}{3bd}\$...

You get the idea. To multiply a fraction by an integer just multiply the numerator with the integer, to divide a fraction by an integer just multiply the denominator with the integer. Notice that the numbers (bc - ad) and bd don't change, so we don't need to recalculate them, we store them as variables and reuse them to calculate the multiples.

Represent \$a\$ as \$\frac{a_n}{a_d}\$ and \$b\$ as \$\frac{b_n}{b_d}\$, then an interior point is \$\frac{ya_nb_d + x(b_na_d - a_nb_d)}{ya_db_d}\$, this interior point is exactly \$\frac{x}{y}\$ the way there from a to b, in other words it is \$a + \frac{x(b - a)}{y}\$.

Now because I don't use GCD (which will slow down the code) and I just naively use cross multiplication, the numbers will explode quickly which will also slow down the code, I must make the interior positions reduced fractions. Numerators and denominators of the reduced fractions repeat, so I store precomputed values in lists and use indexing to combine with with the corresponding weight.

I have done a number of optimizations. because multiplication is slower compared to addition and some multiples are sums of previous multiples, I have made it so the code to combine two of such previous multiples by addition to get the current multiple (but not above, one multiplication beats addition and indexing chain), and I have made the code to detect arithmetic progression automatically, and I have made the code to use x << n.bit_length() - 1 instead of x * n if n is a power of two (n & -n == n), except for n==1 case I use x + x because that beats left shift by one.

Oh, did I mention that I wrote functions that generates the integration functions using some other functions and parts of functions stored in string templates and the str.format function...? Yes indeed, I wrote code that programmatically generates Python functions using formulas stored as dicts and I use exec(compile(source, "<dynamic>", "exec"), NAMESPACE) to generate the Python functions.

And I have implemented integration by parts. It is simple, split the interval into n subintervals of equal length each, integrate each and then sum. I chose the powers of two as the count of subintervals. I use the term level to denote the count of subintervals, the count of subintervals is 1 << level, for example if level is 5 there are \$2^5=32\$ subintervals.

Now what are the dyadic fractions?

$$\left\{ \begin{aligned} \frac{0}{1}, \ \frac{1}{64}, \ \frac{1}{32}, \ \frac{3}{64}, \ \frac{1}{16}, \ \frac{5}{64}, \ \frac{3}{32}, \ \frac{7}{64}, \ \frac{1}{8}, \ \frac{9}{64}, \ \frac{5}{32}, \ \frac{11}{64}, \ \frac{3}{16} \\ \frac{13}{64}, \ \frac{7}{32}, \ \frac{15}{64}, \ \frac{1}{4}, \ \frac{17}{64}, \ \frac{9}{32}, \ \frac{19}{64}, \ \frac{5}{16}, \ \frac{21}{64}, \ \frac{11}{32}, \ \frac{23}{64}, \ \frac{3}{8}, \ \frac{25}{64} \\ \frac{13}{32}, \ \frac{27}{64}, \ \frac{7}{16}, \ \frac{29}{64}, \ \frac{15}{32}, \ \frac{31}{64}, \ \frac{1}{2}, \ \frac{33}{64}, \ \frac{17}{32}, \ \frac{35}{64}, \ \frac{9}{16}, \ \frac{37}{64}, \ \frac{19}{32} \\ \frac{39}{64}, \ \frac{5}{8}, \ \frac{41}{64}, \ \frac{21}{32}, \ \frac{43}{64}, \ \frac{11}{16}, \ \frac{45}{64}, \ \frac{23}{32}, \ \frac{47}{64}, \ \frac{3}{4}, \ \frac{49}{64}, \ \frac{25}{32}, \ \frac{51}{64} \\ \frac{13}{16}, \ \frac{53}{64}, \ \frac{27}{32}, \ \frac{55}{64}, \ \frac{7}{8}, \ \frac{57}{64}, \ \frac{29}{32}, \ \frac{59}{64}, \ \frac{15}{16}, \ \frac{61}{64}, \ \frac{31}{32}, \ \frac{63}{64}, \ \frac{1}{1} \\ \end{aligned} \right\} $$

These are all the reduced fractions between 0 and 1 with \$2^{-6}\$ spacing, notice that all the numerators are odd except 0. Because the odds are all coprime to two and thus powers of two. For 2 there are only two possible modulus remainders: 0 and 1. If the remainder is 0 then the number is a multiple of 2, if the remainder is 1 then it is odd (2-coprime). Thus odds are 2x+1 or 2x-1. Notice 64 appears in the denominator every other term, and 32 appears in the denominator once every four times, and 16 appears in the denominator once every eight times and so on, these are exactly the powers of two, their first appearances are also at powers of two indices, and the numerators are of course the odd numbers below the denominator for each denominator. Each halving of the denominator corresponds to doubling of the step size, and every denominator first appears at index equal to half the step size.

And here is my code:

from itertools import islice
from math import ceil, gcd
from gmpy2 import atan2, get_context, mpfr, log10

NEWTON_COTES_FORMULAS = [
    {
        "steps": 3,
        "scale": (3, 8),
        "extrema": 1,
        "middle": {3: [(1, 3), (2, 3)]},
    },
    {
        "steps": 4,
        "scale": (2, 45),
        "extrema": 7,
        "middle": {32: [(1, 4), (3, 4)], 12: [(1, 2)]},
    },
    {
        "steps": 5,
        "scale": (5, 288),
        "extrema": 19,
        "middle": {75: [(1, 5), (4, 5)], 50: [(2, 5), (3, 5)]},
    },
    {
        "steps": 6,
        "scale": (1, 140),
        "extrema": 41,
        "middle": {216: [(1, 6), (5, 6)], 27: [(1, 3), (2, 3)], 272: [(1, 2)]},
    },
    {
        "steps": 7,
        "scale": (7, 17280),
        "extrema": 751,
        "middle": {
            3577: [(1, 7), (6, 7)],
            1323: [(2, 7), (5, 7)],
            2989: [(3, 7), (4, 7)],
        },
    },
    {
        "steps": 8,
        "scale": (4, 14175),
        "extrema": 989,
        "middle": {
            5888: [(1, 8), (7, 8)],
            -928: [(1, 4), (3, 4)],
            10496: [(3, 8), (5, 8)],
            -4540: [(1, 2)],
        },
    },
    {
        "steps": 9,
        "scale": (9, 89600),
        "extrema": 2857,
        "middle": {
            15741: [(1, 9), (8, 9)],
            1080: [(2, 9), (7, 9)],
            19344: [(1, 3), (2, 3)],
            5778: [(4, 9), (5, 9)],
        },
    },
    {
        "steps": 10,
        "scale": (5, 299376),
        "extrema": 16067,
        "middle": {
            106300: [(1, 10), (9, 10)],
            -48525: [(1, 5), (4, 5)],
            272400: [(3, 10), (7, 10)],
            -260550: [(2, 5), (3, 5)],
            427368: [(1, 2)],
        },
    },
]


def generate_indices(formula: dict) -> dict:
    num_mults = set()
    den_mults = set()
    middle = formula["middle"]
    for points in middle.values():
        num, den = points[0]
        num_mults.add(num)
        den_mults.add(den)
        if len(points) > 1:
            num, den = points[1]
            num_mults.add(num)
            den_mults.add(den)

    num_mults = sorted(num_mults)
    den_mults = sorted(den_mults)
    num_indices = {e: i for i, e in enumerate(num_mults)}
    if len(den_mults) == 1 and den_mults[0] > 2:
        return {
            "num_mults": num_mults,
            "den_mults": den_mults,
            "indices": [
                (weight, num_indices[a], num_indices[b])
                for weight, ((a, _), (b, _)) in middle.items()
            ],
            "middle": None,
        }

    den_indices = {e: i for i, e in enumerate(den_mults)}
    indices = [
        (weight, [(num_indices[num], den_indices[den]) for num, den in points])
        for weight, points in middle.items()
    ]
    weight = None
    if den_mults[0] == 2:
        weight, _ = indices.pop(-1)

    return {
        "num_mults": num_mults,
        "den_mults": den_mults,
        "indices": indices,
        "middle": weight,
    }


def assemble_strategy(mults: list[int]) -> dict:
    strategy = {"add": [], "mul": []}
    if (e := mults[0]) == 1:
        diffs = [b - a for a, b in zip(mults, mults[1:])]
        if len(set(diffs)) == 1:
            return {"type": "range", "step": diffs[0], "increments": len(diffs)}
    else:
        strategy["mul"].append((0, e))

    l = len(mults)
    for i, e in enumerate(mults[1:], start=1):
        found = False
        for j, f in enumerate(mults[:i]):
            for k, g in enumerate(mults[j:i], start=j):
                if f + g == e:
                    strategy["add"].append((i, j, k))
                    found = True
                    break
            if found:
                break
        else:
            strategy["mul"].append((i, e))

    return {"type": "general", "length": len(mults), "strategy": strategy}


FUNCTION_TEMPLATE = """def {func_name}(
    low_num: int, low_den: int, high_num: int, high_den: int, integrand: callable
) -> tuple[int, int]:
    term_1 = low_num * high_den
    term_2 = high_num * low_den - term_1
    term_3 = low_den * high_den
{assemble}
    result_num, result_den = integrand(low_num, low_den)
    term_num, term_den = integrand(high_num, high_den)
    result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
    result_num {extrema_weight}
    indices = {indices_dict}["{func_name}"]{loop_body}{extra_iteration}
    return result_num * term_2 {num_scale}, result_den * {last_den} {den_scale}
"""

FUNCTION_MAINLOOP = """
    for weight, ((x1, y), (x2, _)) in indices:
        term_num, term_den = integrand((start := mults_1[y]) + mults_2[x1], (den := mults_3[y]))
        next_num, next_den = integrand(start + mults_2[x2], den)
        term_num, term_den = term_num * next_den + next_num * term_den, term_den * next_den
        term_num *= weight
        result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
"""

FUNCTION_ALTLOOP = """
    for weight, x1, x2 in indices:
        term_num, term_den = integrand(term_4 + mults_2[x1], term_5)
        next_num, next_den = integrand(term_4 + mults_2[x2], term_5)
        term_num, term_den = term_num * next_den + next_num * term_den, term_den * next_den
        term_num *= weight
        result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
"""


FUNCTION_LAST_ITERATION = """
    term_num, term_den = integrand(mults_1[0] + term_2, mults_3[0])
    term_num {weight}
    result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
"""


def mul_or_lshift(n: int) -> str:
    return f"* {n}" if n & -n != n else f"<< {n.bit_length() - 1}"


def imul_or_ilshift(n: int) -> str:
    return f"*= {n}" if n & -n != n else f"<<= {n.bit_length() - 1}"


def assemble_numerator_multiples(strategy: dict) -> list:
    if strategy["type"] == "range":
        return [
            (
                "    step = term_2"
                if (step := strategy["step"]) == 1
                else (
                    "    step = term_2 + term_2"
                    if step == 2
                    else f"    step = term_2 {mul_or_lshift(step)}"
                )
            ),
            "    number = term_2",
            f"    mults_2 = [term_2] * {strategy['increments']+1}\n    for i in range(1, {strategy['increments']+1}):\n        mults_2[i] = number = number + step\n",
        ]

    result = [f"    mults_2 = [term_2] * {strategy['length']}"]
    strats = strategy["strategy"]
    for a, b in strats["mul"]:
        result.append(f"    mults_2[{a}] {imul_or_ilshift(b)}")

    for i, j, k in strats["add"]:
        if j != k:
            result.append(
                f"    mults_2[{i}] = "
                + (f"mults_2[{j}]" if j else "term_2")
                + " + "
                + (f"mults_2[{k}]" if k else "term_2")
            )
        else:
            if not j:
                result.append(f"    mults_2[{i}] += term_2")

            else:
                result.append(f"    mults_2[{i}] = (val := mults_2[{j}]) + val")

    return result


def assemble_denominator_multiples(strategy: dict) -> list:
    strats = strategy["strategy"]
    if strategy["length"] == 1:
        mult = strats["mul"][0][1]
        return [
            f"    term_4 = term_1 {(op := mul_or_lshift(mult))}\n    term_5 = term_3 {op}"
        ]

    result = [
        f"    mults_1 = [term_1] * {(length := strategy['length'])}",
        f"    mults_3 = [term_3] * {length}",
    ]

    for a, b in strats["mul"]:
        result.append(
            f"    mults_1[{a}] {(op := imul_or_ilshift(b))}\n    mults_3[{a}] {op}"
        )

    for i, j, k in strats["add"]:
        if j != k:
            result.extend(
                (
                    f"    mults_1[{i}] = mults_1[{j}] + mults_1[{k}]",
                    f"    mults_3[{i}] = mults_3[{j}] + mults_3[{k}]",
                )
            )
        else:
            result.extend(
                (
                    f"    mults_1[{i}] = (val := mults_1[{j}]) + val",
                    f"    mults_3[{i}] = (val := mults_3[{j}]) + val",
                )
            )

    return result


NAMESPACE = {"NEWTON_COTES_MIDDLE_INDICES": {}, "ROMBERG_WEIGHTS": {}}


def generate_source_Newton(level: int) -> str:
    assert isinstance(level, int) and 0 <= level <= 7
    fname = f"Newton_Cotes_integrate_{level+3}"
    formula = NEWTON_COTES_FORMULAS[level]
    data = generate_indices(formula)
    NAMESPACE["NEWTON_COTES_MIDDLE_INDICES"][fname] = data["indices"]
    den_mults = data["den_mults"]
    num_scale, den_scale = formula["scale"]
    extrema = formula["extrema"]
    l = len(den_mults)
    return FUNCTION_TEMPLATE.format(
        func_name=fname,
        assemble="\n".join(
            assemble_numerator_multiples(assemble_strategy(data["num_mults"]))
            + assemble_denominator_multiples(assemble_strategy(den_mults))
        ),
        extrema_weight=imul_or_ilshift(extrema),
        indices_dict="NEWTON_COTES_MIDDLE_INDICES",
        loop_body=FUNCTION_MAINLOOP if l > 1 else FUNCTION_ALTLOOP,
        extra_iteration=(
            FUNCTION_LAST_ITERATION.format(weight=imul_or_ilshift(middle))
            if (middle := data["middle"])
            else ""
        ),
        num_scale=mul_or_lshift(num_scale),
        den_scale=mul_or_lshift(den_scale),
        last_den="term_5" if l == 1 else "mults_3[-1]",
    )


FUNCTION_SOURCES = {}
for i in range(8):
    FUNCTION_SOURCES[f"Newton_Cotes_integrate_{i+3}"] = source = generate_source_Newton(
        i
    )
    exec(compile(source, "<dynamic>", "exec"), NAMESPACE)


def integrate_by_parts(
    low_num: int,
    low_den: int,
    high_num: int,
    high_den: int,
    level: int,
    integrand: callable,
    integrator: callable,
    reduce: bool = False,
) -> tuple[int, int]:
    start = low_num * high_den
    step_num, den = high_num * low_den - start, low_den * high_den
    total = 1 << level
    points = [(low_num, low_den)] * (total + 1)
    points[-1] = (high_num, high_den)
    level_up = level + 1
    multiple = start
    starts = [start] * level_up
    for i in range(1, level_up):
        starts[i] = multiple = multiple + multiple

    multiple = den
    denominators = [den] * level_up
    for i in range(1, level_up):
        denominators[i] = multiple = multiple + multiple

    step = step_num + step_num
    multiple = step_num
    lim = total >> 1
    odds = [step_num] * lim
    for i in range(1, lim):
        odds[i] = multiple = multiple + step

    power = 1
    powers = [1] * level_up
    for i in range(1, level_up):
        powers[i] = power = power + power

    for index, start, step in zip(range(level, 0, -1), powers, powers[1:]):
        initial = starts[index]
        denominator = denominators[index]
        for i, odd in zip(range(start, total, step), odds):
            points[i] = (initial + odd, denominator)

        del odds[lim:]
        lim >>= 1

    result_num, result_den = 0, 1
    for (num1, den1), (num2, den2) in zip(points, points[1:]):
        term_num, term_den = integrator(num1, den1, num2, den2, integrand)
        result_num, result_den = (
            result_num * term_den + term_num * result_den,
            result_den * term_den,
        )

    if reduce:
        common = gcd(result_num, result_den)
        result_num //= common
        result_den //= common

    return result_num, result_den

def arctan_derivative(x: int, y: int) -> tuple[int, int]:
    return y * y, x * x + y * y


def arctan_Newton(x: int, y: int, order: int) -> tuple[int, int]:
    assert 3 <= order <= 10
    return NAMESPACE[f"Newton_Cotes_integrate_{order}"](0, 1, x, y, arctan_derivative)


def arctan_parts(x: int, y: int, order: int, level: int) -> tuple[int, int]:
    assert 3 <= order <= 10
    return integrate_by_parts(0, 1, x, y, level, arctan_derivative, NAMESPACE[f"Newton_Cotes_integrate_{order}"])


def get_accuracy_fraction_levels(
    steps: int, max_level: int, func: callable, digits: int = 100
):
    get_context().precision = int(ceil(digits / log10(2)))
    levels = [[] for _ in range(max_level)]
    max_prec = mpfr(0.1) ** digits
    for i in range(1, steps + 1):
        atan = atan2(i, steps)
        for j, k in zip(range(max_level), range(1, max_level + 1)):
            num, den = func(i, steps, k)
            levels[j].append(
                (i / steps, int(-log10(abs(mpfr(num) / den - atan) or max_prec)))
            )

    return levels


def get_accuracy_fraction(steps: int, func: callable, digits: int = 100):
    get_context().precision = int(ceil(digits / log10(2)))
    result = []
    max_prec = mpfr(0.1) ** digits
    for i in range(1, steps + 1):
        num, den = func(i, steps)
        result.append(
            (
                i / steps,
                int(
                    -log10(
                        abs(mpfr(num) / den - atan2(i, steps)) or max_prec
                    )
                ),
            )
        )

    return result


TESTS = [
    ("arctan_Newton_3", lambda x, y: arctan_Newton(x, y, 3), get_accuracy_fraction, 0),
    ("arctan_Newton_4", lambda x, y: arctan_Newton(x, y, 4), get_accuracy_fraction, 0),
    ("arctan_Newton_5", lambda x, y: arctan_Newton(x, y, 5), get_accuracy_fraction, 0),
    ("arctan_Newton_6", lambda x, y: arctan_Newton(x, y, 6), get_accuracy_fraction, 0),
    ("arctan_Newton_7", lambda x, y: arctan_Newton(x, y, 7), get_accuracy_fraction, 0),
    ("arctan_Newton_8", lambda x, y: arctan_Newton(x, y, 8), get_accuracy_fraction, 0),
    ("arctan_Newton_9", lambda x, y: arctan_Newton(x, y, 9), get_accuracy_fraction, 0),
    ("arctan_Newton_10", lambda x, y: arctan_Newton(x, y, 10), get_accuracy_fraction, 0),
    ("arctan_Newton_3_parts", lambda x, y, level: arctan_parts(x, y, 3, level), get_accuracy_fraction_levels, 1),
    ("arctan_Newton_4_parts", lambda x, y, level: arctan_parts(x, y, 4, level), get_accuracy_fraction_levels, 1),
    ("arctan_Newton_5_parts", lambda x, y, level: arctan_parts(x, y, 5, level), get_accuracy_fraction_levels, 1),
    ("arctan_Newton_6_parts", lambda x, y, level: arctan_parts(x, y, 6, level), get_accuracy_fraction_levels, 1),
    ("arctan_Newton_7_parts", lambda x, y, level: arctan_parts(x, y, 7, level), get_accuracy_fraction_levels, 1),
    ("arctan_Newton_8_parts", lambda x, y, level: arctan_parts(x, y, 8, level), get_accuracy_fraction_levels, 1),
    ("arctan_Newton_9_parts", lambda x, y, level: arctan_parts(x, y, 9, level), get_accuracy_fraction_levels, 1),
    ("arctan_Newton_10_parts", lambda x, y, level: arctan_parts(x, y, 10, level), get_accuracy_fraction_levels, 1),
]

def get_data(steps: int, max_level: int, digits: int = 100) -> dict:
    result = {}
    for func_name, func, test_func, nested in TESTS:
        if nested:
            for i, line in enumerate(
                test_func(steps, max_level, func, digits), start=1
            ):
                result[f"{func_name}_level_{i}"] = line
        else:
            result[func_name] = test_func(steps, func, digits)

    return result


data = get_data(512, 4)
mean_digits = []
weighted_mean_digits = []
harmonic_mean_digits = []
quadratic_mean_digits = []
for k, v in data.items():
    s1 = s2 = s3 = s4 = 0
    for a, b in v:
        s1 += b
        s2 += float(a * b)
        s3 += 1 / b
        s4 += b * b

    mean_digits.append((k, s1 / 512))
    weighted_mean_digits.append((k, s2 / 256.5))
    harmonic_mean_digits.append((k, 512 / s3))
    quadratic_mean_digits.append((k, (s4 / 512) ** 0.5))


mean_digits.sort(key=lambda x: -x[1])
weighted_mean_digits.sort(key=lambda x: -x[1])
harmonic_mean_digits.sort(key=lambda x: -x[1])
quadratic_mean_digits.sort(key=lambda x: -x[1])

Test results:

In [569]: mean_digits
Out[569]:
[('arctan_Newton_10_parts_level_4', 28.48828125),
 ('arctan_Newton_10_parts_level_3', 24.76953125),
 ('arctan_Newton_9_parts_level_4', 24.080078125),
 ('arctan_Newton_8_parts_level_4', 23.916015625),
 ('arctan_Newton_10_parts_level_2', 21.115234375),
 ('arctan_Newton_9_parts_level_3', 21.0703125),
 ('arctan_Newton_8_parts_level_3', 20.904296875),
 ('arctan_Newton_7_parts_level_4', 19.55859375),
 ('arctan_Newton_6_parts_level_4', 19.33203125),
 ('arctan_Newton_9_parts_level_2', 18.09375),
 ('arctan_Newton_8_parts_level_2', 17.876953125),
 ('arctan_Newton_7_parts_level_3', 17.189453125),
 ('arctan_Newton_6_parts_level_3', 16.943359375),
 ('arctan_Newton_10_parts_level_1', 16.78125),
 ('arctan_Newton_5_parts_level_4', 14.904296875),
 ('arctan_Newton_7_parts_level_2', 14.71484375),
 ('arctan_Newton_4_parts_level_4', 14.634765625),
 ('arctan_Newton_6_parts_level_2', 14.537109375),
 ('arctan_Newton_9_parts_level_1', 14.501953125),
 ('arctan_Newton_8_parts_level_1', 14.310546875),
 ('arctan_Newton_5_parts_level_3', 13.087890625),
 ('arctan_Newton_10', 12.9296875),
 ('arctan_Newton_4_parts_level_3', 12.859375),
 ('arctan_Newton_7_parts_level_1', 11.978515625),
 ('arctan_Newton_6_parts_level_1', 11.765625),
 ('arctan_Newton_5_parts_level_2', 11.30078125),
 ('arctan_Newton_9', 11.1171875),
 ('arctan_Newton_4_parts_level_2', 11.017578125),
 ('arctan_Newton_8', 10.921875),
 ('arctan_Newton_3_parts_level_4', 10.080078125),
 ('arctan_Newton_5_parts_level_1', 9.408203125),
 ('arctan_Newton_4_parts_level_1', 9.21484375),
 ('arctan_Newton_7', 9.146484375),
 ('arctan_Newton_3_parts_level_3', 8.95703125),
 ('arctan_Newton_6', 8.8984375),
 ('arctan_Newton_3_parts_level_2', 7.849609375),
 ('arctan_Newton_5', 7.146484375),
 ('arctan_Newton_4', 6.89453125),
 ('arctan_Newton_3_parts_level_1', 6.525390625),
 ('arctan_Newton_3', 5.12890625)]

In [570]: weighted_mean_digits
Out[570]:
[('arctan_Newton_10_parts_level_4', 26.315827546296298),
 ('arctan_Newton_10_parts_level_3', 22.52648330896686),
 ('arctan_Newton_9_parts_level_4', 22.208082054093566),
 ('arctan_Newton_8_parts_level_4', 22.02068865740741),
 ('arctan_Newton_9_parts_level_3', 19.19475663986355),
 ('arctan_Newton_8_parts_level_3', 19.005581444931774),
 ('arctan_Newton_10_parts_level_2', 18.84232608430799),
 ('arctan_Newton_7_parts_level_4', 18.04267939814815),
 ('arctan_Newton_6_parts_level_4', 17.859786184210527),
 ('arctan_Newton_9_parts_level_2', 16.256304824561404),
 ('arctan_Newton_8_parts_level_2', 15.976813779239766),
 ('arctan_Newton_7_parts_level_3', 15.764642726608187),
 ('arctan_Newton_6_parts_level_3', 15.479806286549708),
 ('arctan_Newton_10_parts_level_1', 14.021739461500974),
 ('arctan_Newton_5_parts_level_4', 13.780381944444445),
 ('arctan_Newton_4_parts_level_4', 13.467790570175438),
 ('arctan_Newton_7_parts_level_2', 13.217249939083821),
 ('arctan_Newton_6_parts_level_2', 13.041895102339181),
 ('arctan_Newton_9_parts_level_1', 12.285788255360623),
 ('arctan_Newton_8_parts_level_1', 12.069208394249513),
 ('arctan_Newton_5_parts_level_3', 11.930532711988304),
 ('arctan_Newton_4_parts_level_3', 11.747609039961013),
 ('arctan_Newton_10', 10.326228983918128),
 ('arctan_Newton_7_parts_level_1', 10.248583698830409),
 ('arctan_Newton_5_parts_level_2', 10.12481725146199),
 ('arctan_Newton_6_parts_level_1', 10.03785940545809),
 ('arctan_Newton_4_parts_level_2', 9.86879416423002),
 ('arctan_Newton_3_parts_level_4', 9.374687804580896),
 ('arctan_Newton_9', 8.884533382066277),
 ('arctan_Newton_8', 8.683418615984406),
 ('arctan_Newton_3_parts_level_3', 8.288697002923977),
 ('arctan_Newton_5_parts_level_1', 8.201670626218323),
 ('arctan_Newton_4_parts_level_1', 8.01286854288499),
 ('arctan_Newton_7', 7.296113547758285),
 ('arctan_Newton_3_parts_level_2', 7.2269356115984404),
 ('arctan_Newton_6', 7.005284478557505),
 ('arctan_Newton_3_parts_level_1', 5.846316094054581),
 ('arctan_Newton_5', 5.707038864522417),
 ('arctan_Newton_4', 5.460823282163743),
 ('arctan_Newton_3', 4.260561342592593)]

In [571]: harmonic_mean_digits
Out[571]:
[('arctan_Newton_10_parts_level_4', 27.912879755249037),
 ('arctan_Newton_10_parts_level_3', 24.09431747720454),
 ('arctan_Newton_9_parts_level_4', 23.583977377457146),
 ('arctan_Newton_8_parts_level_4', 23.408833956121036),
 ('arctan_Newton_9_parts_level_3', 20.515885834955156),
 ('arctan_Newton_10_parts_level_2', 20.338294181698156),
 ('arctan_Newton_8_parts_level_3', 20.337453467612107),
 ('arctan_Newton_7_parts_level_4', 19.14562028930588),
 ('arctan_Newton_6_parts_level_4', 18.930002404771447),
 ('arctan_Newton_9_parts_level_2', 17.4847216228196),
 ('arctan_Newton_8_parts_level_2', 17.235013212180075),
 ('arctan_Newton_7_parts_level_3', 16.76722370067298),
 ('arctan_Newton_6_parts_level_3', 16.498706891294084),
 ('arctan_Newton_10_parts_level_1', 15.490448318126456),
 ('arctan_Newton_5_parts_level_4', 14.598071093338412),
 ('arctan_Newton_4_parts_level_4', 14.309834091107298),
 ('arctan_Newton_7_parts_level_2', 14.200631282318305),
 ('arctan_Newton_6_parts_level_2', 14.022124097712098),
 ('arctan_Newton_9_parts_level_1', 13.525594679107845),
 ('arctan_Newton_8_parts_level_1', 13.298943101402248),
 ('arctan_Newton_5_parts_level_3', 12.73590900616713),
 ('arctan_Newton_4_parts_level_3', 12.52041187737681),
 ('arctan_Newton_10', 11.497757810835521),
 ('arctan_Newton_7_parts_level_1', 11.241866690148782),
 ('arctan_Newton_6_parts_level_1', 11.020276873329582),
 ('arctan_Newton_5_parts_level_2', 10.901605925457924),
 ('arctan_Newton_4_parts_level_2', 10.618486665999201),
 ('arctan_Newton_9', 9.886216476025368),
 ('arctan_Newton_3_parts_level_4', 9.859903269736689),
 ('arctan_Newton_8', 9.669425121009134),
 ('arctan_Newton_5_parts_level_1', 8.925145374305796),
 ('arctan_Newton_3_parts_level_3', 8.73038506685022),
 ('arctan_Newton_4_parts_level_1', 8.725059769525904),
 ('arctan_Newton_7', 8.116640179524895),
 ('arctan_Newton_6', 7.79453569163586),
 ('arctan_Newton_3_parts_level_2', 7.622464686259581),
 ('arctan_Newton_5', 6.3172335781739895),
 ('arctan_Newton_3_parts_level_1', 6.210840442923198),
 ('arctan_Newton_4', 6.0507292154913035),
 ('arctan_Newton_3', 4.650421800394735)]

In [572]: quadratic_mean_digits
Out[572]:
[('arctan_Newton_10_parts_level_4', 28.85937077016753),
 ('arctan_Newton_10_parts_level_3', 25.21547763775257),
 ('arctan_Newton_9_parts_level_4', 24.398586344806947),
 ('arctan_Newton_8_parts_level_4', 24.241340270805985),
 ('arctan_Newton_10_parts_level_2', 21.639400913033615),
 ('arctan_Newton_9_parts_level_3', 21.434402108526378),
 ('arctan_Newton_8_parts_level_3', 21.27640822307656),
 ('arctan_Newton_7_parts_level_4', 19.82324235588114),
 ('arctan_Newton_6_parts_level_4', 19.593426034004363),
 ('arctan_Newton_9_parts_level_2', 18.509288208896635),
 ('arctan_Newton_8_parts_level_2', 18.308606668040035),
 ('arctan_Newton_10_parts_level_1', 17.61225159512548),
 ('arctan_Newton_7_parts_level_3', 17.471684681936082),
 ('arctan_Newton_6_parts_level_3', 17.23748413704854),
 ('arctan_Newton_9_parts_level_1', 15.14042343777082),
 ('arctan_Newton_5_parts_level_4', 15.105294498122174),
 ('arctan_Newton_7_parts_level_2', 15.062110990827282),
 ('arctan_Newton_8_parts_level_1', 14.965911787291812),
 ('arctan_Newton_6_parts_level_2', 14.884516388683913),
 ('arctan_Newton_4_parts_level_4', 14.845098622946228),
 ('arctan_Newton_10', 13.959763608313716),
 ('arctan_Newton_5_parts_level_3', 13.320933361630482),
 ('arctan_Newton_4_parts_level_3', 13.088341663480518),
 ('arctan_Newton_7_parts_level_1', 12.467379310624988),
 ('arctan_Newton_6_parts_level_1', 12.260359140335163),
 ('arctan_Newton_9', 11.984690755292771),
 ('arctan_Newton_8', 11.808531079266379),
 ('arctan_Newton_5_parts_level_2', 11.566891058102),
 ('arctan_Newton_4_parts_level_2', 11.292195175208406),
 ('arctan_Newton_3_parts_level_4', 10.228158894688722),
 ('arctan_Newton_7', 9.855498940946623),
 ('arctan_Newton_5_parts_level_1', 9.738875865570934),
 ('arctan_Newton_6', 9.643650760992955),
 ('arctan_Newton_4_parts_level_1', 9.54921462739214),
 ('arctan_Newton_3_parts_level_3', 9.116005841375927),
 ('arctan_Newton_3_parts_level_2', 8.015000584840902),
 ('arctan_Newton_5', 7.689913239107447),
 ('arctan_Newton_4', 7.448521245858133),
 ('arctan_Newton_3_parts_level_1', 6.751301957770812),
 ('arctan_Newton_3', 5.456504146429287)]

In [573]: %timeit arctan_parts(123, 456, 4, 5)
303 μs ± 11.4 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [574]: %timeit arctan_parts(123, 456, 10, 5)
1.08 ms ± 9.17 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [575]: %timeit arctan_parts(123, 456, 10, 4)
330 μs ± 13 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [576]: %timeit arctan_Newton(123, 456, 10)
7.08 μs ± 207 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [577]: %timeit arctan_Newton(123, 456, 4)
3.65 μs ± 118 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The increase of order (number of points - 1 evaluated in the integration function) and level (log base 2 of number of subintervals) both improve accuracy. In particular, increasing the number of subintervals can make any quadrature accurate, but this also dramatically increase the execution time as expected. Using higher order quadrature rules increase the accuracy modestly while not incurring large costs. And the number of accurate digits gets monotonically smaller as the argument increases for every integration scheme, which I haven't shown.

How can I improve their convergence rate and also reduce execution time simultaneously?


Update

Let me explain my choice of operators. I chose the operators based on efficiency (shortest execution time), not whether or not that looks pretty. The choice is made after empirical testing. If n is some int, n + n is evaluated faster than n * 2, and n * 3 is evaluated faster than n + n + n, and n << 1 is faster than n * 2, and n << 2 is faster than n * 4 which in turn is faster than n + n + n + n.

In [5]: n = 9460536207068016

In [6]: %timeit n + n
48.6 ns ± 0.206 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [7]: %timeit n * 2
68.4 ns ± 0.159 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [8]: %timeit n + n + n
80.4 ns ± 0.23 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [9]: %timeit n * 3
69.7 ns ± 0.95 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [10]: %timeit n + n + n + n
111 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [11]: %timeit n * 4
68.2 ns ± 0.0599 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [12]: %timeit n << 2
57.8 ns ± 0.243 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [13]: %timeit n << 1
57.8 ns ± 0.15 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [14]: %timeit n * 8
69.6 ns ± 0.348 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [15]: %timeit n << 3
59.9 ns ± 0.655 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Update 1

I chose the my code structure based purely on maximizing of performance (shortest execution time), and all else are disposable and should be sacrificed for even small gains in making the code faster. All my choices in this program are based on empirical evidence.

In [41]: %%timeit
    ...: lst = [1] * 1025
    ...: m = 1
    ...: for i in range(1, 1025):
    ...:     lst[i] = m = m + m
93.3 μs ± 1.28 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [42]: %%timeit
    ...: m = 1
    ...: lst = [1, *((m := m + m) for _ in range(1024))]
129 μs ± 1.25 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [43]: %%timeit
    ...: m = 1
    ...: lst = [1, *[(m := m + m) for _ in range(1024)]]
100 μs ± 465 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [44]: %%timeit
    ...: m = 1
    ...: lst = [(m := m + m) for _ in range(1024)]
    ...: lst.insert(0, 1)
94.9 μs ± 975 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [45]: %%timeit
    ...: m = 1
    ...: lst = [(m := m * 2) for _ in range(1024)]
    ...: lst.insert(0, 1)
132 μs ± 1.55 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [46]: %timeit [1 << i for i in range(1025)]
94.5 μs ± 643 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [47]: %%timeit
    ...: lst = [1] * 1025
    ...: m = 1
    ...: for i in range(1, 1025):
    ...:     lst[i] = m = m + m
94 μs ± 919 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [48]: %%timeit
    ...: m = 1
    ...: [1] + [(m := m * 2) for _ in range(1024)]
139 μs ± 1.21 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [49]: %%timeit
    ...: m = 1
    ...: [1] + [(m := m + m) for _ in range(1024)]
101 μs ± 611 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The above lists eight different ways to generate a list containing the first 1025 powers of two (1 to \$2^{1024}\$). The execution time measures fluctuate, but generally the performance hierarchy doesn't change. The left-shift method and the addition and index assignment method both seem to be the fastest, but for large numbers, repeatedly shifting the big integers will be slow, there is a lot of extra work to be done.

I will explain each in detail.

In the first method, a list of length 1025 is created, initially filled with 1s. Then starting at index 1, each element is mutated by indexing assignment, the code first doubles multiple and assigns the result to multiple and the list at the given index. There is one single list throughout, no list.append calls and therefore no reallocation, it doesn't mutate the length of the list thus the memory access is contiguous, it is just pointer increments, therefore it is the fastest loop scheme.

The second method uses the same logic and operator, except it uses a generator, it doubles multiple at each iteration and reassigns that to multiple and yields the result at the same time, and then unpacks the generator and copies the contents of the yielded items to a list initialized with first item being 1. It has to first construct a generator, a generator is a Python object with all the overheads and it doesn't have a length attribute. The code has to first create a list and exhaust the generator and extend the list via list.extend which copies the values and mutates the length of the list so that memory access isn't contiguous. Therefore it is slow.

The third method is almost identical to the second, except it uses list comprehension instead of a generator. Generally a list comprehension is faster than the equivalent generator (changing [] to () without changing anything else), because it doesn't have Python object overheads. It has the same problem with the second. It has to create two lists and do bulk memory copying.

The fourth method creates a list using list comprehension and inserts 1 at position 0. It creates only one list, but it has to move all 1024 references down by one position to get one free space and then assign the first one. It has to copy all these elements one by one, and it has to move the starting position of the list if there isn't enough memory to store 1025 references because the position at start + 1024 is occupied by some other program. It has to mutate the length of the list therefore memory access isn't contiguous.

The fifth method is almost identical to the fourth, except it uses a multiplication instead of an addition. School book multiplication has time complexity of \$O(n^2)\$, but actually it is false. It is actually \$O(\lfloor \log_{base} n \rfloor + 1)^2\$, where base is the maximum digit plus one, \$\lfloor \log_{base} n \rfloor + 1\$ is the number of digits of n in base base numeral expansion, multiplication has to multiply every digit of the left operand with every digit of the right operand in the numeral expansions. For CPython the base is \$2^{30} = 1073741824\$, which fits inside a uint32_t. For large integers Python uses Karatsuba multiplication which has lower time complexity and also higher cost for small inputs, it multiplies two n-digit numbers in \$O(n^{\log_2 3})\$ time, which triggers in the example. For even larger numbers Python uses Number Theoretic Transform which has time complexity \$O(n \log n)\$ for n-digit multiplication but here the bit length is below the cutoff so it doesn't trigger. NTT is extremely slow for small inputs.

Now what is the time complexity of addition? Adding two n-digit integers can be done in n iterations, so it is \$O(n)\$ where n is the number of digits or \$\lfloor \log_{base} N \rfloor + 1\$ for input N. So for adding one integer with itself, n + n is strictly faster than n * 2, because n < n log n. Of course for adding n with itself m times n * (m + 1) for m > 1 is strictly faster than chain addition, because they have the same time complexity and the former has far fewer function calls and overheads.

The sixth method is almost as fast as the first, but the timings are unreliable. It uses list comprehension to generate a single list containing 1<<0, 1<<1, 1<<2, 1<<3, 1<<4, 1<<5... 1<<1024. For an n-digit number left-shift is \$O(n)\$ in time, it has to propagate the carry bit from lowest digit to highest digit (+ 1 digit if highest bit is set) in one single loop, so it is strictly faster than multiplication for multiplying with powers of 2. For some reason n << 1 is slower than n + n. The sixth method does a whole lot of redundant work because each immediate result can be obtained by a single operation from the previous result. So it is strictly slower than the first.

The eighth and nineth generates a list using list comprehension and concatenates it with another list. It creates two lists and then creates another list and copies the contents of the two lists to the third list, so they are the slowest.

\$\endgroup\$
1
  • 3
    \$\begingroup\$ “I have implemented integration by parts” – The term “integration by parts” is usually used for this integration technique. \$\endgroup\$ Commented Feb 23 at 19:53

4 Answers 4

5
\$\begingroup\$

A couple of general notes, not specifically related to performance.

Unnecessary nested conditionals

Any time you see a pattern like the following, where a conditional and nothing else is located within the else branch of another conditional, you're looking for elif.

    for i, j, k in strats["add"]:
        if j != k:
            result.append(
                f"    mults_2[{i}] = "
                + (f"mults_2[{j}]" if j else "term_2")
                + " + "
                + (f"mults_2[{k}]" if k else "term_2")
            )
        else:
            if not j:
                result.append(f"    mults_2[{i}] += term_2")

            else:
                result.append(f"    mults_2[{i}] = (val := mults_2[{j}]) + val")

Revised:

    for i, j, k in strats["add"]:
        if j != k:
            result.append(
                f"    mults_2[{i}] = "
                + (f"mults_2[{j}]" if j else "term_2")
                + " + "
                + (f"mults_2[{k}]" if k else "term_2")
            )
        elif not j:
            result.append(f"    mults_2[{i}] += term_2")
        else:
            result.append(f"    mults_2[{i}] = (val := mults_2[{j}]) + val")

Creating lists

You have a few chunks of code like the following. Let's structure this using a list comprehension and the walrus operator to handle the modification to multiple on each loop, since initializing each element but the first is redundant as you'll overwrite them anyway.

Note:

  • The i now becomes extraneous.
  • multiple + multiple can be written multiple * 2
    level_up = level + 1
    multiple = start
    starts = [start] * level_up
    for i in range(1, level_up):
        starts[i] = multiple = multiple + multiple

Revised:

    level_up = level + 1
    multiple = start
    starts = [
        start, 
        *[(multiple := multiple * 2) for _ in range(1, level_up]
    ]
\$\endgroup\$
4
\$\begingroup\$

Unused code

In the assemble_strategy function, this line can be deleted because the l variable is not used:

l = len(mults)

Naming

In the generate_source_Newton function, you should avoid l as the name of a variable:

l = len(den_mults)

It is easily confused with the number 1 and capital "I". And in this context, it is not a meaningful name. Something like len_den_mults would be better.

Layout

This looks strange on 3 lines:

FUNCTION_SOURCES[f"Newton_Cotes_integrate_{i+3}"] = source = generate_source_Newton(
    i
)

One looks more typical:

FUNCTION_SOURCES[f"Newton_Cotes_integrate_{i+3}"] = source = generate_source_Newton(i)

DRY

I see patterns like this in a few places:

starts[i] = multiple = multiple + multiple

This is simpler as:

starts[i] = multiple = 2 * multiple

Simpler

Making multiple assignments in a single line can be hard to understand:

step_num, den = high_num * low_den - start, low_den * high_den

It is much easier to read and maintain as separate lines:

step_num = high_num * low_den - start
den = low_den * high_den
\$\endgroup\$
3
  • \$\begingroup\$ Assigning step_num and den probably works even better if we assign den first and then step_num = den - start \$\endgroup\$ Commented Feb 23 at 21:01
  • \$\begingroup\$ That chain assignment with three lines and i in the middle is the result of Black formatter. \$\endgroup\$ Commented Feb 24 at 3:14
  • \$\begingroup\$ I strongly think two lines would be easier to read: source = generate_source_Newton(i); FUNCTION_SOURCES[f"Newton_Cotes_integrate_{i+3}"] = source. The Black formatter did what it could to make the line shorter, because it detected that the line was way too long. But it would be better to split it into two lines yourself. \$\endgroup\$ Commented Feb 25 at 10:23
2
\$\begingroup\$

How can I improve their convergence rate and also reduce execution time simultaneously?

With multiprocessing. Each parenthesized expression can be processed in parallel.

Also:

Now because I don't use GCD (which will slow down the code)

This is only true in single threaded environments. In this case, when you are GCDing you are not multiplying or progressing other code. But if GCD is done outside the "main" thread?

If the GCD is relatively fast, then after each fraction is created by naively cross multiplication, it is piped into a single Process that does the GCD and returns the simplified fraction. Continue the code that progresses the formula, creating other unsimplified fractions.

When any unsimplified fraction is about to be used, it is now possible to inspect the pipe to see if the unsimplified fraction got simplified, and use it.

If the code is organized to always create and consume the fractions in the same order, then the replacement of fractions can be done in O(1) time.

\$\endgroup\$
0
\$\begingroup\$

Thanks for the answers, but they don't improve the performance so I won't accept them.

And I have discovered three tricks that can drastically increase the performance of the program and I have also implemented two minor optimizations.

I will explain each in detail.

First, the simplest optimization I made, as some integrals start at the origin, it doesn't make sense to build the multiples of the numerator of the start times the denominator of the end (low_num * high_den in my code), as it is 0 if low_num is 0, and all its multiples are 0, there is a lot of redundant computation, so I made it a special case and I made the code compile dedicated functions just for them.

Second, I have added the option to reduce the starting fractions to the lowest terms (making the fractions coprime pairs), the idea is simple, I add the fractions using \$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}\$, as you can see both b and d cannot be 0, and a and c are likely positive, which means in general the numerator and denominator of the sum will be larger than the original fractions, and the rate of increase is extremely fast, the larger the numbers the faster the growth rate, and adding many fractions together will make the numbers explode.

Now, the elementary arithmetic functions used here: addition and multiplication, both have growing cost as the numbers get larger, the larger the numbers the more time they will take, with multiplication scale as the square of the number of digits.

So I made the code to generate functions that reduce the end points and the common denominator up front, and I used 3 GCD calls and 7 floor divisions, they do have some cost, but compared to the cost of arithmetic on large numbers their costs are small, and they ensure the starting values are as small as possible, which means the numbers won't grow too fast.

Now, the first trick I used, is this, instead of adding the fractions one by one linearly, I split the list of fractions into pairs, and add the fractions in each pair, and collect the resulting fractions and carry fraction if the count of starting fractions is odd, this halves the number of fractions each iteration, the output of each iteration is the input of the next iteration.

Eventually all the fractions will be added and there will be only one fraction in the stack, and so the loop terminates. This method is much faster than bruteforce, for more information, see this answer.

There are two main reasons that the second method is faster, in the first method it is always a big number multiplied with a small number, the big numbers grow in a factorial like fashion. In the second method, the fractions are added in pairs, which means the numbers will be smaller than if three and more fractions are added together, this much should be obvious. So the numbers stay smaller for longer, thus operations on them remain faster for longer.

The second reason is this, in the second method as the fractions are added in pairs, in each iteration all the fractions will be comparable in size. Why does this matter? Because Python multiplies digit by digit, CPython uses 30 bits in a digit, so a digit can hold \$2^{30} = 1073741824\$ values, and for large numbers Python uses Karatsuba multiplication which multiplies two n-digit numbers in \$O(n^{\log_2(3)})\$ time, and CPython uses Karatsuba multiplication when both operands are at least 70 digits long (2100 bits), whereas for cases where at least one of the operands is less than 70 digits long, Python uses schoolbook multiplication which has time complexity of \$O(n^2)\$, maybe it doesn't seem much, but it is exponentiation which means it grows extremely fast, so Karatsuba multiplication will be much faster than schoolbook multiplication as the input get larger. For example, \$70^2 = 4900, 70^{\log_2(3)} \approx 840.2551898497885\$, the difference is huge.

Proof:

import numpy as np
from itertools import batched
from math import gcd


def make_test_data(num_digits: int, num_fractions: int) -> list[tuple[int, int]]:
    result = []
    data = np.random.randint(
        0x20000000, 0x40000000, (num_fractions, num_digits << 1), dtype=np.uint32
    )
    for row in data:
        pair = [0, 0]
        for i, e in enumerate(row.tolist()):
            pair[which] = (pair[which := i < num_digits] << 30) | e

        result.append(pair)

    return result


def test1(fracs: list[tuple[int, int]]) -> tuple[int, int]:
    num, den = 0, 1
    for tnum, tden in fracs:
        num, den = num * tden + tnum * den, den * tden

    return num, den


def test2(fracs: list[tuple[int, int]]) -> tuple[int, int]:
    while (length := len(fracs)) > 1:
        carry = fracs.pop(-1) if length & 1 else None
        fracs = [(a * d + b * c, b * d) for (a, b), (c, d) in batched(fracs, 2)]
        if carry:
            fracs.append(carry)

    return fracs[0]
In [46]: data = make_test_data(16, 16)

In [47]: %timeit test1(data)
126 μs ± 811 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [48]: %timeit test2(data)
129 μs ± 763 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [49]: data = make_test_data(32, 16)

In [50]: %timeit test1(data)
449 μs ± 9.22 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [51]: %timeit test2(data)
400 μs ± 7.51 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [52]: data = make_test_data(64, 16)

In [53]: %timeit test1(data)
1.71 ms ± 10.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [54]: %timeit test2(data)
1.24 ms ± 22.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [55]: data = make_test_data(16, 32)

In [56]: %timeit test1(data)
484 μs ± 14.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [57]: %timeit test2(data)
431 μs ± 14.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [58]: data = make_test_data(16, 64)

In [59]: %timeit test1(data)
1.87 ms ± 56.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [60]: %timeit test2(data)
1.39 ms ± 12.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

As you can see, test2 is increasingly faster than test1 as the number of digits increase and the number of fractions increase, though for trivial cases test1 is faster, so I only use the method used in test2 in the integrate_by_parts function (and yes I am aware that it differs from school definitions, and the method is better described as "composite rule", but I don't care about academia and I call it integrate by parts because it splits the interval into many parts and integrates over each and then sum).

The second trick is much easier to follow. So we all know multiplying the numerator and denominator with some number simultaneously doesn't change the value of the fraction. Now, division is multiplication, division is multiplication with the multiplicative inverse element. And, as the fractions get larger, the cost of adding the fractions together will also get larger even faster, and the cost will only increase as we add the fractions together, so naturally if we divide both sides by some number the numerator and denominator will be smaller and thus arithmetic on them will be faster again.

Thus naturally we should find the greatest common denominator of the numerator and denominator and divide both sides by that number, right? WRONG! GCD is slow, and division is slow, we need one GCD and two floor divisions every iteration if we reduce the fraction every iteration, and the code will be much slower than before.

Instead we can do this, obviously a fraction with an even numerator and an even denominator isn't reduced, they have common factors of 2, a fraction with both sides odd isn't guaranteed to be reduced either, but a reduced fraction must not have both sides even, and from both sides even to at least one side odd we divide by a power of two on both sides, thus the numbers are smaller.

Now computers use binary, it is just ones and zeros, each place represents a power of two. For example, 9460536207068016 is \$100001100111000100111010111100000000011100101101110000_2\$ in binary, and the number of times two appears in a number's prime factorization (the exponent of the maximum power of two that divides a number) is simply the number of trailing zeros in a number's binary expansion, for example the number given is \$2^{53} + 2^{48} + 2^{47} + 2^{44} + 2^{43} + 2^{42} + 2^{38} + 2^{35} + 2^{34} + 2^{33} + 2^{31} + 2^{29} + 2^{28} + 2^{27} + 2^{26} + 2^{16} + 2^{15} + 2^{14} + 2^{11} + 2^{9} + 2^{8} + 2^{6} + 2^{5} + 2^{4}\$, the polynomial is what that binary expansion means, the polynomial can be expressed more efficiently as \$2^{53} + 2^{49} - 2^{47} + 2^{45} - 2^{42} + 2^{38} + 2^{36} - 2^{33} + 2^{31} + 2^{30} - 2^{26} + 2^{17} - 2^{14} + 2^{11} + 2^{10} - 2^{8} + 2^{7} - 2^{4}\$ but I digress.

To get the number of trailing zeros in Python, use (n ^ n - 1).bit_length() - 1 (subtraction has a higher precedence so it is evaluated first), for a number with n trailing zeros, after subtraction the last n bits will be 1, and the next higher bit will be onset because of the borrow, previously it was set by definition, therefore these will be the only bits that differ, and XOR is true when operands are different.

We take the minimum of the number of trail zeros of the numerator and denominator, this is the exponent of the largest power of two that divides both, to get rid of it we simply right shift both sides by that amount. This preserves the exact fraction and makes the numbers smaller at very little cost.

The third trick is like magic. I don't fully understand why it works. But I can describe it in different ways. As we all know, if we divide both sides of a fraction by the same amount, the result doesn't change. But what if we do integer division on both sides, what if we divide both sides by an integer that isn't a common factor and get rid of the remainders? Well, we obviously get a new fraction, with the quotients as the new numerator and new denominator, but how different is the new fraction from the original?

In [102]: import gmpy2

In [103]: pi = gmpy2.const_pi(333)

In [104]: gmpy2.get_context().precision = 333

In [105]: n, d = pi.as_integer_ratio()

In [106]: n/d
Out[106]: mpfr('3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706807',333)

In [107]: (n>>1)/(d>>1)
Out[107]: mpfr('3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706784',333)

In [108]: (n>>2)/(d>>2)
Out[108]: mpfr('3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706738',333)

In [109]: (n>>4)/(d>>4)
Out[109]: mpfr('3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706738',333)

In [110]: (n>>8)/(d>>8)
Out[110]: mpfr('3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211701252',333)

In [111]: (n>>16)/(d>>16)
Out[111]: mpfr('3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534210595206',333)

In [112]: (n>>32)/(d>>32)
Out[112]: mpfr('3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482484887449405',333)

In [113]: (n>>64)/(d>>64)
Out[113]: mpfr('3.14159265358979323846264338327950288419716939937510582097494459230781640628620899755636516885771225893',333)

In [114]: for i in range(1, 9):
     ...:     shift = 1 << i
     ...:     print(shift, int(-gmpy2.log10(abs((n>>shift)/(d>>shift) - pi))))
2 99
4 99
8 97
16 95
32 90
64 81
128 61
256 23

In [115]: 8 * gmpy2.log10(2)
Out[115]: mpfr('2.40823996531184956170991115779594421414551905169686833048341968901686551419539607589541801694548937642',333)

In [116]: 16 * gmpy2.log10(2)
Out[116]: mpfr('4.81647993062369912341982231559188842829103810339373666096683937803373102839079215179083603389097875284',333)

In [117]: 32 * gmpy2.log10(2)
Out[117]: mpfr('9.63295986124739824683964463118377685658207620678747332193367875606746205678158430358167206778195750567',333)

In [118]: 64 * gmpy2.log10(2)
Out[118]: mpfr('19.2659197224947964936792892623675537131641524135749466438673575121349241135631686071633441355639150113',333)

If we right shift both sides by n bits, then in general we lose the last \$n \ \log_{10}(2)\$ digits. But all the other digits stay the same, they don't change.

Why is it so?

Well, we can approach this in different ways. First, we know \$\frac{a}{b} = \frac{2^na}{2^nb}\$, left-shifting both sides by n equals multiplying both sides by \$2^n\$. Conversely right-shifting both sides by n equals dividing both sides by the same amount. Now add a tiny number to the numerator after multiplying both sides by the power of two, this number is smaller than the power of two. Obviously the difference will be small, as the number is small and it is divided by a huge number, and the numerator is dominated by a. Similarly add some small number to the denominator, the difference will be small because the denominator is dominated by b.

Another way to look at this, is to think of the fraction not as a number, but as a convergent series. We have this series: \$\frac{N_0 \cdot 2^n + N_1 \cdot 2^{n-1} + N_2 \cdot 2^{n-2} + N_3 \cdot 2^{n-3} + N_4 \cdot 2^{n-4} + N_5 \cdot 2^{n-5}...}{D_0 \cdot 2^d + D_1 \cdot 2^{d-1} + D_2 \cdot 2^{d-2} + D_3 \cdot 2^{d-3} + D_4 \cdot 2^{d-4} + D_5 \cdot 2^{d-5}...}\$, in which capital N with subscript means the bits of the numerator, and capital D for the denominator bits, little n and little d are for the highest bit positions. As you can clearly see, the sums are dominated by the higher bits because they are multiplied with huge numbers, the more to the right a term is, the smaller its overall contribution to the sums, and so they don't make much of a difference, but the more terms there are, the more precision the partial sum is, therefore the later terms change the last few digits.

Thus if I set a fixed number of bits, and I truncate every arithmetic result to the highest number of bits, the numbers will never blow up and I get \$\log_{10}(2)\$ times the precision number of accurate decimal digits at minimum, this can ensure the calculations stay fast no matter how many fractions I add. And as a bonus, this beats floats, because if the numerator and denominator are both below the limit, then the fraction is exact, unlike in float in which every fraction with reduced denominator containing any prime factor other than 2 gets infinite binary expansion thus only approximations can be stored.

Code:

from itertools import batched, islice
from math import ceil, gcd
from gmpy2 import atan2, get_context, log10, mpfr, set_exp

NEWTON_COTES_FORMULAS = [
    {
        "steps": 3,
        "scale": (3, 8),
        "extrema": 1,
        "middle": {3: [(1, 3), (2, 3)]},
    },
    {
        "steps": 4,
        "scale": (2, 45),
        "extrema": 7,
        "middle": {32: [(1, 4), (3, 4)], 12: [(1, 2)]},
    },
    {
        "steps": 5,
        "scale": (5, 288),
        "extrema": 19,
        "middle": {75: [(1, 5), (4, 5)], 50: [(2, 5), (3, 5)]},
    },
    {
        "steps": 6,
        "scale": (1, 140),
        "extrema": 41,
        "middle": {216: [(1, 6), (5, 6)], 27: [(1, 3), (2, 3)], 272: [(1, 2)]},
    },
    {
        "steps": 7,
        "scale": (7, 17280),
        "extrema": 751,
        "middle": {
            3577: [(1, 7), (6, 7)],
            1323: [(2, 7), (5, 7)],
            2989: [(3, 7), (4, 7)],
        },
    },
    {
        "steps": 8,
        "scale": (4, 14175),
        "extrema": 989,
        "middle": {
            5888: [(1, 8), (7, 8)],
            -928: [(1, 4), (3, 4)],
            10496: [(3, 8), (5, 8)],
            -4540: [(1, 2)],
        },
    },
    {
        "steps": 9,
        "scale": (9, 89600),
        "extrema": 2857,
        "middle": {
            15741: [(1, 9), (8, 9)],
            1080: [(2, 9), (7, 9)],
            19344: [(1, 3), (2, 3)],
            5778: [(4, 9), (5, 9)],
        },
    },
    {
        "steps": 10,
        "scale": (5, 299376),
        "extrema": 16067,
        "middle": {
            106300: [(1, 10), (9, 10)],
            -48525: [(1, 5), (4, 5)],
            272400: [(3, 10), (7, 10)],
            -260550: [(2, 5), (3, 5)],
            427368: [(1, 2)],
        },
    },
]


def generate_indices(formula: dict) -> dict:
    num_mults = set()
    den_mults = set()
    middle = formula["middle"]
    for points in middle.values():
        num, den = points[0]
        num_mults.add(num)
        den_mults.add(den)
        if len(points) > 1:
            num, den = points[1]
            num_mults.add(num)
            den_mults.add(den)

    num_mults = sorted(num_mults)
    den_mults = sorted(den_mults)
    num_indices = {e: i for i, e in enumerate(num_mults)}
    if len(den_mults) == 1 and den_mults[0] > 2:
        return {
            "num_mults": num_mults,
            "den_mults": den_mults,
            "indices": [
                (weight, num_indices[a], num_indices[b])
                for weight, ((a, _), (b, _)) in middle.items()
            ],
            "middle": None,
        }

    den_indices = {e: i for i, e in enumerate(den_mults)}
    indices = [
        (weight, [(num_indices[num], den_indices[den]) for num, den in points])
        for weight, points in middle.items()
    ]
    weight = None
    if den_mults[0] == 2:
        weight, _ = indices.pop(-1)

    return {
        "num_mults": num_mults,
        "den_mults": den_mults,
        "indices": indices,
        "middle": weight,
    }


def assemble_strategy(mults: list[int]) -> dict:
    strategy = {"add": [], "mul": []}
    if (e := mults[0]) == 1:
        diffs = [b - a for a, b in zip(mults, mults[1:])]
        if len(set(diffs)) == 1:
            return {"type": "range", "step": diffs[0], "increments": len(diffs)}
    else:
        strategy["mul"].append((0, e))

    for i, e in enumerate(mults[1:], start=1):
        found = False
        for j, f in enumerate(mults[:i]):
            for k, g in enumerate(mults[j:i], start=j):
                if f + g == e:
                    strategy["add"].append((i, j, k))
                    found = True
                    break
            if found:
                break
        else:
            strategy["mul"].append((i, e))

    return {"type": "general", "length": len(mults), "strategy": strategy}


INITIALIZE_TYPE_0 = """
    left = low_num * high_den
    span = high_num * low_den - left
    divisor = low_den * high_den
"""

INITIALIZE_TYPE_1 = """
    low_num //= (common := gcd(low_num, low_den))
    low_den //= common
    high_num //= (common := gcd(high_num, high_den))
    high_den //= common
    left = low_num * high_den // (common := gcd(low_den, high_den))
    span = high_num * low_den // common - left
    divisor = low_den * high_den // common
"""


INITIALIZE_TYPE_2 = """
    span = high_num
    divisor = high_den
"""

INITIALIZE_TYPE_3 = """
    span = high_num // (common := gcd(high_num, high_den))
    divisor = high_den // common
"""


FUNCTION_TEMPLATE_0 = """def {func_name}(
    low_num: int, low_den: int, high_num: int, high_den: int, integrand: callable
) -> tuple[int, int]:{initialize}
{assemble}
    result_num, result_den = integrand(low_num, low_den)
    term_num, term_den = integrand(high_num, high_den)
    result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
    result_num {extrema_weight}
    indices = NEWTON_COTES_MIDDLE_INDICES[{indices_length}]{loop_body}{extra_iteration}
    return result_num * span {num_scale}, result_den * {last_den} {den_scale}
"""


FUNCTION_TEMPLATE_1 = """def {func_name}(
    high_num: int, high_den: int, integrand: callable
) -> tuple[int, int]:{initialize}
{assemble}
    result_num, result_den = integrand(0, 1)
    term_num, term_den = integrand(high_num, high_den)
    result_num, result_den = result_num * term_den + term_num * result_den, result_den * term_den
    result_num {extrema_weight}
    indices = NEWTON_COTES_MIDDLE_INDICES[{indices_length}]{loop_body}{extra_iteration}
    return result_num * span {num_scale}, result_den * {last_den} {den_scale}
"""


FUNCTION_MAINLOOP = """
    for weight, ((x1, y), (x2, _)) in indices:
        term_num, term_den = integrand((start := mults_left[y]) + mults_span[x1], (den := mults_div[y]))
        next_num, next_den = integrand(start + mults_span[x2], den)
        term_num, term_den = term_num * next_den + next_num * term_den, term_den * next_den
        result_num, result_den = result_num * term_den + term_num * result_den * weight, result_den * term_den
"""

FUNCTION_MAINLOOP_1 = """
    for weight, ((x1, y), (x2, _)) in indices:
        term_num, term_den = integrand(mults_span[x1], (den := mults_div[y]))
        next_num, next_den = integrand(mults_span[x2], den)
        term_num, term_den = term_num * next_den + next_num * term_den, term_den * next_den
        result_num, result_den = result_num * term_den + term_num * result_den * weight, result_den * term_den
"""


FUNCTION_ALTLOOP = """
    for weight, x1, x2 in indices:
        term_num, term_den = integrand(max_left + mults_span[x1], max_div)
        next_num, next_den = integrand(max_left + mults_span[x2], max_div)
        term_num, term_den = term_num * next_den + next_num * term_den, term_den * next_den
        result_num, result_den = result_num * term_den + term_num * result_den * weight, result_den * term_den
"""


FUNCTION_ALTLOOP_1 = """
    for weight, x1, x2 in indices:
        term_num, term_den = integrand(mults_span[x1], max_div)
        next_num, next_den = integrand(mults_span[x2], max_div)
        term_num, term_den = term_num * next_den + next_num * term_den, term_den * next_den
        result_num, result_den = result_num * term_den + term_num * result_den * weight, result_den * term_den
"""


FUNCTION_LAST_ITERATION = """
    term_num, term_den = integrand(mults_left[0] + span, mults_div[0])
    result_num, result_den = result_num * term_den + term_num * result_den {weight}, result_den * term_den
"""

FUNCTION_LAST_ITERATION_1 = """
    term_num, term_den = integrand(span, mults_div[0])
    result_num, result_den = result_num * term_den + term_num * result_den {weight}, result_den * term_den
"""


def mul_or_lshift(n: int) -> str:
    return f"* {n}" if n & -n != n else f"<< {n.bit_length() - 1}"


def imul_or_ilshift(n: int) -> str:
    return f"*= {n}" if n & -n != n else f"<<= {n.bit_length() - 1}"


def assemble_numerator_multiples(strategy: dict) -> list:
    if strategy["type"] == "range":
        return [
            (
                "    step = span"
                if (step := strategy["step"]) == 1
                else (
                    "    step = span + span"
                    if step == 2
                    else f"    step = span {mul_or_lshift(step)}"
                )
            ),
            "    number = span",
            f"    mults_span = [span] * {strategy['increments']+1}\n    for i in range(1, {strategy['increments']+1}):\n        mults_span[i] = number = number + step\n",
        ]

    result = [f"    mults_span = [span] * {strategy['length']}"]
    strats = strategy["strategy"]
    for a, b in strats["mul"]:
        result.append(f"    mults_span[{a}] {imul_or_ilshift(b)}")

    for i, j, k in strats["add"]:
        if j != k:
            result.append(
                f"    mults_span[{i}] = "
                + (f"mults_span[{j}]" if j else "span")
                + " + "
                + (f"mults_span[{k}]" if k else "span")
            )
        elif not j:
            result.append(f"    mults_span[{i}] += span")

        else:
            result.append(f"    mults_span[{i}] = (val := mults_span[{j}]) + val")

    return result


def assemble_denominator_multiples(strategy: dict) -> dict:
    strats = strategy["strategy"]
    if strategy["length"] == 1:
        mult = strats["mul"][0][1]
        return {
            "left": [f"    max_left = left {(op := mul_or_lshift(mult))}"],
            "div": [f"    max_div = divisor {op}"],
        }

    result = {
        "left": [f"    mults_left = [left] * {(length := strategy['length'])}"],
        "div": [f"    mults_div = [divisor] * {length}"],
    }

    for a, b in strats["mul"]:
        result["left"].append(f"    mults_left[{a}] {(op := imul_or_ilshift(b))}")
        result["div"].append(f"    mults_div[{a}] {op}")

    for i, j, k in strats["add"]:
        if j != k:
            result["left"].append(
                f"    mults_left[{i}] = mults_left[{j}] + mults_left[{k}]"
            )
            result["div"].append(
                f"    mults_div[{i}] = mults_div[{j}] + mults_div[{k}]"
            )
        else:
            result["left"].append(
                f"    mults_left[{i}] = (val := mults_left[{j}]) + val"
            )
            result["div"].append(f"    mults_div[{i}] = (val := mults_div[{j}]) + val")

    return result


def strip(a: int, b: int, c: int) -> tuple[int, int]:
    return (
        a
        >> (
            tail := (
                (t1 := (a ^ a - 1).bit_length() - 1),
                (t2 := (b ^ b - 1).bit_length() - 1),
            )[t1 > t2]
        ),
        b >> tail,
    )


def truncate(a: int, b: int, max_bits: int) -> tuple[int, int]:
    return (
        a
        >> (
            shift := (
                (
                    overflow := (
                        diff := ((l1 := a.bit_length()), (l2 := b.bit_length()))[
                            l1 > l2
                        ]
                        - max_bits
                    )
                    * (diff > 0)
                ),
                (
                    tail := (
                        (t1 := (a ^ a - 1).bit_length() - 1),
                        (t2 := (b ^ b - 1).bit_length() - 1),
                    )[t1 > t2]
                ),
            )[overflow < tail]
        ),
        b >> shift,
    )


SHORTEN = (strip, truncate)


NAMESPACE = {
    "NEWTON_COTES_MIDDLE_INDICES": [],
    "ROMBERG_WEIGHTS": [],
    "gcd": gcd,
    "batched": batched,
    "mpfr": mpfr,
    "set_exp": set_exp,
    "SHORTEN": SHORTEN,
}


def generate_source_Newton(level: int) -> list[tuple[str, str]]:
    assert isinstance(level, int) and 0 <= level <= 7
    base_fname = f"Newton_Cotes_integrate_{level+3}"
    formula = NEWTON_COTES_FORMULAS[level]
    data = generate_indices(formula)
    lst = NAMESPACE["NEWTON_COTES_MIDDLE_INDICES"]
    length = len(lst)
    lst.append(data["indices"])
    den_mults = data["den_mults"]
    num_scale, den_scale = formula["scale"]
    l = len(den_mults)
    special_assembly = assemble_numerator_multiples(
        assemble_strategy(data["num_mults"])
    )
    denominator_assembly = assemble_denominator_multiples(assemble_strategy(den_mults))
    special_assembly.extend(denominator_assembly["div"])
    general_assembly = special_assembly + denominator_assembly["left"]
    special_assembly = "\n".join(special_assembly)
    general_assembly = "\n".join(general_assembly)
    base_args = {
        "num_scale": mul_or_lshift(num_scale),
        "den_scale": mul_or_lshift(den_scale),
        "last_den": "max_div" if l == 1 else "mults_div[-1]",
        "extrema_weight": imul_or_ilshift(formula["extrema"]),
        "indices_length": length,
    }
    if l > 1:
        loop_body0 = FUNCTION_MAINLOOP
        loop_body1 = FUNCTION_MAINLOOP_1
    else:
        loop_body0 = FUNCTION_ALTLOOP
        loop_body1 = FUNCTION_ALTLOOP_1
    if middle := data["middle"]:
        middle_weight = mul_or_lshift(middle)
        extra_iteration0 = FUNCTION_LAST_ITERATION.format(weight=middle_weight)
        extra_iteration1 = FUNCTION_LAST_ITERATION_1.format(weight=middle_weight)
    else:
        extra_iteration0 = extra_iteration1 = ""
    return [
        (
            (fname_0 := f"{base_fname}_type_0"),
            FUNCTION_TEMPLATE_0.format_map(
                base_args
                | {
                    "func_name": fname_0,
                    "initialize": INITIALIZE_TYPE_0,
                    "assemble": general_assembly,
                    "loop_body": loop_body0,
                    "extra_iteration": extra_iteration0,
                }
            ),
        ),
        (
            (fname_1 := f"{base_fname}_type_1"),
            FUNCTION_TEMPLATE_0.format_map(
                base_args
                | {
                    "func_name": fname_1,
                    "initialize": INITIALIZE_TYPE_1,
                    "assemble": general_assembly,
                    "loop_body": loop_body0,
                    "extra_iteration": extra_iteration0,
                }
            ),
        ),
        (
            (fname_2 := f"{base_fname}_type_2"),
            FUNCTION_TEMPLATE_1.format_map(
                base_args
                | {
                    "func_name": fname_2,
                    "initialize": INITIALIZE_TYPE_2,
                    "assemble": special_assembly,
                    "loop_body": loop_body1,
                    "extra_iteration": extra_iteration1,
                }
            ),
        ),
        (
            (fname_3 := f"{base_fname}_type_3"),
            FUNCTION_TEMPLATE_1.format_map(
                base_args
                | {
                    "func_name": fname_3,
                    "initialize": INITIALIZE_TYPE_3,
                    "assemble": special_assembly,
                    "loop_body": loop_body1,
                    "extra_iteration": extra_iteration1,
                }
            ),
        ),
    ]


FUNCTION_SOURCES = {}
for i in range(8):
    for fname, source in generate_source_Newton(i):
        exec(compile(source, "<dynamic>", "exec"), NAMESPACE)
        FUNCTION_SOURCES[fname] = source


def integrate_by_parts(
    low_num: int,
    low_den: int,
    high_num: int,
    high_den: int,
    level: int,
    integrand: callable,
    integrator: callable,
    max_digits: int = None,
    reduce: bool = False,
) -> tuple[int, int]:
    max_bits = max_digits * 30 if max_digits else 0
    shorten = SHORTEN[bool(max_digits)]
    low_num //= (common := gcd(low_num, low_den))
    low_den //= common
    high_num //= (common := gcd(high_num, high_den))
    high_den //= common
    left = low_num * high_den // (common := gcd(low_den, high_den))
    span = high_num * low_den // common - left
    divisor = low_den * high_den // common
    total = 1 << level
    points = [(low_num, low_den)] * (total + 1)
    points[-1] = (high_num, high_den)
    level_up = level + 1
    multiple = left
    starts = [left] * level_up
    for i in range(1, level_up):
        starts[i] = multiple = multiple + multiple

    multiple = divisor
    denominators = [divisor] * level_up
    for i in range(1, level_up):
        denominators[i] = multiple = multiple + multiple

    step = span + span
    multiple = span
    lim = total >> 1
    odds = [span] * lim
    for i in range(1, lim):
        odds[i] = multiple = multiple + step

    power = 1
    powers = [1] * level_up
    for i in range(1, level_up):
        powers[i] = power = power + power

    for index, start, step in zip(range(level, 0, -1), powers, powers[1:]):
        initial = starts[index]
        denominator = denominators[index]
        for i, odd in zip(range(start, total, step), odds):
            points[i] = (initial + odd, denominator)

        del odds[lim:]
        lim >>= 1

    result_num, result_den = 0, 1
    it = iter(points)
    next(it)
    fracs = [
        integrator(num1, den1, num2, den2, integrand)
        for (num1, den1), (num2, den2) in zip(points, it)
    ]
    while (length := len(fracs)) > 1:
        carry = fracs.pop(-1) if length & 1 else None
        fracs = [
            shorten(a * d + b * c, b * d, max_bits)
            for (a, b), (c, d) in batched(fracs, 2)
        ]
        if carry:
            fracs.append(carry)

    result_num, result_den = fracs[0]

    if reduce:
        common = gcd(result_num, result_den)
        result_num //= common
        result_den //= common

    return result_num, result_den


def arctan_derivative(x: int, y: int) -> tuple[int, int]:
    return y * y, x * x + y * y


def arctan_Newton(x: int, y: int, order: int) -> tuple[int, int]:
    assert 3 <= order <= 10
    return NAMESPACE[f"Newton_Cotes_integrate_{order}_type_3"](x, y, arctan_derivative)


def arctan_parts(
    x: int, y: int, order: int, level: int, max_digits: int = None
) -> tuple[int, int]:
    assert 3 <= order <= 10
    return integrate_by_parts(
        0,
        1,
        x,
        y,
        level,
        arctan_derivative,
        NAMESPACE[f"Newton_Cotes_integrate_{order}_type_0"],
        max_digits,
    )

Test:

In [48]: n, d = arctan_parts(123, 456, 4, 6); n/d
Out[48]: 0.26346654103491746

In [49]: math.atan2(123, 456)
Out[49]: 0.26346654103491746

In [50]: n, d = arctan_parts(123, 456, 4, 6, 2); n/d
Out[50]: 0.26346654103491746

In [51]: n, d = arctan_parts(123, 456, 5, 5, 2); n/d
Out[51]: 0.26346654103491746

In [52]: n, d = arctan_parts(123, 456, 6, 4, 2); n/d
Out[52]: 0.26346654103491746

In [53]: n, d = arctan_parts(123, 456, 7, 3, 2); n/d
Out[53]: 0.26346654103491746

In [54]: n, d = arctan_parts(123, 456, 8, 2, 2); n/d
Out[54]: 0.26346654103491746

In [55]: n, d = arctan_parts(123, 456, 9, 2, 2); n/d
Out[55]: 0.26346654103491746

In [56]: n, d = arctan_parts(123, 456, 10, 1, 2); n/d
Out[56]: 0.26346654103491746

In [57]: %timeit arctan_parts(123, 456, 4, 6)
655 μs ± 11.2 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [58]: %timeit arctan_parts(123, 456, 4, 6, 2)
398 μs ± 15.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [59]: %timeit arctan_parts(123, 456, 4, 7, 2)
778 μs ± 7.98 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [60]: %timeit arctan_parts(123, 456, 4, 8, 2)
1.7 ms ± 20.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [61]: %timeit arctan_parts(123, 456, 4, 8)
4.77 ms ± 92.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [62]: %timeit arctan_parts(123, 456, 4, 7)
1.67 ms ± 12.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [63]: %timeit arctan_parts(123, 456, 5, 5)
315 μs ± 18.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [64]: %timeit arctan_parts(123, 456, 5, 5, 2)
224 μs ± 9.12 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [65]: %timeit arctan_parts(123, 456, 6, 4)
167 μs ± 4.52 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [66]: %timeit arctan_parts(123, 456, 6, 4, 2)
131 μs ± 1.36 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [67]: %timeit arctan_parts(123, 456, 7, 3)
76.4 μs ± 1.17 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [68]: %timeit arctan_parts(123, 456, 7, 3, 2)
70.2 μs ± 1.35 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [69]: %timeit arctan_parts(123, 456, 8, 2)
44 μs ± 1.89 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [70]: %timeit arctan_parts(123, 456, 8, 2, 2)
41.7 μs ± 1.61 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [71]: %timeit arctan_parts(123, 456, 9, 2)
45.5 μs ± 1.25 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [72]: %timeit arctan_parts(123, 456, 9, 2, 2)
44.3 μs ± 1.29 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [73]: %timeit arctan_parts(123, 456, 10, 1)
23.7 μs ± 1.01 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [74]: %timeit arctan_parts(123, 456, 10, 1, 2)
24.2 μs ± 1.08 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [75]: %timeit arctan_parts(123, 456, 10, 2)
50.1 μs ± 1.3 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [76]: %timeit arctan_parts(123, 456, 10, 3)
110 μs ± 375 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [77]: %timeit arctan_parts(123, 456, 10, 4)
266 μs ± 9.49 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [78]: %timeit arctan_parts(123, 456, 10, 5)
704 μs ± 8.64 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [79]: %timeit arctan_parts(123, 456, 10, 6)
1.88 ms ± 23.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [80]: %timeit arctan_parts(123, 456, 10, 2, 2)
49 μs ± 1.44 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [81]: %timeit arctan_parts(123, 456, 10, 3, 2)
97.1 μs ± 415 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [82]: %timeit arctan_parts(123, 456, 10, 4, 2)
195 μs ± 1.55 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [83]: %timeit arctan_parts(123, 456, 10, 5, 2)
395 μs ± 14.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [84]: %timeit arctan_parts(123, 456, 10, 6, 2)
774 μs ± 10.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [85]: %timeit arctan_parts(123, 456, 4, 5)
273 μs ± 11.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [86]: %timeit arctan_parts(123, 456, 4, 5, 2)
212 μs ± 9.68 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Previously arctan_parts(123, 456, 10, 5) 1.08 milliseconds (1080 microseconds) and arctan_parts(123, 456, 4, 5) took 303 microseconds, now the first test without truncation, only takes 704 microseconds, it is \$(1080-704)/1080=0.34\overline{814}\$ increase in performance, and with the optimization it is \$(1080-395)/1080=0.634\overline{259}\$ increase in performance, the second test takes 273 microseconds without truncation, \$(303-273)/303=0.099\overline{0099}\$ increase, with the final optimization now it is 212 microseconds, \$(303-212)/303=0.\overline{3003}\$ increase in performance.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.