Revisions to Implementing numerical integration

I meant g++, not gcc

Source Link

edited Jul 26, 2018 at 7:20

Toby Speight

88.3k
14
104
327

You can parallelize the loop in main(), like this (compile withadd gcc -fopenmp to your g++ command):

You can parallelize the loop in main(), like this (compile with gcc -fopenmp):

You can parallelize the loop in main(), like this (add -fopenmp to your g++ command):

edited body

Source Link

edited Jul 25, 2018 at 22:15

Toby Speight

88.3k
14
104
327

I started with your code as a baseline, but increased du tenfold (to .001) to be able to test in a reasonablyreasonable amount of time. I compiled with g++ -std=c++17 -O3 -march=native to allow the optimiser to use its full range of substitutions and to use the full capabilities of the machine.

I started with your code as a baseline, but increased du tenfold (to .001) to be able to test in a reasonably amount of time. I compiled with g++ -std=c++17 -O3 -march=native to allow the optimiser to use its full range of substitutions and to use the full capabilities of the machine.

I started with your code as a baseline, but increased du tenfold (to .001) to be able to test in a reasonable amount of time. I compiled with g++ -std=c++17 -O3 -march=native to allow the optimiser to use its full range of substitutions and to use the full capabilities of the machine.

Actually run with original du value, and report it

Source Link

edited Jul 25, 2018 at 22:00

Toby Speight

88.3k
14
104
327

std::pow() normally works by multiplying logarithms. For a power of 2, it's much simpler to simply multiply a number by itself. I got about 10% speed improvement by writingre-writing cdfpdf() thus:

static float integ(float h, int k, float du)
{
    if (k == 1) {
        return cdf(h);
    }

    float res = 0;
    const int iterations = h / du;

#pragma omp parallel for reduction(+:res)
    for (int i = 0;  i < iterations;  ++i) {
        const float u = i * du;
        res += integ(h - u, k - 1, du) * pdf(u) * du;
    }

    return res;
}

That gained about 80% improvement over the original, on this 8-core machine. (9 seconds elapsed with du = .001, so I expect about 9000 secondsand 15m 6s with du = .0001, given K=3. You're still going to have severe scaling problems when K goes to 10 - you'll probably need to also apply some mathematical insight to reduce the complexity at this point).

std::pow() normally works by multiplying logarithms. For a power of 2, it's much simpler to simply multiply a number by itself. I got about 10% speed improvement by writing cdf():

static float integ(float h, int k, float du)
{
    if (k == 1) {
        return cdf(h);
    }

    float res = 0;
    const int iterations = h / du;

#pragma omp parallel for reduction(+:res)
    for (int i = 0;  i < iterations;  ++i) {
        float u = i * du;
        res += integ(h - u, k - 1, du) * pdf(u) * du;
    }

    return res;
}

That gained about 80% improvement over the original, on this 8-core machine. (9 seconds elapsed with du = .001, so I expect about 9000 seconds with du = .0001, given K=3. You're still going to have severe scaling problems when K goes to 10).

std::pow() normally works by multiplying logarithms. For a power of 2, it's much simpler to simply multiply a number by itself. I got about 10% speed improvement by re-writing pdf() thus:

static float integ(float h, int k, float du)
{
    if (k == 1) {
        return cdf(h);
    }

    float res = 0;
    const int iterations = h / du;

#pragma omp parallel for reduction(+:res)
    for (int i = 0;  i < iterations;  ++i) {
        const float u = i * du;
        res += integ(h - u, k - 1, du) * pdf(u) * du;
    }

    return res;
}

That gained about 80% improvement over the original, on this 8-core machine. (9 seconds elapsed with du = .001, and 15m 6s with du = .0001. You're still going to have severe scaling problems when K goes to 10 - you'll probably need to also apply some mathematical insight to reduce the complexity at this point).

Add the compiler options used

Source Link

edited Jul 25, 2018 at 21:39

Toby Speight

88.3k
14
104
327

Loading

added 421 characters in body

Source Link

edited Jul 25, 2018 at 21:11

Toby Speight

88.3k
14
104
327

Loading

Source Link

answered Jul 25, 2018 at 20:42

Toby Speight

88.3k
14
104
327

Loading

Stack Exchange Network

Return to Answer