Skip to main content
I meant g++, not gcc
Source Link
Toby Speight
  • 88.3k
  • 14
  • 104
  • 327

You can parallelize the loop in main(), like this (compile withadd gcc -fopenmp to your g++ command):

You can parallelize the loop in main(), like this (compile with gcc -fopenmp):

You can parallelize the loop in main(), like this (add -fopenmp to your g++ command):

edited body
Source Link
Toby Speight
  • 88.3k
  • 14
  • 104
  • 327

I started with your code as a baseline, but increased du tenfold (to .001) to be able to test in a reasonablyreasonable amount of time. I compiled with g++ -std=c++17 -O3 -march=native to allow the optimiser to use its full range of substitutions and to use the full capabilities of the machine.

I started with your code as a baseline, but increased du tenfold (to .001) to be able to test in a reasonably amount of time. I compiled with g++ -std=c++17 -O3 -march=native to allow the optimiser to use its full range of substitutions and to use the full capabilities of the machine.

I started with your code as a baseline, but increased du tenfold (to .001) to be able to test in a reasonable amount of time. I compiled with g++ -std=c++17 -O3 -march=native to allow the optimiser to use its full range of substitutions and to use the full capabilities of the machine.

Actually run with original du value, and report it
Source Link
Toby Speight
  • 88.3k
  • 14
  • 104
  • 327

std::pow() normally works by multiplying logarithms. For a power of 2, it's much simpler to simply multiply a number by itself. I got about 10% speed improvement by writingre-writing cdfpdf() thus:

static float integ(float h, int k, float du)
{
    if (k == 1) {
        return cdf(h);
    }

    float res = 0;
    const int iterations = h / du;

#pragma omp parallel for reduction(+:res)
    for (int i = 0;  i < iterations;  ++i) {
        const float u = i * du;
        res += integ(h - u, k - 1, du) * pdf(u) * du;
    }

    return res;
}

That gained about 80% improvement over the original, on this 8-core machine. (9 seconds elapsed with du = .001, so I expect about 9000 secondsand 15m 6s with du = .0001, given K=3. You're still going to have severe scaling problems when K goes to 10 - you'll probably need to also apply some mathematical insight to reduce the complexity at this point).

std::pow() normally works by multiplying logarithms. For a power of 2, it's much simpler to simply multiply a number by itself. I got about 10% speed improvement by writing cdf():

static float integ(float h, int k, float du)
{
    if (k == 1) {
        return cdf(h);
    }

    float res = 0;
    const int iterations = h / du;

#pragma omp parallel for reduction(+:res)
    for (int i = 0;  i < iterations;  ++i) {
        float u = i * du;
        res += integ(h - u, k - 1, du) * pdf(u) * du;
    }

    return res;
}

That gained about 80% improvement over the original, on this 8-core machine. (9 seconds elapsed with du = .001, so I expect about 9000 seconds with du = .0001, given K=3. You're still going to have severe scaling problems when K goes to 10).

std::pow() normally works by multiplying logarithms. For a power of 2, it's much simpler to simply multiply a number by itself. I got about 10% speed improvement by re-writing pdf() thus:

static float integ(float h, int k, float du)
{
    if (k == 1) {
        return cdf(h);
    }

    float res = 0;
    const int iterations = h / du;

#pragma omp parallel for reduction(+:res)
    for (int i = 0;  i < iterations;  ++i) {
        const float u = i * du;
        res += integ(h - u, k - 1, du) * pdf(u) * du;
    }

    return res;
}

That gained about 80% improvement over the original, on this 8-core machine. (9 seconds elapsed with du = .001, and 15m 6s with du = .0001. You're still going to have severe scaling problems when K goes to 10 - you'll probably need to also apply some mathematical insight to reduce the complexity at this point).

Add the compiler options used
Source Link
Toby Speight
  • 88.3k
  • 14
  • 104
  • 327
Loading
added 421 characters in body
Source Link
Toby Speight
  • 88.3k
  • 14
  • 104
  • 327
Loading
Source Link
Toby Speight
  • 88.3k
  • 14
  • 104
  • 327
Loading