std::pow() normally works by taking a logarithm, multiplying it by the exponent, and exponentiating the result.  For a power of 2, it's much cheaper to simply multiply the number by itself.  I got about 10% speed improvement by re-writing cdfpdf() to do that.
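To illustrate the kind of change meant here, a minimal sketch follows. It assumes the function evaluates a Gaussian density; gauss_pdf_pow() and gauss_pdf_mul() are hypothetical names for illustration, not the actual cdfpdf() code.

```cpp
#include <cmath>

// Hypothetical stand-ins for illustration only; not the original cdfpdf().
constexpr float kInvSqrt2Pi = 0.3989422804f;   // 1 / sqrt(2 * pi)

// Squares the standardised value through std::pow().
inline float gauss_pdf_pow(float x, float mu, float sigma)
{
    return kInvSqrt2Pi / sigma
         * std::exp(-0.5f * std::pow((x - mu) / sigma, 2.0f));
}

// Squares the standardised value by multiplying it by itself.
inline float gauss_pdf_mul(float x, float mu, float sigma)
{
    const float z = (x - mu) / sigma;
    return kInvSqrt2Pi / sigma * std::exp(-0.5f * z * z);
}
```

Both versions should agree to within rounding error.  How much the second one saves depends on the compiler and flags, so it's worth measuring rather than assuming.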
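static float integ(float h, int k, float du)
{
    // Base case: no integration left, just evaluate the CDF.
    if (k == 1) {
        return cdf(h);
    }
    float res = 0;
    // Number of steps of width du; the conversion truncates towards zero.
    const int iterations = static_cast<int>(h / du);
    // Split the loop across threads; per-thread partial sums are added into res.
#pragma omp parallel for reduction(+:res)
    for (int i = 0;  i < iterations;  ++i) {
        const float u = i * du;
        res += integ(h - u, k - 1, du) * pdf(u) * du;
    }
    return res;
}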
Parallelising the loop in integ() with OpenMP, as in the code above, gained about 80% improvement over the original on this 8-core machine (9 seconds elapsed with du = .001, and 15m 6s with du = .0001, given K=3).  You're still going to have severe scaling problems when K goes to 10; you'll probably also need to apply some mathematical insight to reduce the complexity at that point.
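To put a rough number on the scaling problem, here is a small sketch that counts how many times integ() ends up calling cdf().  leaf_calls() is a hypothetical helper, and it assumes that a call with h - i*du runs exactly n - i loop iterations when the outer call runs n (that is, it ignores floating-point rounding in h / du).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of cdf() calls made by integ(h, k, du) when the
// outermost loop runs n times.  Returned as double because the count
// overflows 64-bit integers for large k.
double leaf_calls(std::size_t n, int k)
{
    // calls[m] = count for an m-iteration interval at the current depth.
    // At depth 1 there is no loop, only a single cdf() call.
    std::vector<double> calls(n + 1, 1.0);
    for (int depth = 2; depth <= k; ++depth) {
        std::vector<double> next(n + 1, 0.0);
        // Iteration i recurses on an interval of m - i steps, so
        // next[m] = calls[1] + calls[2] + ... + calls[m].
        for (std::size_t m = 1; m <= n; ++m) {
            next[m] = next[m - 1] + calls[m];
        }
        calls = next;
    }
    return calls[n];
}
```

Under that assumption the count grows roughly like n^(K-1) / (K-1)!.  For K=3, leaf_calls(10000, 3) / leaf_calls(1000, 3) is about 100, which is in line with the step from 9 seconds to 15m 6s; for K=10 each tenfold reduction in du costs roughly a factor of 10^9.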