Efficient Implementation of functional and Lazy evaluation in C++

Question

I am creating a C++ library that implements an interface similar to Java's functional programming API. In short, the code will look like this:

vector<string> buffer = ... ; // A buffer contains some strings
new IntStream(0, 100).map([](int a){
    return (a * 0x2344DDEF) & 0xF;
}).map([=](int a) {
    return buffer[a];
}).foreach([](string a) {
    cout << a << '\n';
});

Now I want to support parallel evaluation. For the example above, I want to get 100 execution tasks and send them to a thread pool. To do this, I have created EvalOp classes. The stream returns a list of EvalOp objects, which perform the actual computation only when you invoke EvalOp::eval.

template <typename T>
class EvalOp {
public:
    virtual T eval() = 0;
}; 

template <typename FROM, typename TO>
class TransformOp : public EvalOp<TO> {
public:
    TO eval() override {
        return mapper_(previous_->eval());
    }
protected:
    unique_ptr<EvalOp<FROM>> previous_;
    function<TO(FROM&)> mapper_;
};

template <typename T>
class Stream {
public:
    virtual bool isEmpty() = 0;
    virtual EvalOp<T> next() = 0;
    Stream<N> map(function<N(T)> mapper) {
        return new MapStream<N,T>(this, mapper);
    }
}

template <typename FROM, typename TO>
class MapStream : public Stream<TO> {
protected:
    Stream<FROM>* previous_;
    function<TO(FROM)> mapper_;
public:
    EvalOp<TO> next() override {
        return new TransformOp<FROM, TO>(previous_->next(), mapper_);
    }
}

My stream will now return a bunch of EvalOp objects, which you can throw in a thread pool.

This code produces the correct result, but because it creates many wrapper objects (the EvalOps), execution is slower. I benchmarked the following two tasks:

uint32_t __attribute__ ((noinline)) hash1(uint32_t input) {
    return input * 0x12345768;
}

uint32_t __attribute__ ((noinline)) hash2(uint32_t input) {
    return ((input * 0x2FDF1234) << 12) * 0x23429459;
}

uint32_t sum = 0;

void summer(uint32_t input) {
    sum += input;
}

BENCHMARK(StreamBenchmark, Serial)(State& state) {
    for(auto _:state) {
        for(int i = 0 ; i < 10000; ++i) {
            sum += hash2(hash1(i));
        }
    }
}

BENCHMARK(StreamBenchmark, Wrapper)(State& state) {
    for(auto _:state) {
        IntStream stream(0, 10000);
        stream.map(hash1).map(hash2).foreach(summer);
    }
}

From the benchmark results, I see that for each element only about 1ns is spent on the actual computation, while about 40ns of overhead is spent in Stream and EvalOp. I am looking for suggestions for a more efficient design. Thank you very much!

How do you do timings with 1ns resolution? I need that.
– Loki Astari, Commented May 8, 2020 at 20:41
I use Google Benchmark to run the benchmark, and it reports that the Serial method consumes ~14000ns of CPU time.
– Harper, Commented May 8, 2020 at 20:44

Quuxplusone · Accepted Answer · 2020-05-09 06:24:43Z

I think this question may be off-topic, as it does not consist of complete working code to be reviewed. I spent (too long) trying to get it to compile, and got as far as this: https://godbolt.org/z/qCE9S8

But you have a lot of problems with the current design. Most notably, you're using raw new all over the place, which creates pointers to the heap; but you aren't actually using the correct syntax to refer to those pointers. For example:

Stream<N> map(function<N(T)> mapper) {
    return new MapStream<N,T>(this, mapper);
}

Here new MapStream<...>() yields a value of type MapStream<N,T>*, but you're trying to return it as if it were a Stream<N> object. This flatly will not compile.

You could consider changing this return type to std::unique_ptr<Stream<N>> (and using make_unique), but that still won't really work for your use-case, because then you'll have to change this line:

stream.map(hash1).map(hash2).foreach(summer);

to:

stream.map(hash1)->map(hash2)->foreach(summer);

because now map returns a pointer. And worse, there's no way for the MapStream object itself to transfer its own ownership into the previous_ member of the next MapStream object in the chain. You end up with a bunch of temporary unique_ptrs, all linked together by raw pointers which will dangle as soon as the current full-expression finishes. That's okay for your benchmark, but it won't work at all in practice.

You might consider looking at a type erasure design, so that you could keep using Stream<int> as a value type (not a polymorphic base class, no visible pointers) but give it behavior that appeared polymorphic at run time.
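As a rough illustration of that idea, here is a minimal sketch of a type-erased, value-type stream. It is not the code from the question or from the Godbolt link: it assumes a pull-based interface in which each stage returns std::optional<T>, and the names Source, Model, generate, range, and stream_sum are hypothetical. It runs sequentially and does not attempt the thread-pool part.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical value-type stream: the polymorphism is hidden behind a private
// pointer, so callers never write `new`, `->`, or name a base class.
template <typename T>
class Stream {
    // Internal interface: yields the next element, or nullopt at the end.
    struct Source {
        virtual ~Source() = default;
        virtual std::optional<T> next() = 0;
    };

    // Wraps any callable that returns std::optional<T>.
    template <typename Gen>
    struct Model final : Source {
        explicit Model(Gen g) : gen(std::move(g)) {}
        std::optional<T> next() override { return gen(); }
        Gen gen;
    };

    std::unique_ptr<Source> src_;

    explicit Stream(std::unique_ptr<Source> src) : src_(std::move(src)) {}

public:
    // Builds a stream from a generator callable (assumed name).
    template <typename Gen>
    static Stream generate(Gen gen) {
        return Stream(std::make_unique<Model<Gen>>(std::move(gen)));
    }

    // Rvalue-qualified: the old stream hands its source to the new one,
    // so ownership moves along the chain and nothing dangles.
    template <typename F>
    auto map(F f) && {
        assert(src_ && "stream already consumed");
        using U = std::invoke_result_t<F&, T>;
        return Stream<U>::generate(
            [prev = std::move(src_), f = std::move(f)]() mutable -> std::optional<U> {
                if (auto v = prev->next()) return f(std::move(*v));
                return std::nullopt;
            });
    }

    // Pulls every element through the chain and passes it to f.
    template <typename F>
    void foreach(F f) && {
        assert(src_ && "stream already consumed");
        while (auto v = src_->next()) f(std::move(*v));
    }
};

// Half-open integer range [begin, end), standing in for IntStream.
inline Stream<std::uint32_t> range(std::uint32_t begin, std::uint32_t end) {
    return Stream<std::uint32_t>::generate(
        [i = begin, end]() mutable -> std::optional<std::uint32_t> {
            if (i < end) return i++;
            return std::nullopt;
        });
}

// The two hash functions from the question's benchmark.
inline std::uint32_t hash1(std::uint32_t input) { return input * 0x12345768; }
inline std::uint32_t hash2(std::uint32_t input) {
    return ((input * 0x2FDF1234) << 12) * 0x23429459;
}

// Usage: the chain reads like the original, with `.` instead of `->`.
inline std::uint32_t stream_sum(std::uint32_t begin, std::uint32_t end) {
    std::uint32_t sum = 0;
    range(begin, end).map(hash1).map(hash2).foreach(
        [&sum](std::uint32_t x) { sum += x; });
    return sum;
}
```

In this sketch there is one heap allocation per stage rather than per element, but each element still pays one virtual call per stage, so it mainly addresses the ownership and syntax problems above; whether it recovers the measured overhead would need benchmarking.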

Thank you very much for your suggestion! You are right that this is not complete working code; it is only meant to show the concept. The problems you pointed out (not using unique pointers, and returning pointers where an object is required) are fixed in my working code. I am looking at the type erasure design you mentioned, and I have a feeling it may be the way to solve my problem. I will accept your answer if it indeed is.
– Harper, Commented May 9, 2020 at 8:05
