Unlocking True Parallelism: A Guide to Multi-Core C++ with `qb`

#cpp #distributedsystems #opensource #github

Modern CPUs have multiple cores, but writing correct, scalable parallel code is hard. Learn how the qb framework distributes work across all available cores with very little boilerplate, turning concurrency into true parallelism.

Target Audience: Intermediate C++ developers looking to improve the performance and scalability of their applications.

GitHub: https://github.com/isndev/qb

Your C++ application might be concurrent, but is it truly parallel? Concurrency is about structuring a program so that multiple tasks make progress over overlapping periods, possibly interleaved on a single core. Parallelism is about executing multiple tasks at the same instant on different cores. On a multi-core CPU, parallelism is the key to unlocking maximum performance.

The qb actor framework is designed specifically for this. It abstracts away the complexities of thread management, affinity, and inter-thread communication, allowing you to focus on your application logic while the framework handles the parallel execution.

How `qb` Manages Cores

The qb::Main engine maintains a pool of worker threads, each pinned to a specific CPU core. Each of these threads is called a VirtualCore. When you create an actor, you tell the engine which core it should live on by passing a core index, as in engine.addActor<MyActor>(core_id).

+-----------------------+      +-----------------------+      +-----------------------+
|        Core 0         |      |        Core 1         |      |        Core 2         |
|                       |      |                       |      |                       |
|  +-----------------+  |      |  +-----------------+  |      |  +-----------------+  |
|  | DispatcherActor |  |      |  |   WorkerActor A |  |      |  |   WorkerActor B |  |
|  +-----------------+  |      |  +-----------------+  |      |  +-----------------+  |
|                       |      |                       |      |                       |
+-----------|-----------+      +-----------^-----------+      +-----------^-----------+
            |                              |                              |
            |     push<WorkEvent>(...)     |                              |
            +------------------------------+------------------------------+

          Lock-Free, Inter-Core Message Queues (Transparent to Developer)

Each VirtualCore runs its own event loop, processing the mailboxes of its assigned actors. When an actor on Core 0 sends a message to an actor on Core 1, qb transparently handles the cross-core communication using high-performance, lock-free queues. You, the developer, just call push<Event>(...)—the framework does the rest.
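
To make the "one event loop per core, one mailbox per loop" idea concrete, here is a minimal, framework-free sketch. It is not qb code: MiniCore and sum_on_other_thread are hypothetical names, and the mailbox is guarded by a mutex for brevity, whereas qb is described above as using lock-free queues.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a per-core worker; NOT qb's VirtualCore.
// A mutex-guarded deque replaces the lock-free queue to keep the sketch short.
class MiniCore {
public:
    using Message = std::function<void()>;

    MiniCore() : _thread([this] { loop(); }) {}
    ~MiniCore() { stop(); }

    // Called from any thread: enqueue a message for this core's event loop.
    void push(Message msg) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _mailbox.push_back(std::move(msg));
        }
        _wakeup.notify_one();
    }

    // Let the loop drain what is already queued, then join the thread.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopping = true;
        }
        _wakeup.notify_one();
        if (_thread.joinable()) _thread.join();
    }

private:
    void loop() {
        for (;;) {
            Message msg;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _wakeup.wait(lock, [this] { return _stopping || !_mailbox.empty(); });
                if (_mailbox.empty()) return; // stopping and fully drained
                msg = std::move(_mailbox.front());
                _mailbox.pop_front();
            }
            msg(); // handlers run one at a time on this core's thread
        }
    }

    std::mutex _mutex;
    std::condition_variable _wakeup;
    std::deque<Message> _mailbox;
    bool _stopping = false;
    std::thread _thread; // declared last so the members above exist before it starts
};

// Sends the values 1..count to a second thread and returns the total it computed.
int sum_on_other_thread(int count) {
    int total = 0; // touched only by the MiniCore thread until stop() returns
    MiniCore core;
    for (int i = 1; i <= count; ++i) {
        core.push([&total, i] { total += i; });
    }
    core.stop(); // join() makes the worker's writes visible to the caller
    return total;
}
```

Calling sum_on_other_thread(100) pushes 100 messages across the thread boundary and gets 5050 back. The caller never touches the mailbox directly, which is the same division of labor that push<Event>(...) gives you in the framework.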

Example: A Multi-Core Dispatcher System

Let's look at a practical example. We'll create a DispatcherActor on one core that sends work to multiple WorkerActors distributed across other available cores. This is a common pattern for load balancing and parallel processing.

From examples/core/example3_multicore.cpp

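#include <algorithm> // std::min
#include <chrono>    // std::chrono::milliseconds
#include <iostream>  // std::endl
#include <thread>    // std::this_thread, std::thread::hardware_concurrency
#include <utility>   // std::move
#include <vector>

#include <qb/actor.h>
#include <qb/main.h>
#include <qb/io.h>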

// Event to represent a piece of work
struct WorkEvent : public qb::Event {
    int value;
    explicit WorkEvent(int val) : value(val) {}
};

// The worker actor, designed to run on any core
class WorkerActor : public qb::Actor {
public:
    WorkerActor() {
        registerEvent<WorkEvent>(*this);
    }

    bool onInit() override {
        // Announce which core this worker is running on
        qb::io::cout() << "Worker " << id() << ": Online on core " << getIndex() << std::endl;
        return true;
    }

    void on(WorkEvent& event) {
        qb::io::cout() << "Worker " << id() << " (on core " << getIndex() 
                       << "): Processing work item " << event.value << std::endl;
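        // Simulate work. Note that sleeping blocks this core's event loop,
        // so other actors assigned to the same core wait until it returns.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));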
    }
};

// The dispatcher actor, which distributes work
class DispatcherActor : public qb::Actor, public qb::ICallback {
private:
    std::vector<qb::ActorId> _workers;
    int _work_item_counter = 0;

public:
    explicit DispatcherActor(std::vector<qb::ActorId> workers) : _workers(std::move(workers)) {}

    bool onInit() override {
        qb::io::cout() << "Dispatcher on core " << getIndex() 
                       << ": Managing " << _workers.size() << " workers." << std::endl;
        registerCallback(*this); // Use a periodic callback to send work
        return true;
    }

    void onCallback() override {
        if (_work_item_counter >= 20) {
            // After sending 20 items, broadcast a shutdown and terminate
            qb::io::cout() << "Dispatcher: All work sent. Broadcasting shutdown." << std::endl;
            broadcast<qb::KillEvent>(); // a system-wide shutdown event
            return;
        }

        // Distribute work in a round-robin fashion
        qb::ActorId target_worker = _workers[_work_item_counter % _workers.size()];
        push<WorkEvent>(target_worker, _work_item_counter++);
    }
};

int main() {
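    // Use up to 4 hardware cores for this demo
    const unsigned int cores_to_use = std::min(4u, std::thread::hardware_concurrency());
    qb::io::cout() << "Main: Using " << cores_to_use << " cores for the demo." << std::endl;

    // hardware_concurrency() may return 0 or 1. The demo needs core 0 for the
    // dispatcher plus at least one worker core; bail out before the unsigned
    // arithmetic below wraps around or the dispatcher ends up with zero workers.
    if (cores_to_use < 2) {
        qb::io::cout() << "Main: This demo needs at least 2 cores." << std::endl;
        return 1;
    }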

    qb::Main engine;

    // 1. Create worker actors and distribute them across cores
    std::vector<qb::ActorId> worker_ids;
    for (unsigned int i = 0; i < (cores_to_use - 1) * 2; ++i) { // 2 workers per core
        // Assign workers to cores 1, 2, 3...
        qb::CoreId core_id = 1 + (i % (cores_to_use - 1));
        worker_ids.push_back(engine.addActor<WorkerActor>(core_id));
    }

    // 2. Create the dispatcher on a dedicated core (core 0)
    engine.addActor<DispatcherActor>(0, worker_ids);

    // 3. Start the engine and watch the parallel execution
    engine.start();
    engine.join(); // Wait for all actors to finish

    qb::io::cout() << "Main: All actors have terminated." << std::endl;
    return 0;
}
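
The worker placement loop in main() is easy to check in isolation. The sketch below pulls the same arithmetic into a pure function; plan_worker_cores is a hypothetical helper, not part of qb, and the early return for fewer than two cores is an assumption about how you would want that case handled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the placement loop in main():
// core 0 is reserved for the dispatcher, workers go to cores 1..cores_to_use-1.
// Returns one core index per worker, in creation order.
std::vector<unsigned int> plan_worker_cores(unsigned int cores_to_use,
                                            unsigned int workers_per_core = 2) {
    std::vector<unsigned int> plan;
    if (cores_to_use < 2) {
        return plan; // no core left for workers; also avoids unsigned wrap-around
    }
    const unsigned int worker_cores = cores_to_use - 1;
    for (unsigned int i = 0; i < worker_cores * workers_per_core; ++i) {
        plan.push_back(1 + (i % worker_cores));
    }
    return plan;
}
```

With four cores this yields the sequence 1, 2, 3, 1, 2, 3: six workers, two on each of the three non-dispatcher cores. With two cores both workers land on core 1.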

When you run this code on a machine with at least four cores, the output lines from workers on different cores interleave, showing that work items are processed in parallel. The dispatcher on Core 0 sends messages to workers on Cores 1, 2, and 3 with no manual thread management in your code.
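
To reason about how evenly the dispatcher spreads its 20 items, the round-robin rule from onCallback() can be modeled on its own. This is a small sketch under the assumption that item i always goes to worker i % workers; round_robin_counts is a hypothetical helper, not a qb function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: how many of `items` each of `workers` receives when
// item i is sent to worker (i % workers), as the dispatcher does above.
std::vector<int> round_robin_counts(int items, std::size_t workers) {
    std::vector<int> counts(workers, 0);
    if (workers == 0) {
        return counts; // nothing to dispatch to; avoids a modulo by zero
    }
    for (int i = 0; i < items; ++i) {
        counts[static_cast<std::size_t>(i) % workers] += 1;
    }
    return counts;
}
```

For the four-core run described above (six workers, 20 items), the first two workers receive four items each and the remaining four receive three each.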

Key Benefits of `qb`'s Multi-Core Approach

Effortless Parallelism: You write standard actor logic; the framework handles the parallel execution. Adding an actor to a different core is a one-line change: engine.addActor<MyActor>(core_id).
No Data Races: Because each actor owns its state and communicates only through events, actors on different cores cannot race on each other's data, provided you do not share pointers or globals between them.
Improved Cache Performance: Pinning actors to specific cores (setAffinity) keeps an actor running on the same physical core, which makes it far more likely that its data stays hot in that core's L1/L2 cache. This is critical for low-latency applications.
Predictable Performance: Isolating critical, low-latency actors (like a matching engine) on a dedicated core prevents them from being impacted by "noisy neighbors" (other less critical tasks).
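
For readers curious what thread pinning looks like at the operating-system level, here is a hedged, Linux-only sketch using the glibc pthread affinity calls. It is an illustration, not qb code, and it makes no claim about how qb implements setAffinity; both function names are hypothetical. On other platforms the functions simply report -1.

```cpp
#include <cassert>
#if defined(__linux__)
#include <pthread.h>
#include <sched.h>
#endif

// Number of CPUs the calling thread is allowed to run on, or -1 if unknown.
int allowed_cpu_count() {
#if defined(__linux__)
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    return CPU_COUNT(&allowed);
#else
    return -1;
#endif
}

// Pins the calling thread to the first CPU it is already allowed to use.
// Returns that CPU index, or -1 if pinning is unsupported or fails.
int pin_current_thread_to_first_allowed_cpu() {
#if defined(__linux__)
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        cpu_set_t target;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target); // restrict the thread to exactly this CPU
        if (pthread_setaffinity_np(pthread_self(), sizeof(target), &target) != 0) {
            return -1;
        }
        return cpu;
    }
    return -1;
#else
    return -1;
#endif
}
```

A worker thread would call pin_current_thread_to_first_allowed_cpu() once at startup; afterwards allowed_cpu_count() reports 1 for that thread. Picking from the already-allowed set, rather than hard-coding CPU 0, keeps the sketch usable inside containers that restrict which CPUs a process may use.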

The qb framework provides the tools to easily and safely build truly parallel C++ applications, allowing you to fully harness the power of modern multi-core processors.

Explore the qb framework and its modules: