Madhav Goyal

Posted on May 18

Dealing with OpenGL context and multiple threads

#cpp #gamedev #graphicsprogramming

Hi fellow devs!
I have been working on an OpenGL-based desktop application for the past few months, and I got stuck on the same problem everyone has to deal with: how to manage the OpenGL context while making use of multiple threads.

Understanding OpenGL: The Basics

It would be better to explain some of OpenGL's concepts before we start writing code for multi-threaded systems.

OpenGL is a state machine

Any OpenGL function that you call always affects the currently active context. Let me explain this with an example.

// vertex coords
float vertices[] = {...};

// Creating a vertex buffer
unsigned int vao, vbo;
glGenVertexArrays(1, &vao);
glGenBuffers(1, &vbo);

glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);

// Copy vertices from main memory to GPU memory
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
...

In this example, glBufferData takes no argument that names the buffer it operates on. It only takes a target (GL_ARRAY_BUFFER), and OpenGL applies the call to whichever buffer is currently bound to that target, here the one bound by the preceding glBindBuffer call.

This is really important because it means every OpenGL call depends on the current state of the context. The target of the next command is determined by the commands that came before it, and systems designed like this usually don't lend themselves to multi-threading.
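To make the hazard concrete, here is a minimal sketch of a bind-then-operate API. FakeGL, bind and upload are made-up stand-ins (not real OpenGL calls), and the "two callers" are replayed by hand in one unlucky order instead of using real threads, so the outcome is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a bind-then-operate API; none of this is real OpenGL.
struct FakeGL {
    unsigned int bound = 0;                              // the "currently bound buffer"
    std::map<unsigned int, std::vector<float>> buffers;  // contents per buffer id

    void bind(unsigned int id) { bound = id; }

    // Like glBufferData, this call names no buffer: it writes to whatever is bound.
    void upload(const std::vector<float>& data) {
        assert(bound != 0);  // something must be bound first
        buffers[bound] = data;
    }
};

// Replays one unlucky interleaving of two callers, A and B, step by step.
void interleavedUpload(FakeGL& gl) {
    gl.bind(1);                     // A: bind buffer 1
    gl.bind(2);                     // B: bind buffer 2, sneaking in between A's calls
    gl.upload({1.0f, 2.0f, 3.0f});  // A: upload, which now lands in buffer 2
}
```

After interleavedUpload runs, buffer 1 is still empty and A's data sits in buffer 2, even though A never touched buffer 2 on purpose.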

So, how does multi-threading work?

A first idea is to have every thread (render and worker) assume that the previous state is gone, and reset it by binding whatever it needs before performing any OpenGL operation.

For example:

// Render thread
while (!glfwWindowShouldClose(window)) {
        glClear(GL_COLOR_BUFFER_BIT);

        // Bind vertices, textures, shaders
        glUseProgram(...);
        glBindTexture(...);
        glBindVertexArray(...);

        // Draw call
        glDrawElements(...);
}

// Image loader (Worker) thread

// Creating texture
unsigned int texture;
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
glTexParameteri(...);
glTexParameteri(...);

// Load image data
unsigned char* imageData = stbi_load(...);

// Pass data to GPU
glTexImage2D(GL_TEXTURE_2D, ...);

// Memory returned by stbi_load must be released with stbi_image_free, not delete[]
stbi_image_free(imageData);

This example would've almost worked, but since both threads run in parallel, one glBind<Anything> call could be overridden by another and lead to undefined behaviour.

You might look at this and think: this is the typical critical-section problem, and it can easily be solved with a mutex. So you add one, and now it looks something like this.

#include <mutex>

std::mutex GLMutex;

// Render thread

while (!glfwWindowShouldClose(window)) {
+       // Lock the GLMutex before drawing
+       GLMutex.lock();

        glClear(GL_COLOR_BUFFER_BIT);

        // Bind vertices, textures, shaders
        glUseProgram(...);
        glBindTexture(...);
        glBindVertexArray(...);

        // Draw call
        glDrawElements(...);

+       // Release the GLMutex
+       GLMutex.unlock();
}

// Image loader (Worker) thread

+ // Lock GLMutex
+ GLMutex.lock();

// Creating texture
unsigned int texture;
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
glTexParameteri(...);
glTexParameteri(...);

// Load image data
unsigned char* imageData = stbi_load(...);

// Pass data to GPU
glTexImage2D(GL_TEXTURE_2D, ...);

+ // Release GLMutex
+ GLMutex.unlock();

// Memory returned by stbi_load must be released with stbi_image_free, not delete[]
stbi_image_free(imageData);

Now, OpenGL should be able to work in parallel across multiple threads. But, still... IT DOESN'T WORK.

OpenGL context

The reason it doesn't work is that OpenGL commands need a valid OpenGL context that is current on the calling thread. The glfwCreateWindow call you made somewhere at the beginning creates an OpenGL rendering context along with the window, and the glfwMakeContextCurrent call that usually follows makes that context current on the thread that calls it (typically the main thread). A program can have any number of OpenGL contexts, but a context can be current on only one thread at a time, and other threads don't get it automatically (which doesn't mean you can't hand it over yourself).

In the last example, every OpenGL call made from the main thread was valid and worked properly, but the OpenGL functions called from the worker thread did nothing useful, because that thread had no current rendering context.
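The "current on one thread" idea can be sketched without any graphics code. In the block below, FakeContext, currentContext, makeCurrent and hasContext are all hypothetical names; a thread_local pointer plays the role of the per-thread current context, which is only an analogy for how the real driver tracks it.

```cpp
#include <cassert>
#include <thread>

struct FakeContext { int id; };  // hypothetical stand-in for a rendering context

// Each thread gets its own copy of this pointer, mirroring "current context per thread".
thread_local FakeContext* currentContext = nullptr;

void makeCurrent(FakeContext* ctx) { currentContext = ctx; }
bool hasContext() { return currentContext != nullptr; }

// Makes the context current on the calling thread, then asks a worker
// thread whether it can see one. Returns what the worker observed.
bool workerSeesContext() {
    FakeContext ctx{1};
    makeCurrent(&ctx);
    assert(hasContext());  // the calling thread has a current context

    bool seen = true;
    std::thread worker([&seen] { seen = hasContext(); });
    worker.join();

    makeCurrent(nullptr);
    return seen;
}
```

The worker reports that it has no context, even though the context object exists and the calling thread can use it. That is the situation the mutex-only version ran into.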

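Now, since the rendering context is reachable through an ordinary variable (the GLFWwindow pointer), you can pass it between threads under a mutex: each thread makes the context current after taking the lock and releases it before unlocking. Used like this, it should just work: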

#include <mutex>

std::mutex GLMutex;

// Render thread

while (!glfwWindowShouldClose(window)) {
        // Lock the GLMutex before drawing
        GLMutex.lock();
+       glfwMakeContextCurrent(window);

        glClear(GL_COLOR_BUFFER_BIT);

        // Bind vertices, textures, shaders
        glUseProgram(...);
        glBindTexture(...);
        glBindVertexArray(...);

        // Draw call
        glDrawElements(...);

        // Release the GLMutex
+       glfwMakeContextCurrent(NULL);
        GLMutex.unlock();
}

// Image loader (Worker) thread

// Lock GLMutex
GLMutex.lock();
+ glfwMakeContextCurrent(window);

// Creating texture
unsigned int texture;
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
glTexParameteri(...);
glTexParameteri(...);

// Load image data
unsigned char* imageData = stbi_load(...);

// Pass data to GPU
glTexImage2D(GL_TEXTURE_2D, ...);

// Release GLMutex
+ glfwMakeContextCurrent(NULL);
GLMutex.unlock();

// Memory returned by stbi_load must be released with stbi_image_free, not delete[]
stbi_image_free(imageData);

This implementation is valid and should yield the desired results. However, handing a single OpenGL context back and forth between threads isn't a recommended approach: contexts aren't inherently thread-safe, and any mistake in managing them can lead to race conditions, undefined behavior, or crashes.
You should always avoid sharing a context across threads unless absolutely necessary (which it isn't 😉).
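If you do go down this road anyway, the manual lock()/unlock() pairs in the snippet above are fragile: an early return or an exception between them leaves the mutex locked and the context still current. A minimal sketch of a scope guard follows. ScopedContext is a hypothetical helper, and it takes the acquire and release steps as callbacks so the sketch compiles without GLFW; in the real application they would wrap glfwMakeContextCurrent(window) and glfwMakeContextCurrent(NULL).

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical helper: holds the mutex and the context for exactly one scope.
class ScopedContext {
  public:
    ScopedContext(std::mutex& m, const std::function<void()>& acquire,
                  std::function<void()> release)
        : lock_(m), release_(std::move(release)) {
        assert(acquire && release_);  // both callbacks must be set
        acquire();                    // e.g. make the context current
    }
    // Runs the release callback first; the mutex is unlocked right after,
    // when lock_ is destroyed.
    ~ScopedContext() { release_(); }

    ScopedContext(const ScopedContext&) = delete;
    ScopedContext& operator=(const ScopedContext&) = delete;

  private:
    std::lock_guard<std::mutex> lock_;
    std::function<void()> release_;
};

// Records the order in which things happen inside and around the guarded scope.
std::string guardDemo(std::mutex& m) {
    std::string log;
    {
        ScopedContext guard(m, [&log] { log += "acquire;"; },
                               [&log] { log += "release;"; });
        log += "draw;";  // OpenGL calls would go here
    }
    return log;
}
```

The guard only removes the bookkeeping mistakes. The threads still take turns on a single context, so it does not make this approach any faster.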

The solution I've come up with

Disclaimer: Although I did come up with this solution on my own, I do not claim that no one else has ever done it before. (Couldn't find it anywhere I looked)

The key ideas

Instead of passing the context around, this method keeps the context on the main thread and passes the commands to it.
This method assumes that the main thread stays highly responsive (the fast thread), while worker threads, which can be dedicated to CPU-intensive or I/O-bound operations, are allowed to take their time (the slow threads). Worker threads can even wait for the main thread to complete its short, critical tasks without causing hiccups or delays in overall responsiveness.

Requirements

A thread might need to call OpenGL functions at more than one point, so the order in which its jobs are issued must be preserved when they are executed on the main thread.

For example:

// Worker
void ImageLoader() {
    /*
    ----------------------------------------
    OpenGL texture creation commands
    ----------------------------------------
    */

    // Load image data
    unsigned char* imageData = stbi_load(...);

    /*
    ----------------------------------------
    Setting image data on texture
    ----------------------------------------
    */

    // Memory returned by stbi_load must be released with stbi_image_free, not delete[]
    stbi_image_free(imageData);

    ... Some other code ...
}

Reference or pointer variables must still be valid at the time of execution; otherwise, the data should be passed by value.
It shouldn't require task-specific setup for every kind of job.
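The second requirement is the one that bites most often with deferred work, so here is a small sketch of it in isolation. makeJob and runLater are hypothetical names, and the byte vector only stands in for data a job might need later.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Builds a deferred job whose captured data outlives the scope that created it.
std::function<std::size_t()> makeJob() {
    std::vector<unsigned char> pixels(16, 255);  // local: destroyed when makeJob returns
    std::string label = "texture";

    // Capturing by value copies both objects into the closure. Writing [&] here
    // would leave the job holding dangling references once this function returns.
    return [pixels, label]() { return pixels.size() + label.size(); };
}

// Runs the job after the scope that created it has already ended.
std::size_t runLater() {
    auto job = makeJob();  // makeJob's locals are gone by now
    assert(job);           // the std::function holds a callable
    return job();          // still safe: the closure owns its copies
}
```

Capturing by reference is only safe when something guarantees that the referenced variable outlives the job, which is what the waiting mechanism described below is for.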

Implementation

We first create an object that describes each OpenGL job.

GLJob.h

#pragma once

#include <functional>
#include <mutex>

class GLJob {
  public:
    GLJob(std::function<void()> func,
          std::mutex *end = nullptr);

    void execute();
    void reject();

  private:
    std::mutex *endMutex;
    std::function<void()> jobFunc;
};

GLJob.cpp

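#include "GLJob.h"

#include <utility>

GLJob::GLJob(std::function<void()> func, std::mutex *end)
    : endMutex(end), jobFunc(std::move(func)) {}

// Runs the job on the calling (main) thread, then signals the waiting worker
void GLJob::execute() {
    jobFunc();
    if (endMutex)
        endMutex->unlock();
}

// Skips the job, but still signals the worker so that it doesn't wait forever
void GLJob::reject() {
    if (endMutex)
        endMutex->unlock();
}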

And a thread-safe queue.

GLJobQ.h

#pragma once

#include <memory>
#include <mutex>
#include <queue>

#include "GLJob.h"

class GLJobQ {
  public:
    void push(std::shared_ptr<GLJob> job);
    bool empty();
    std::shared_ptr<GLJob> pop();

  private:
    std::mutex qMutex;
    std::queue<std::shared_ptr<GLJob>> jobQ;
};

GLJobQ.cpp

#include "GLJobQ.h"

void GLJobQ::push(std::shared_ptr<GLJob> job) {
    std::lock_guard<std::mutex> lock(qMutex);
    // Placeholder: replace with your own check for whether the job should be
    // accepted (for example, reject everything once the application is closing)
    bool shouldRun = true;
    if (!shouldRun) {
        job->reject();
    } else {
        jobQ.push(job);
    }
}

bool GLJobQ::empty() {
    std::lock_guard<std::mutex> lock(qMutex);
    return jobQ.empty();
}

std::shared_ptr<GLJob> GLJobQ::pop() {
    std::lock_guard<std::mutex> lock(qMutex);
    if (jobQ.empty())
        return nullptr;
    auto job = jobQ.front();
    jobQ.pop();
    return job;
}

And that's it :)
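To see the ordering guarantee in action without a window or a GPU, here is a condensed sketch. JobQueue is a hypothetical, simplified stand-in for the GLJobQ above (plain std::function jobs, no reject logic), and the worker is joined before draining only to keep the result deterministic.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Condensed, hypothetical stand-in for GLJobQ: jobs are plain std::function objects.
class JobQueue {
  public:
    void push(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(std::move(job));
    }
    // Returns an empty std::function when there is nothing to run.
    std::function<void()> pop() {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return {};
        auto job = std::move(q_.front());
        q_.pop();
        return job;
    }

  private:
    std::mutex m_;
    std::queue<std::function<void()>> q_;
};

// A worker thread enqueues three jobs; the calling ("main") thread runs them.
std::vector<int> runJobsInOrder() {
    JobQueue queue;
    std::vector<int> executed;  // only modified by the calling thread

    std::thread worker([&queue, &executed] {
        for (int i = 1; i <= 3; ++i)
            queue.push([&executed, i] { executed.push_back(i); });
    });
    worker.join();  // joined early to keep the demo deterministic

    while (auto job = queue.pop()) job();  // drain in FIFO order
    assert(executed.size() == 3);
    return executed;
}
```

The jobs run in the order they were pushed, and every one of them runs on the thread that drains the queue, which is the property the OpenGL version relies on.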

Usage

Make a GLJobQ object in the main thread and make sure it's globally accessible.
From every other thread, just create a GLJob and push it to the GLJobQ.

Example

Let's assume we have an instance of an Application object running on the main thread and an ImageLoader function which runs on a separate thread.

Application.h

#include "GLJobQ.h"

class Application {
    // Other members...

    // Application public members are globally accessible
  public:
    GLJobQ queue;

    void start();
};

Application.cpp

#include "Application.h"

void Application::start() {
    while (!glfwWindowShouldClose(window)) {
        // Frame rendering...

        // Execute at most 1 GLJob per frame; pop() returns nullptr when the queue is empty
        if (auto job = queue.pop()) {
            job->execute();
        }
    }
}

ImageLoader.cpp

#include <memory>
#include <mutex>

#include "Application.h"

void ImageLoader() {
    // Locked here and unlocked by the last GLJob, so this function (and its
    // local variables) stays alive until all of its jobs have run
    std::mutex finish;
    finish.lock();

    Application &app = ...;  // Get application ref

    GLuint tex;
    // Push a texture creation job
    std::shared_ptr<GLJob> textureJob = std::make_shared<GLJob>([&tex]() {
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);

        glTexParameteri(...);
    });
    app.queue.push(textureJob);

    unsigned char *imageData = stbi_load(...);

    // Push a texture assignment job
    std::shared_ptr<GLJob> textureAssignJob = std::make_shared<GLJob>([&tex, imageData]() {
        // Bind again: the render loop may have changed the binding in between
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(..., imageData);
    },
    // Pass the mutex since this is the last GLJob in this thread
    &finish);
    app.queue.push(textureAssignJob);

    // Wait till all GLJobs are finished
    finish.lock();

    // Free imageData only when all dependent GLJobs are finished
    // (memory from stbi_load is released with stbi_image_free, not delete[])
    stbi_image_free(imageData);
}

It satisfies our Requirements

Our first requirement was a system that runs separate blocks of commands sequentially on the main thread. To satisfy that, we've created a queue of commands.
A worker thread might finish before its jobs have been executed, in which case variables passed to a GLJob by reference would be destroyed before the job accesses them. To mitigate this, GLJob takes an optional pointer to a finish mutex, which can be used to make the worker thread wait for its GLJobs to finish. Since worker threads are slow threads, they can afford to wait for the main thread without affecting the performance of the application.
GLJob takes a lambda with the std::function<void()> signature. This way, it doesn't require a dedicated function or storage variable for each kind of OpenGL task, and every task handles itself.
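One caveat about the finish mutex: std::mutex is specified so that unlock() must be called by the thread that locked it, and locking it twice from the same thread is not allowed either, so the trick above relies on behaviour the standard leaves undefined. A minimal sketch of the same "wait until my job has run" idea with std::promise and std::future, which are meant for signalling across threads, could look like this. The queue, the texture integer and waitWithFuture are stand-ins invented for the sketch, not part of the classes above.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Returns the value the worker observed after its job had run on the calling thread.
int waitWithFuture() {
    std::mutex m;
    std::queue<std::function<void()>> jobs;  // stand-in for the job queue
    int texture = 0;                         // stand-in for an OpenGL object name
    int seenByWorker = -1;

    std::thread worker([&] {
        // shared_ptr keeps the lambda copyable, which std::function requires
        auto done = std::make_shared<std::promise<void>>();
        std::future<void> finished = done->get_future();
        {
            std::lock_guard<std::mutex> lock(m);
            jobs.push([&texture, done] {
                texture = 42;       // the "OpenGL work", run by the main thread
                done->set_value();  // signal completion from whichever thread runs the job
            });
        }
        finished.wait();         // blocks until the job has run
        seenByWorker = texture;  // safe: the job's writes are visible after wait()
    });

    // "Main loop": poll until the job shows up, then run it.
    for (bool ran = false; !ran;) {
        std::function<void()> job;
        {
            std::lock_guard<std::mutex> lock(m);
            if (!jobs.empty()) { job = std::move(jobs.front()); jobs.pop(); }
        }
        if (job) { job(); ran = true; }
        else std::this_thread::yield();
    }
    worker.join();
    assert(jobs.empty());
    return seenByWorker;
}
```

A rejected job would need to fulfil the promise as well (or set an exception on it), for the same reason reject() unlocks the mutex: otherwise the worker waits forever.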

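Why does it work? OpenGL doesn't care which thread issues a call, as long as that thread has the OpenGL context current. Here, every OpenGL call ends up running on the main thread, which keeps the context for the whole lifetime of the application.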
