Dividing shared resources for homogeneous multithread processing

Question

I'm trying to implement a homogeneous multithreading example that multiple threads process portion of a huge task. In order to achieve this, I thought of clustering data/resource and multiple threads handle/process each clustered data/resource. In the meantime, I wanted to maintain singleton like structure that can manage all clustered data from outside as if they are not clustered.

Code example of design is as follows,

Sorry about using interlocked family of functions if it bothers you.

struct SResource
{
    unsigned char ucA;
    unsigned long long ullB;
    unsigned char pBuffer[1024];
};

class CResourceManager
{
private:

    static CResourceManager s_pManagers[16];

    long long m_llLock;

    std::map<unsigned long long, SResource*> m_mapResources;

    CResourceManager(){}
    ~CResourceManager(){}

public:

    enum class EResult : unsigned long long
    {
        None = 0ULL,
        Success = 1ULL,
        Fail_ReachedMaximumTrial = 2ULL,
        Fail_ArgumentNull = 3ULL,
        Fail_ArgumentInvalid = 4ULL
    };

    static const EResult Do(const unsigned long long _ullIndex, const unsigned long long _ullTrialCount = 65535ULL)
    {
        if (_ullIndex < 0LL)
        {
            return EResult::Fail_ArgumentInvalid;
        }

        if (_ullIndex >= 16LL)
        {
            return EResult::Fail_ArgumentInvalid;
        }

        for (unsigned long long i = 0ULL; i < _ullTrialCount; ++i)
        {
            if (InterlockedCompareExchange64(&s_pManagers[_ullIndex].m_llLock, 1LL, 0LL) == 0LL)
            {
                std::map<unsigned long long, SResource*>::iterator iterEnd = s_pManagers[_ullIndex].m_mapResources.end();
                for (std::map<unsigned long long, SResource*>::iterator iter = s_pManagers[_ullIndex].m_mapResources.begin(); iter != iterEnd; ++iter)
                {
                    if (iter->second != nullptr)
                    {
                        // Do something with each resource
                    }
                }

                InterlockedExchange64(&s_pManagers[_ullIndex].m_llLock, 0LL);

                return EResult::Success;
            }
        }

        return EResult::Fail_ReachedMaximumTrial;
    }

    static const EResult GetClusterIndex(unsigned long long _ullResourceID, unsigned long long* _pIndex, const unsigned long long _ullTrialCount = 65535ULL)
    {
        if (_pIndex == nullptr)
        {
            return EResult::Fail_ArgumentNull;
        }

        for (unsigned long long i = 0ULL; i < 16; ++i)
        {
            for (unsigned long long j = 0ULL; j < _ullTrialCount; ++j)
            {
                if (_InterlockedCompareExchange64(&s_pManagers[i].m_llLock, 1LL, 0LL) == 0LL)
                {
                    std::map<unsigned long long, SResource*>::iterator iter = s_pManagers[i].m_mapResources.find(_ullResourceID);
                    if (iter != s_pManagers[i].m_mapResources.end())
                    {
                        (*_pIndex) = i;
                        _InterlockedExchange64(&s_pManagers[i].m_llLock, 0LL);
                        return EResult::Success;
                    }

                    _InterlockedExchange64(&s_pManagers[i].m_llLock, 0LL);
                    break;
                }
            }
        }

        return EResult::Fail_ReachedMaximumTrial;
    }
}

The Do function pointer will be inserted to my threadpool directly or by wrapping it by lambda.

I wonder if my concept of clustering data per thread is bad idea?

Also, expression like s_pManagers[_ullIndex].m_llLock is safe?

My first design was readers-writer-lock for s_pManagers, but tiny lock bothered me so much.

G. Sliepen · Accepted Answer · 2022-01-01 12:45:18Z

Note that we only review concrete, working code on Code Review, we don't review stub code, hypothetical code or pseudo-code. It is hard to say whether your code is good or not if we can't see how it will actually be used. However, there is enough code in your question that can be commented on.

Answers to your questions

I wonder if my concept of clustering data per thread is bad idea?

No, it's usually a very good idea, especially if the threads don't need to communicate with each other. Ideally, you have one thread per CPU core. If you can just simply divide the data into equal portions and each thread processes that amount in the same time as other threads, it is fine. However, if the time it takes to process a portion of data varies a lot, you might end up with some threads that have finished earlier than others. This means you have to wait for the slowest thread. In that case, it might make more sense to implement a queue of work and several worker threads that try to pick work from the queue in a loop. It might still be a good idea to pick work in (small) batches in that case.

Also, expression like s_pManagers[_ullIndex].m_llLock is safe?

Since s_pManagers is a static array, yes.

Use `std::mutex`

Sorry about using interlocked family of functions if it bothers you.

It is indeed bothersome, because it is Windows-specific, making your code non-portable. I recommend you use std::mutex instead, and use std::lock_guard to lock the mutex. Apart from needing less code, this also protects you if the work you are doing throws an exception, because it will then cause the lock to be automatically released, whereas your code does not have any protection against that.

Overuse of `long long`

Why is almost everything a long long? The variable m_llLock doesn't need to be 64-bits for a simple lock. Why is the underlying type of EResult a long long when there are only 5 possible values? Some other uses of long long in your code are also questionable.

Don't hardcode the number of threads

It looks like you wrote your code to have exactly 16 threads running? What if you run it on a laptop with only 4 cores? Or a dual-Epyc workstation with 128 cores? Don't hardcode numbers like this. Either make it configurable at runtime somehow, or get the number of hardware threads supported by the system programmatically using std::thread::hardware_concurrency().

Use range-for loops

Since C++11, range-for loops greatly simplify iterating over the elements of a container like std::map. I strongly recommend you use this. For example, the inner for-loop in Do() can be replaced with:

for (auto resource: s_pManagers[_ullIndex].m_mapResources)
{
    if (resource != nullptr)
    {
        // Do something with resource
    }
}

Remove unnecessary constructors/destructors

If you only have constructors and destructors that do nothing, remove them. See the rule of zero.

Don't return `const` values

If a function returns something by value, then adding const to the return type does not do anything. This only makes sense if the return type is a pointer or a reference.

Unnecessary checks if an unsigned value is negative

A check like if (_ullIndex < 0LL) is pointless, since an unsigned value can never be negative.

I'm sorry about late feedback and thank you for the answer G. Sliepen. — YoonSeok OH
– YoonSeok OH, Commented Jan 30, 2022 at 11:36

Stack Exchange Network

Dividing shared resources for homogeneous multithread processing

1 Answer 1

Answers to your questions

Use `std::mutex`

Overuse of `long long`

Don't hardcode the number of threads

Use range-for loops

Remove unnecessary constructors/destructors

Don't return `const` values

Unnecessary checks if an unsigned value is negative

You must log in to answer this question.

Hot Network Questions

Dividing shared resources for homogeneous multithread processing

1 Answer 1

Answers to your questions

Use std::mutex

Overuse of long long

Don't hardcode the number of threads

Use range-for loops

Remove unnecessary constructors/destructors

Don't return const values

Unnecessary checks if an unsigned value is negative

You must log in to answer this question.

Related

Hot Network Questions

Use `std::mutex`

Overuse of `long long`

Don't return `const` values