Fast circular buffer [closed]

Question

Closed. This question is off-topic. It is not currently accepting answers.

Missing Review Context: Code Review requires concrete code from a project, with enough code and / or context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site.

Closed last year.

Improve this question

I'm working on a modern C++20 solution that does two things:

It accumulates a sliding window of input data points; the sliding window's size is fixed and known beforehand. Data has strict ordering and it should be kept.
Feeds the current sliding window into some C-like interface, which requires contiguous memory. There is a catch - the interface expects a pointer, starting index, and number of items to read.

Known to me options:

Use std::deque for keeping a sliding window, since it allows cheap pushes and pops at the ends. However, it's not contiguous and would require data copy to, let's say, std::vector.
std::vector + std::rotate. Rotate the current vector, overwrite the last element with the current data point, and feed it into the interface. Requires O(N) rotation for each input.
Allocate a memory buffer significantly larger than the window size, like 1024 buffer elements for a sliding window of 5-10 elements. Write inputs to this buffer until you reach the end, copy the current sliding window to the beginning of the buffer, and continue. The same as 2), but performs O(N) copy only when we reach the buffer's end.

Currently, I stick to the last one, similar to this simplified snippet:

#include <iostream>
#include <vector>

class SlidingWindowBuffer {
public:
    SlidingWindowBuffer(size_t bufferSize, size_t windowSize)
        : buffer(bufferSize), windowSize(windowSize), writeIndex(0) {}

    void addElement(int element) {
        buffer[writeIndex] = element;
        writeIndex = (writeIndex + 1) % buffer.size();

        if (writeIndex == 0) {
            // Perform O(N) copy when we reach the end of the buffer
            for (size_t i = 0; i < windowSize; ++i) {
                buffer[i] = buffer[buffer.size() - windowSize + i];
            }
            writeIndex = windowSize;
        }
    }

    void printBuffer() const {
        for (size_t i = 0; i < buffer.size(); ++i) {
            std::cout << buffer[i] << " ";
        }
        std::cout << std::endl;
    }

private:
    std::vector<int> buffer;
    size_t windowSize;
    size_t writeIndex;
};

int main() {
    size_t bufferSize = 1024;
    size_t windowSize = 10;
    SlidingWindowBuffer swb(bufferSize, windowSize);

    // Example usage
    for (int i = 0; i < 1050; ++i) {
        swb.addElement(i);
    }

    swb.printBuffer();

    return 0;
}

The snippet itself is only for illustration purposes so it could be incorrect for some corner cases, it's not used anywhere.

My question is the following - are there any other alternatives I'm missing? Are there any considerations when you choose between alternatives?

"Similar to this simplified snippet" isn't how we normally do things here on Code Review, because we review any and all aspects of the code - e.g. misspelling std::size_t, unnecessary flushing with std::endl, inability to printBuffer() to any stream other than std::cout, redundant return from main(). — Toby Speight
– Toby Speight, Commented Aug 23, 2024 at 6:55
It has obviously been a while since you have posted on Code Review, we differ from Stack Overflow in some important ways. 1) You have to post real code from a project you created (the point @TobySpeight is trying to make). 2) The more code you post the better the review you get. 3) the code has to be working as intended. Please read A guide to Code Review for Stack Overflow users. — pacmaninbw
– pacmaninbw ♦, Commented Sep 12, 2024 at 11:56

G. Sliepen · Accepted Answer · 2024-08-22 19:18:24Z

11

Always keep two up-to-date copies in the buffer

are there any other alternatives I'm missing?

Your third option is a good one, but the drawback is the \$O(N)\$ copy operation every so often. Especially if you need to ensure adding an element always uses a small and bounded amount of time to complete, that can be a problem. Instead just make the copy one element at a time:

void addElement(int element) {
    buffer[writeIndex] = element;
    buffer[(writeIndex + buffer.size() / 2) % buffer.size()] = element;
    writeIndex = (writeIndex + 1) % buffer.size();
}

Now there will always be a completely contiguous region of the latest buffer.size() / 2 elements.

Note that there is no windowSize here. It works most efficiently if you set bufferSize equal to 2 * windowSize, any more is just wasted space, but on the other hand:

Make sure the size is a power of two

The modulo operation is one of the slowest instructions the CPU can do. However, if the size is a power of two, you can use a bitwise AND operation instead.

answered Aug 22, 2024 at 19:18

G. Sliepen

69.3k3 gold badges75 silver badges180 bronze badges

3

\$\begingroup\$ You can support arbitrary buffer sizes and still get rid of the slow modulo operations with a simple if (++writeIndex >= buffer.size() / 2) writeIndex = 0; \$\endgroup\$

Ilmari Karonen
– Ilmari Karonen

2024-08-23 05:06:49 +00:00
Commented Aug 23, 2024 at 5:06
1

\$\begingroup\$ The compiler also can only optimize % buffer.size() to an efficient bitwise and if buffer.size() is a constant known at compile time, such as on a std::array. You could alternatively store the exponent of 2, so that & (((size_t)1 << exponent) - 1U) can be computed quickly. \$\endgroup\$

Davislor
– Davislor

2024-08-24 07:02:32 +00:00
Commented Aug 24, 2024 at 7:02

Add a comment |

Balor64 · Accepted Answer · 2024-08-23 22:03:04Z

Virtual Pages

You can use virtual pages to create 2 adjacent regions of virtual memory that are both mapped to the same region of physical memory. For an example of how to do this, see: https://lo.calho.st/posts/black-magic-buffer/.

typedef struct {
    uint8_t *buffer;
    size_t   buffer_size;
    int      fd;
    size_t   head;
    size_t   tail;
} queue_t;

bool init(queue_t *q, size_t size) {
    // Make an anonymous file and set its size
    q->fd = memfd_create("queue_buffer", 0);
    ftruncate(q->fd, size);

    // Ask mmap for an address at a location where we can put both virtual copies of the buffer
    q->buffer = mmap(NULL, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    // Map the buffer at that address
    mmap(q->buffer, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, q->fd, 0);

    // Now map it again, in the next virtual page
    mmap(q->buffer + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, q->fd, 0);

    // Initialize our buffer indices
    q->head = 0;
    q->tail = 0;
}

You would manage this buffer in much the same way as in G. Sliepen's answer, except that you don't create that second copy of each element. Instead, each element will automatically be reachable from 2 different addresses.

Pros:

Every operation is O(1).
There is no copying overhead whatsoever.

Cons:

This cannot be done in a portable manner. If you need to be platform-independent, consider writing one virtual page-based implementation for each platform that supports that, and writing a different implementation as a fallback for other platforms to use.
The buffer size must be a multiple of the page size, so if you only need small buffers, it'll cost you lots of extra memory.

Perhaps consider using boost::interprocess::shared_memory_object for a semi-portable implementation. You'll need to pass the address parameter to the mapped_region constructor. — Toby Speight
– Toby Speight, Commented Aug 24, 2024 at 16:03

Matthieu M. · Accepted Answer · 2024-08-23 07:41:13Z

What O(N)?

You are correct that copying the bytes is O(N), however there are multiple Ns here, so this can be misleading. In this particular case, it should be noted that you can get away with never copying more than a window-size worth of data, and thus you can have a simple O(window-size) solution.

Vector is good enough!

When handling a similar problem -- TCP bytes, which have to be buffered until a full message is received -- my goto solution is to just use a std::vector (or similar):

Buffer all you can in the vector.
Unbuffer all complete windows.
Trim all complete windows.

For example:

Buffer 1018 elements (with push_back).
Unbuffer 1010 elements (window-size of 10).
Erase the first 1010 elements (with range erase, resulting in copying the last 8).

Even in the case of an unbounded number of buffered bytes, this process is efficient since it never copies more than a window-size worth of data.

std::memcpy is blindingly fast, as long as your elements can be memcpied (int can) and their number is small (a few cache lines), there's little point doing anything more complicated.

Toby Speight · Accepted Answer · 2024-08-26 19:12:41Z

G. Sliepen's answer shows how to get consistent insertion times, but I'll review some other aspects:

    writeIndex = (writeIndex + 1) % buffer.size();

    if (writeIndex == 0) {
        // Perform O(N) copy when we reach the end of the buffer
        for (size_t i = 0; i < windowSize; ++i) {
            buffer[i] = buffer[buffer.size() - windowSize + i];
        }
        writeIndex = windowSize;
    }

Here, we do the expensive % every time, but we only need to test whether we have reached the capacity:

    if (++writeIndex == buffer.size()) {
        // Perform O(N) copy when we reach the end of the buffer
        for (size_t i = 0; i < windowSize; ++i) {
            buffer[i] = buffer[buffer.size() - windowSize + i];
        }
        writeIndex = windowSize;
    }

The loop that shifts the content is just the std::copy() algorithm written out long-hand. Prefer to use standard algorithms when available, to keep the logic simple.

printBuffer() is quite inflexible - in particular, this debugging information is much more suited to std::clog than std::cout. Consider instead providing a function that returns a std::span that any caller can use:

std::span<int> buffer_content() const;

That simplifies the shift operation, too - it can be something like

        std::ranges::copy(buffer_content(), buffer.begin());

We'll probably want to generalise this to types other than int - at which point, we should think about whether the content-type will be default- and copy-constructible.

If it's not default-constructible, then we can't initialise the content as buffer{bufferSize} - we'll need to initialise it as an empty vector, reserve its capacity, then emplace_back() in the add function.

If it's not copy-constructible, then we'll need to std::move() rather than std::copy() when shifting.

And we'll need to erase() or resize() to destruct the moved-from elements after copying content to the beginning. Or just erase() the earlier content and let the vector deal with moving - as always, the most maintainable code is that which you don't have to write!

Give the member functions names that conform to standard container members, so that the class is more usable in generic code. E.g. push_back() is a better choice than addElement().

Modified code

#include <stdexcept>
#include <span>
#include <vector>

template<class T>
class SlidingWindowBuffer
{
    std::vector<T> buffer = {};
    std::size_t windowSize;

public:
    SlidingWindowBuffer(size_t bufferSize, size_t windowSize)
        : windowSize{windowSize}
    {
        if (windowSize > bufferSize) {
            throw std::invalid_argument("Window size exceeds storage capacity");
        }
        buffer.reserve(bufferSize);
    }

    auto begin() const {
        return (buffer.size() < windowSize)
            ? buffer.begin()
            : buffer.end() - windowSize;
    }
    auto end() const {
        return buffer.end();
    }

    void push_back(T element)
    {
        if (buffer.size() == buffer.capacity()) {
            buffer.erase(buffer.begin(), begin());
        }

        buffer.push_back(std::move(element));
    }

    std::span<const T> content() const
    {
        return {begin(), end()};
    }

    // for debugging use
    std::span<const T> buffer_content() const
    {
        return buffer;
    }
};

// Demo

#include <algorithm>
#include <iostream>
#include <iterator>

int main()
{
    std::size_t bufferSize = 15;
    std::size_t windowSize = 10;
    SlidingWindowBuffer<int> swb(bufferSize, windowSize);

    // Example usage
    for (int i = 0; i < 20; ++i) {
        swb.push_back(i);
        std::ranges::copy(swb, std::ostream_iterator<int>(std::cout, " "));
        std::cout << '\n';
    }

    std::cout << '\n';
    std::ranges::copy(swb.buffer_content(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << '\n';
}

Stack Exchange Network

Fast circular buffer [closed]

4 Answers 4

Always keep two up-to-date copies in the buffer

Make sure the size is a power of two

Virtual Pages

What O(N)?

Vector is good enough!

Modified code

Linked

Hot Network Questions

Fast circular buffer [closed]

4 Answers 4

Always keep two up-to-date copies in the buffer

Make sure the size is a power of two

Virtual Pages

What O(N)?

Vector is good enough!

Modified code

Linked

Related

Hot Network Questions