Frame challenge: You describe needing to iterate through the rows in reverse order, while keeping the columns of each row in the same order, as input to a transformation algorithm.
Can You Access the Data in the Correct Order?
If you have control over the transformation algorithm, you can instead access the elements in a different order, such as:
// Width w and total size n are already computed, w <= INT_MAX, and n <= PTRDIFF_MAX.
for (ptrdiff_t i = n - w; i >= 0; i -= w) { // i must be SIGNED and wide enough!
    for (int j = 0; j < w; ++j) { // j should have the same signedness as i.
        const auto pixel_ij = *(buffer + i + j);
        /* ... */
    }
}
You could instead slice the array into rows with std::span or std::ranges::subrange as your slice type, and pass those slices to algorithms that use the range interface.
An alternative using ranges and adapters is to reverse a chunk view, as in Toby Speight’s excellent answer, but instead of collecting the data to a vector, flatten the view.
In a more complicated convolution, you might separately calculate the addresses of two input rows, counting down here and up elsewhere, possibly as array slices or std::span, then iterate over the elements of the two rows in parallel.
Can You Permute in Place?
You can apply the algorithm for reversing an array in place to an array of rows.  That is, you can find the addresses of the rows starting from the top and counting down, and the addresses of the rows starting from the bottom and counting up, swap the two rows element-wise, and stop when the addresses meet in the middle.
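The row-swapping loop described above might look like the following sketch, which assumes a row-major std::vector<double> whose size is a multiple of the width. The name reverseRowsInPlace is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reverse the order of the rows in place, leaving the
// elements within each row in their original order.
inline void reverseRowsInPlace(std::vector<double>& image, std::size_t w) {
    assert(w > 0 && image.size() % w == 0);
    if (image.empty()) return;
    const auto stride = static_cast<std::ptrdiff_t>(w);
    auto top = image.begin();             // first row, moving down
    auto bottom = image.end() - stride;   // last row, moving up
    while (top < bottom) {                // stop when they meet in the middle
        std::swap_ranges(top, top + stride, bottom);
        top += stride;
        bottom -= stride;
    }
}
```

With an odd number of rows, the middle row is left where it is, exactly as the middle element is when reversing an ordinary array.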
Feedback on the Code as Written
Make The Permute Operation a Function
Not only is this better organization than putting everything in main, it lets you chain the returned vector, such as transform(permutePixelRows(getRawImage(camera))).
You might accept the input as a std::vector<double>&& that can be permuted in place, then return the permuted input vector (by move).  Alternatively, it could accept an input range and an output range, and return the output range.
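One possible shape for the first interface, as a sketch only. The width parameter w is an assumption, since the function needs the row length from somewhere, and transform and getRawImage from the chaining example are not defined here:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: take ownership of the caller's buffer, permute it in place, and
// hand the same storage back by move, so no second buffer is allocated.
[[nodiscard]] inline std::vector<double>
permutePixelRows(std::vector<double>&& pixels, std::size_t w) {
    assert(w > 0 && pixels.size() % w == 0);
    const std::size_t h = pixels.size() / w;
    double* const base = pixels.data();
    for (std::size_t r = 0; r < h / 2; ++r) {
        // Swap row r with its mirror image, row h - 1 - r.
        std::swap_ranges(base + r * w, base + (r + 1) * w,
                         base + (h - 1 - r) * w);
    }
    return std::move(pixels);
}
```

A temporary returned by another function binds to the rvalue reference directly; a named vector has to be passed with std::move, which makes the transfer of ownership visible at the call site.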
Optimizing with Vector
First, you want to avoid allocating a std::vector off the heap within each iteration of the loop, as in:
    for(int i=0; i < h; ++i){
        vector<double> b;
This slows the loop down immensely and inhibits vectorization. Since each row has the same width, if you need a temporary buffer to hold a row, you can allocate it once before the loop, and overwrite it on each iteration.
Consider taking the output buffer as a parameter (possibly std::span<double>, which can alias any array-like contiguous data structure, or you could make it a template for any output range that supports std::begin(dest) and std::end(dest)).  This lets the caller fill any range of memory it owns.
If you stick with creating your own std::vector to hold the output data, repeated calls to push_back are the slow way to fill it. It is faster to resize the vector to the known final size, which gives you an initialized output buffer, and then copy or move the input data to the correct offsets within that buffer.
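For example, a sketch that sizes the output once and copies a row at a time. The name reversedRowsCopy is hypothetical, and constructing the vector with a size has the same effect here as calling resize on an empty one:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build the row-reversed copy in a buffer sized up front.
inline std::vector<double> reversedRowsCopy(const std::vector<double>& src,
                                            std::size_t w) {
    assert(w > 0 && src.size() % w == 0);
    const std::size_t h = src.size() / w;
    std::vector<double> dest(src.size());   // one allocation, zero-initialized
    for (std::size_t r = 0; r < h; ++r) {
        // Source row r lands in destination row h - 1 - r.
        std::copy_n(src.data() + r * w, w, dest.data() + (h - 1 - r) * w);
    }
    return dest;
}
```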
As of 2025, inserting one element at a time at the end causes compilers to emit an actual call to the non-trivial push_back code for every element, while copying into a pre-allocated output buffer does not.  If you must build your output as you go, at least copy a row of data at a time to minimize the number of bounds checks.  This is explicitly what Toby Speight’s excellent solution (which is very easy on the eyes) does.
However, you could also use the access pattern in the first code sample to fill an empty std::vector in order, which could give you simple code.
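A sketch of that variant, appending whole rows from the last to the first. The name reversedRowsAppend is hypothetical, and the reserve call is an addition here, meant to keep the appends from reallocating:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fill an empty vector in output order, one row per
// insert, walking the source rows from last to first.
inline std::vector<double> reversedRowsAppend(const std::vector<double>& src,
                                              std::size_t w) {
    assert(w > 0 && src.size() % w == 0);
    std::vector<double> dest;
    dest.reserve(src.size());               // assumption: final size is known
    for (std::size_t r = src.size() / w; r-- > 0;) {
        const double* const row = src.data() + r * w;
        dest.insert(dest.end(), row, row + w);
    }
    return dest;
}
```

Unlike the pre-sized buffer, this never writes an element twice, at the cost of one capacity check per row.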
Whichever approach you choose, you probably want to call shrink_to_fit (a non-binding request to release unused capacity) before you return, since you almost certainly will not be appending any more elements to the buffer.