7
\$\begingroup\$

The goal is to brute force the length of the shortest unrestricted path that touches any part of each square within an n by n square grid.

Unrestricted path meaning any continuous curve between two points. We know from this post that only 1,1 and 1,0 moves on the vertex grid are relevant. Such line segments are all considered to be of length one as it is an equivalent problem in our case.

Optimal paths up to n=7 are proven to only contain moves that are zero waste (diagonal touches 3, straight touches 2 new squares every time), I enforce this observation as a rule for larger n too for much needed performance. All of the path is contained within the inner squares. The pathfinding starts in a triangle shaped area 1/8th of the inner square - 1.

It is clear from results that there is a pattern every mod 3 (up to n=9 is proven, I made the rest manually), but I don't see how enforcing a loose pattern while pathfinding would lead to a performance increase while brute forcing.

The code does what it's supposed to up to n=15 where we start getting random runtime failures. The main concern though is performance as even at n=10 it didn't finish after I ran it for ~10 hours.

Areas of suspicion: vertex_masks, multithreading implementation, better heuristic?

// g++ -std=c++2b -Ofast -march=native -flto path.cpp -o path
/*
3   0
6   0
9   1
10  4
15  5
15 10
16 15
21 18?
24 23?
25 32?
*/

#include <iostream>
#include <vector>
#include <cmath>
#include <cstdint>
#include <limits>
#include <thread>
#include <atomic>
#include <mutex>
#include <algorithm>
#include <array>
#include <bitset>

constexpr int n = 8;
constexpr int MAX_LEN = 40;

constexpr int total_bits = n * n;
// directions, straights first
constexpr int dx[8] = {1, 0, -1, 0, 1, -1, 1, -1};
constexpr int dy[8] = {0, 1, 0, -1, 1, 1, -1, -1};

using MaskType = std::array<std::array<std::bitset<total_bits>, n+1>, n+1>;
// matrix of masks of squares that a specific vertex touches
constexpr MaskType create_vertex_masks() {
    MaskType masks{};
    for (int x = 0; x <= n; ++x) {
        for (int y = 0; y <= n; ++y) {
            std::bitset<total_bits> mask;
            if (x > 0 && y > 0) mask.set((x-1) * n + (y-1));
            if (x > 0 && y < n) mask.set((x-1) * n + y);
            if (x < n && y > 0) mask.set(x * n + (y-1));
            if (x < n && y < n) mask.set(x * n + y);
            masks[x][y] = mask;
        }
    }
    return masks;
}
constexpr auto vertex_masks = create_vertex_masks();

constexpr std::array<std::array<bool, n+1>, n+1> create_is_outer() {
    std::array<std::array<bool, n+1>, n+1> is_outer = {};
    for (int j = 0; j <= n; ++j) {
        is_outer[0][j] = true;
        is_outer[n][j] = true;
    }
    for (int i = 1; i < n; ++i) {
        is_outer[i][0] = true;
        is_outer[i][n] = true;
    }
    return is_outer;
}
constexpr auto is_outer = create_is_outer();

std::atomic<int> global_best(std::numeric_limits<int>::max());
std::vector<std::pair<int, int>> global_best_path;
std::mutex global_mutex;

inline void dfs(int x, int y, std::bitset<total_bits> mask, int len, 
                std::vector<std::pair<int, int>>& path, int& local_best,
                std::vector<std::pair<int, int>>& local_best_path) {
    int effective_best = std::min(local_best, global_best.load(std::memory_order_relaxed));
    // ceil div 3 of remaining unvisited squares heuristic
    if (__builtin_expect(len + (total_bits - mask.count() + 2) / 3 >= effective_best, 0)) return;

    if (__builtin_expect(mask.all(), 0)) {
        if (__builtin_expect(len < local_best, 0)) {
            local_best = len;
            local_best_path = path;
            if (len <= global_best.load(std::memory_order_relaxed)) {
                std::lock_guard<std::mutex> lock(global_mutex);
                if (len < global_best) {
                    global_best = len;
                    global_best_path = path;
                    std::cout << "New best: " << len << ", Path:";
                    for (const auto& p : path)
                        std::cout << " (" << p.first << "," << p.second << ")";
                    std::cout << "\n";
                }
            }
        }
        return;
    }

    const auto invmask = ~mask;
    for (int dir = 0; dir < 8; ++dir) {
        const int nx = x + dx[dir];
        const int ny = y + dy[dir];
        if (__builtin_expect(is_outer[nx][ny], 0)) continue;
        
        const int added_count = (vertex_masks[nx][ny] & invmask).count();
        // enforcing the req 2/3 new squares unless n=3 where it is impossible
        if (__builtin_expect(added_count < dir / 4 + 2 && (n != 3 || added_count == 0), 1)) continue;

        const int new_len = len + 1;
        if (__builtin_expect(new_len > MAX_LEN, 0)) continue;

        path.push_back({nx, ny});
        const auto new_mask = mask | vertex_masks[nx][ny];
        dfs(nx, ny, new_mask, new_len, path, local_best, local_best_path);
        path.pop_back();
    }
}

void search_from(const std::pair<int, int>& start) {
    int local_best = std::numeric_limits<int>::max();
    std::vector<std::pair<int, int>> local_best_path;
    std::vector<std::pair<int, int>> path;
    path.reserve(MAX_LEN + 1);
    path.push_back(start);
    dfs(start.first, start.second, vertex_masks[start.first][start.second], 0, path, local_best, local_best_path);
}

void run_parallel_search(const std::vector<std::pair<int, int>>& starts) {
    const int hw_concurrency = static_cast<int>(std::thread::hardware_concurrency());
    const int num_threads = std::min(static_cast<int>(starts.size()), std::max(1, hw_concurrency));
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&starts, num_threads, i] {
            for (size_t j = i; j < starts.size(); j += num_threads) {
                search_from(starts[j]);
            }
        });
    }

    for (auto& thread : threads) {
        thread.join();
    }
}

int main() {
    // starting positions except the middle tip of the triangle
    constexpr int max_x = (n < 3) ? 1 : n / 2;
    std::vector<std::pair<int, int>> starts;
    for (int x = 1; x <= max_x; ++x) {
        for (int y = 1; y <= x; ++y) {
            if (n < 4 || !(x == max_x && y == max_x))
                starts.emplace_back(x, y);
        }
    }
    run_parallel_search(starts);

    return 0;
}
\$\endgroup\$
0

1 Answer 1

8
\$\begingroup\$

Time complexity

You have \$O(n^2)\$ start points. Each of those run a DFS. Assuming, as a lower bound, that the DFS explores every possible edge, then that's \$O(n^2)\$ steps per DFS. The actual complexity will likely be higher, as there are multiple paths that touch all squares. For each step, it explores eight directions, but that's just \$O(1)\$. However, mask.count() and mask.all() hide another \$O(n^2)\$, since std::bitset doesn't store the number of bits that are set, so these member functions have to iterate over all of the bits.

All this combines means your algorithm has time complexity of at least \$O(n^6)\$. For \$n=10\$ that means \$10^6\$ steps, which shouldn't take 10 hours, which hints that the time complexity is even worse.

I think this means that you should look into better heuristics, as I suspect that your DFS is wasting time looking at paths that are not useful. In particular, I think you should check that the path doesn't stray too far from the borders or from earlier parts of the path. For example, if the path would go from the top center straight down to the bottom center, then from then on it either has to go left or right, but it can never fill the other side anymore. Also note that the solutions you have so far either show the paths filling the square with a zig-zag pattern or with a spiral.

Memory usage

The size of MaskType is \$O(n^4)\$. Given that std::bitset is quite efficient with storage, that shouldn't be a problem for \$n=10\$.

The recursion depth of the DFS is \$O(n^2)\$. Each recursion level needs some memory for the function parameters and local variables. However, since you have some masks as parameters and local variables, that means the stack memory usage is actually \$O(n^4)\$, although again I don't think that would be an issue at \$n=10\$.

Do you really need to pass a new mask to each recursion level? I think that you should be able to just have a single instance of the mask, and pass a reference to it just like you do for the path. Sure, updating that mask on entry and exit from a recursion level is not trivial, but for larger values of \$n\$ it will start to pay off.

Also consider grouping related information together in a struct, instead of passing 7 different parameters.

Make better use of (newer) C++ features

\$\endgroup\$
3
  • \$\begingroup\$ I used jthread, fetch_max(), [[likely]], clamp(), and dfs gets a reference now. Tried replacing the use of vertex_masks with four manual bit checks but it seems that the index calculations are slower than the 100 ors. Thinking about passing along the last direction in dfs for better pruning. I also thought about using a sliding 4x4 window of only the reachable by one step bits and only checking mask.all() once the window is filled, but foregoing the heuristic doesn't seem worth it. Not sure how one would go about quickly checking whether or not the unvisited squares are disjointed. \$\endgroup\$ Commented Mar 18 at 17:19
  • \$\begingroup\$ More on the patterns that we humans can observe: Indeed I could bias it towards spirals or zig-zags but that would only help if I didn't already set MAX_LEN to the shortest known path. It is worth noting that during those ~10 hours it did find my best solution but its task is brute-forcing the minimum length, not just finding an example. \$\endgroup\$ Commented Mar 18 at 17:28
  • \$\begingroup\$ I know, but you need to find heuristics that abort going down a path that will not result in filling the square as early as possible, otherwise it wastes a huge amount of time. The \$O(n^6)\$ is a very conservative lower bound. If you didn't have any heuristics, then finding all possible paths that fill the square is probably \$O(c^{n^2})\$, where \$c\$ is some constant smaller than 8, because at each level there are up to eight possible directions the path can go in. \$\endgroup\$ Commented Mar 18 at 18:31

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.