Remarks and adjustments (probably caused by obfuscation):
- I'll assume the WireGuard peer uses 10.8.0.105 instead of 10.8.0.107, to match the nftables ruleset.
- 233.252.0.0 would cause problems in a simulation (especially on the peer) because it's a multicast address. I'll use 192.0.2.233 below instead (with no network relation to 192.0.2.2).
The problem here is about traffic forwarded after a dnat rule, rather than just local traffic. There's also a hidden problem with the WireGuard tunnel envelope, which is fixed at the end.
The behavior of the routing stack with the ping test (including the actual dnat that happened in prerouting and the un-masquerade that will happen for the reply) can be summarized with these two commands that query the kernel about what route will be used:
# ip route get from 192.0.2.2 iif ens19 to 10.8.0.105
10.8.0.105 from 192.0.2.2 dev wg0
cache iif ens19
# ip route get from 10.8.0.105 iif wg0 to 192.0.2.2
192.0.2.2 from 10.8.0.105 via 203.0.113.1 dev ens18
cache iif wg0
Here one can see that the reply uses the wrong interface (ens18 instead of ens19). Its address will still be rewritten according to the conntrack entry (to 198.51.100.105), even though this doesn't appear above.
This is caused by a missing rule: anything coming (back) from wg0 should use the subnets table. Fixed with:
ip rule add iif wg0 lookup subnets
This also fixes the rp_filter=1 case, where the first route test above would otherwise just fail with RTNETLINK answers: Invalid cross-device link, although normally one should add the wg0 route to this table too.
This now gives:
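A minimal sketch of this fix as a shell script, assuming the subnets table is already declared in /etc/iproute2/rt_tables and that the WireGuard subnet is 10.8.0.0/24 (the /24 is a guess based on the 10.8.0.1 and 10.8.0.105 addresses). The run and fix_reply_path names are hypothetical helpers; with DRY_RUN=1 the commands are only printed. The second command is the optional wg0 route in the subnets table mentioned above.

```shell
#!/bin/sh
# Hypothetical helper script: apply the reply-path fix on the server.
# Assumptions (not from the original setup): table "subnets" already
# exists and the WireGuard subnet is 10.8.0.0/24.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fix_reply_path() {
    # Anything coming (back) from wg0 must be routed with table "subnets".
    run ip rule add iif wg0 lookup subnets
    # Optional: also make wg0 itself reachable from table "subnets".
    run ip route add 10.8.0.0/24 dev wg0 table subnets
}
```

Call fix_reply_path as root to apply it. Rules added this way don't survive a reboot, and running it twice adds a duplicate ip rule, so check ip rule show first.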
# ip route get from 10.8.0.105 iif wg0 to 192.0.2.2
192.0.2.2 from 10.8.0.105 via 198.51.100.1 dev ens19 table subnets
cache iif wg0
The ping test will now work correctly.
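To check the reply path from a script rather than by eye, a small hypothetical helper can test whether ip route get resolved the route from the subnets table. It relies only on the "table subnets" text that ip route get prints for a route found in a non-main table, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given "ip route get" output
# shows a route resolved from table "subnets".
reply_uses_subnets() {
    case $1 in
        *" table subnets"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on the server:
#   out=$(ip route get from 10.8.0.105 iif wg0 to 192.0.2.2)
#   if reply_uses_subnets "$out"; then echo "reply path OK"; else echo "still wrong: $out"; fi
```

Before the fix, the same query returns the route via 203.0.113.1 dev ens18 with no table mentioned, so the helper fails.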
There's an additional, somewhat hidden, routing problem with the WireGuard envelope too.
The combination of:
- not having enabled Strict Reverse Path Forwarding (RFC 3704)
- having the peer contact the server first (see the additional issue at the end)
- having (at least) the kernel implementation figure out that it should reply with the same source address it was initially contacted on
allows WireGuard to work more or less: a ping 10.8.0.1 from the peer gets a reply, and all following WireGuard traffic keeps using the same envelope addresses.
When a local (non-forwarded) flow doesn't state a source address, the routing stack has to figure out which one should be used for the chosen route. This matters especially for UDP, where a socket is often left unbound (ie: with source 0.0.0.0 aka INADDR_ANY). This is not an issue for a TCP server: the new established socket returned by accept(2) is no longer bound to 0.0.0.0 but to the correct address, and it presents this address to the routing stack. Here, WireGuard uses UDP with INADDR_ANY; in particular it doesn't bind to 198.51.100.3. That means it presents 0.0.0.0 as source and leaves the resolution of the outgoing source IP address to the kernel's routing stack.
If the server's WireGuard had sent the very first packet (rather than the peer), it would have used 203.0.113.134 instead of 198.51.100.3. There's no specific ip rule for 0.0.0.0: the rule 32765: from 198.51.100.0/24 lookup subnets doesn't match, so no special policy routing is applied. In the end, the UDP packet leaves from 203.0.113.134 through ens18.
It appears that at least the kernel implementation then keeps using the address it was contacted on. That's not to be relied upon: because of this, multi-homing with UDP services requires special support from applications (eg: using IP_PKTINFO).
Desired outcome for WireGuard:
# ip route get from 198.51.100.105 to 192.0.2.233
192.0.2.233 from 198.51.100.105 via 198.51.100.1 dev ens19 table subnets uid 0
cache
Actual outcome, at least when the server's WireGuard is the first to send traffic:
# ip route get from 0.0.0.0 to 192.0.2.233
192.0.2.233 via 203.0.113.1 dev ens18 src 203.0.113.134 uid 0
cache
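The source address the kernel picks for an unbound socket shows up in the src field of ip route get. A hypothetical helper, route_src, extracts it with awk; it's only a parsing sketch and assumes the output format shown above, where src is followed by the address.

```shell
#!/bin/sh
# Hypothetical helper: read "ip route get" output on stdin and print
# the address that follows the "src" keyword (nothing if absent).
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Usage on the server:
#   ip route get from 0.0.0.0 to 192.0.2.233 | route_src
```

With the output above, this prints 203.0.113.134, the address the server's WireGuard would use if it initiated the traffic.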
To really fix the WireGuard tunnel multi-homed routing itself, one can use a per-L4-protocol routing rule:
ip rule add iif lo ipproto udp sport 51820 lookup subnets
(iif lo is special syntax meaning locally originated (non-forwarded) traffic; it's not really about the lo interface.)
Giving:
# ip route get from 0.0.0.0 ipproto udp sport 51820 to 192.0.2.233
192.0.2.233 via 198.51.100.1 dev ens19 table subnets src 198.51.100.3 uid 0
cache
Even though INADDR_ANY is presented as source, the UDP source port 51820 now selects the subnets routing table, which in turn picks 198.51.100.3 as source.
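A minimal sketch applying this rule from a script, assuming the subnets table exists and WireGuard runs on wg0. fix_envelope_routing and run are hypothetical helpers; the port defaults to 51820 but can be passed explicitly. The ipproto and sport selectors need a kernel and iproute2 recent enough to support them (Linux 4.17 or later, as far as I know).

```shell
#!/bin/sh
# Hypothetical helper script: route WireGuard's own UDP envelope
# through table "subnets". Assumes table "subnets" already exists.
# Set DRY_RUN=1 to print the command instead of running it.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fix_envelope_routing() {
    # $1: WireGuard listen port (assumed default: 51820).
    port=${1:-51820}
    # "iif lo" matches locally originated traffic, not packets received on lo.
    run ip rule add iif lo ipproto udp sport "$port" lookup subnets
}
```

On the server, fix_envelope_routing "$(wg show wg0 listen-port)" uses the port WireGuard actually listens on. As with ip rule add in general, running it twice adds a duplicate rule.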