-1

I have set up logical replication from a PostgreSQL 14 instance running on an Azure VM to a PostgreSQL 16 Flexible Server in Azure.

Here’s the setup and process I followed:

Source (VM): PostgreSQL 14, 4 cores, 16GB RAM

Replica (Flexible Server): PostgreSQL 16, 2 cores, 8GB RAM, P6 (240 IOPS) tier

Set max_replication_workers to 10 on the main DB and 8 on the replica

Created two publications on the source:

One publication includes 3 tables

The other publication includes 4 tables

Created two logical replication slots on the main DB (one for each publication)

Took a data dump of these tables and restored them on the replica to ensure identical data

Created two subscriptions on the replica, using the respective replication slots and without initial sync

Replication works fine initially — data changes from the source reflect correctly on the replica.

However, after about a week, replication breaks. At that point:

Refreshing the publication or toggling the subscriptions doesn’t help

The replication cannot be restarted

The only way to fix it is to start over from the dump and reinitialize the replication setup

All replication parameters are fine-tuned, and both servers are on stable connections.

My question:

Why does this replication consistently break after a week? Is this due to version mismatch (PostgreSQL 14 → 16), WAL retention, slot corruption, or something else specific to Azure Flexible Server?

What could be the best way to ensure long-term stability for logical replication between these two servers?

2
  • 2
    How do you expect anyone to answer the "why" question if you provide absolutely no information? You're the one with access to the logs, so look at them and you'll find the reason. Commented Oct 23 at 6:24
  • 1
    And describe in detail what you mean by "break". Commented Oct 23 at 10:26

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.