5

I've installed postgres master and slave nodes by this manual:

At the master (10.0.0.1):

sudo -u postgres initdb -D /data/postgres/main

Change postgresql.conf:

listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 2
logging_collector = on
max_wal_senders = 5

Add record for replication to pg_hba.conf:

host replication replica 10.0.0.2/32 md5

Then start postgres

service postgresql start

Create replication user:

CREATE USER replica WITH REPLICATION ENCRYPTED PASSWORD '123456';

And add replication slot:

SELECT pg_create_physical_replication_slot('standby_slot');

At the slave (10.0.0.2):

pg_basebackup -h 10.0.0.1 -U replica -D /db/postgres/main -X s -P

Change state to standby in postgresql.conf:

hot_standby = on

Add recovery.conf:

standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 port=5432 user=replica password=123456'
primary_slot_name = 'standby_slot'
trigger_file = 'trigger'

Start standby:

service postgresql start

So, replication is working fine. If I create database, it will appear both at the master and the slave.

If slave is going down and then starting, all changes come there. It all fine too.

But I have problem if master is down. When it stops, I create trigger file at the slave. Then I create another database at the slave, which now is primary. After that I start master, trigger file remove automatically, recovery.conf rename to recovery.done automatically too. Ok. But how to synchronize changes from slave (new primary) to master (old primary)?

I found that I've got to do these four steps:

At the master:

Stop postgres:

service postgresql stop

At the slave:

Switch backup mode on:

SELECT pg_backup_start('backup', true);

Then rsync new data from slave to master.

And then switch backup mode off:

SELECT pg_backup_stop();

After last step I get message

NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup

And then if I start master it doesn't start with log message:

FATAL: could not locate required checkpoint record

I don't understand, how to get new data at the master and make it primary again.

So at the future I plan to install pgpool2 for balancing and failover, but test installing did not work because I don't know, how this mechanism should work.

1 Answer 1

4

You cannot just jump back to the old master, as it is said here:

Once failover to the standby occurs, there is only a single server in operation. This is known as a degenerate state. The former standby is now the primary, but the former primary is down and might stay down. To return to normal operation, a standby server must be recreated, either on the former primary system when it comes up, or on a third, possibly new, system.

For this, you need a recovery.conf on the old master. This should be nearly the same as the present one, only the connection has to point elsewhere.

4
  • So, I tried to make my old primary as standby. I did '/usr/local/pgsql/bin/pg_basebackup -h 10.0.0.2 -U replica -D /db/postgres/main -X s -P' to get the base contents and config files. Then I renamed recovery.done to recovery.conf, and I changed host-address at line primary_conninfo in the recovery.conf. Also I changed IP-address in the pg_hba.conf. After I started old master I still have no replication relation between two servers. If I drop some database on the new primary, I don't see changes on the old one Commented May 29, 2015 at 8:32
  • @user2627744 Do any errors pop up in the logs? Commented May 29, 2015 at 8:47
  • yes, it told me about absent replication slot. I added it by 'SELECT pg_create_physical_replication_slot('standby_slot')' at new primary and replication now work. So, as I understood, now I can make old master primary again by repeating my actions: stop new primary, touch trigger at old primary, pg_basebackup from old primary to new primary, edit config files, add replication slot and start new primary as standby again. I hope pgpool can do it without errors.. Commented May 29, 2015 at 9:21
  • Great! What I was not sure about was if the replication slot itself is replicated - apparently, you have to create it after failover. Commented May 29, 2015 at 9:33

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.