repmgr切换以及故障转移恢复

[pg10@data01 data]$ repmgr -f ~/conf/repmgr.conf  cluster show

 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                                        

----+--------------+---------+-----------+--------------+----------+----------+----------+---------------------------------------------------------------------------

 1  | 10.10.10.101 | primary | * running |              | default  | 100      | 1        | host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 2  | 10.10.10.102 | standby |   running | 10.10.10.101 | default  | 100      | 1        | host=10.10.10.102 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 3  | 10.10.10.103 | standby |   running | 10.10.10.101 | default  | 100      | 1        | host=10.10.10.103 user=repmgr dbname=postgres password=1qaz@WSX port=5666

[pg10@data01 data]$ 




[pg10@data02 conf]$ repmgr -f ~/conf/repmgr.conf cluster show

 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                                        

----+--------------+---------+-----------+--------------+----------+----------+----------+---------------------------------------------------------------------------

 1  | 10.10.10.101 | primary | * running |              | default  | 100      | 1        | host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 2  | 10.10.10.102 | standby |   running | 10.10.10.101 | default  | 100      | 1        | host=10.10.10.102 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 3  | 10.10.10.103 | standby |   running | 10.10.10.101 | default  | 100      | 1        | host=10.10.10.103 user=repmgr dbname=postgres password=1qaz@WSX port=5666

[pg10@data02 conf]$ 



[pg10@data03 conf]$ repmgr -f ~/conf/repmgr.conf cluster show

 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                                        

----+--------------+---------+-----------+--------------+----------+----------+----------+---------------------------------------------------------------------------

 1  | 10.10.10.101 | primary | * running |              | default  | 100      | 1        | host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 2  | 10.10.10.102 | standby |   running | 10.10.10.101 | default  | 100      | 1        | host=10.10.10.102 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 3  | 10.10.10.103 | standby |   running | 10.10.10.101 | default  | 100      | 1        | host=10.10.10.103 user=repmgr dbname=postgres password=1qaz@WSX port=5666

[pg10@data03 conf]$ 


在节点2上执行 ,将主节点切换到2节点


[pg10@data02 conf]$  repmgr standby switchover -f ~/conf/repmgr.conf --siblings-follow

NOTICE: executing switchover on node "10.10.10.102" (ID: 2)

NOTICE: local node "10.10.10.102" (ID: 2) will be promoted to primary; current primary "10.10.10.101" (ID: 1) will be demoted to standby

NOTICE: stopping current primary node "10.10.10.101" (ID: 1)

NOTICE: issuing CHECKPOINT on node "10.10.10.101" (ID: 1) 

DETAIL: executing server command "pg_ctl  -D '/home/pg10/data' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/7000028

NOTICE: promoting standby to primary

DETAIL: promoting server "10.10.10.102" (ID: 2) using "pg_ctl  -w -D '/home/pg10/data' promote"

waiting for server to promote.... done

server promoted

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "10.10.10.102" (ID: 2) was successfully promoted to primary

INFO: local node 1 can attach to rejoin target node 2

DETAIL: local node's recovery point: 0/7000028; rejoin target node's fork point: 0/7000098

NOTICE: setting node 1's upstream to node 2

WARNING: unable to ping "host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "pg_ctl  -w -D '/home/pg10/data' start"

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2

NOTICE: node  "10.10.10.102" (ID: 2) promoted to primary, node "10.10.10.101" (ID: 1) demoted to standby

NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings

INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

NOTICE: switchover was successful

DETAIL: node "10.10.10.102" is now primary and node "10.10.10.101" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

[pg10@data02 conf]$ 


节点1上查询


[pg10@data01 data]$  repmgr -f ~/conf/repmgr.conf cluster show

 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                                        

----+--------------+---------+-----------+--------------+----------+----------+----------+---------------------------------------------------------------------------

 1  | 10.10.10.101 | standby |   running | 10.10.10.102 | default  | 100      | 1        | host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 2  | 10.10.10.102 | primary | * running |              | default  | 100      | 2        | host=10.10.10.102 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 3  | 10.10.10.103 | standby |   running | 10.10.10.102 | default  | 100      | 1        | host=10.10.10.103 user=repmgr dbname=postgres password=1qaz@WSX port=5666

[pg10@data01 data]$ 








  --配置自动切换参数到 ~/conf/repmgr.conf 文件中

节点1 

                    [pg10@data01 ~]$ vim ~/conf/repmgr.conf 

                    node_id=1

                    node_name='10.10.10.101'

                    conninfo='host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666'

                    data_directory='/home/pg10/data'

                    log_level='info'

                    log_facility='STDERR'

                    log_file='/home/pg10/conf/repmgr.log'

                    failover=automatic

                    promote_command='/home/pg10/soft/bin/repmgr standby promote -f /home/pg10/conf/repmgr.conf --log-to-file'

                    follow_command='/home/pg10/soft/bin/repmgr standby follow -f /home/pg10/conf/repmgr.conf --log-to-file --upstream-node-id=%n'


节点2

                    node_id=2

                    node_name='10.10.10.102'

                    conninfo='host=10.10.10.102 user=repmgr dbname=postgres password=1qaz@WSX port=5666'

                    data_directory='/home/pg10/data'

                    log_level='info'

                    log_facility='STDERR'

                    log_file='/home/pg10/conf/repmgr.log'

                    failover=automatic

                    promote_command='/home/pg10/soft/bin/repmgr standby promote -f /home/pg10/conf/repmgr.conf --log-to-file'

                    follow_command='/home/pg10/soft/bin/repmgr standby follow -f /home/pg10/conf/repmgr.conf --log-to-file --upstream-node-id=%n'

节点3

                    node_id=3

                    node_name='10.10.10.103'

                    conninfo='host=10.10.10.103 user=repmgr dbname=postgres password=1qaz@WSX port=5666'

                    data_directory='/home/pg10/data'

                    log_level='info'

                    log_facility='STDERR'

                    log_file='/home/pg10/conf/repmgr.log'

                    failover=automatic

                    promote_command='/home/pg10/soft/bin/repmgr standby promote -f /home/pg10/conf/repmgr.conf --log-to-file'

                    follow_command='/home/pg10/soft/bin/repmgr standby follow -f /home/pg10/conf/repmgr.conf --log-to-file --upstream-node-id=%n'


启动 repmgrd 进程

[pg10@data01 ~]$ repmgrd -f ~/conf/repmgr.conf -d 

[2021-01-06 09:52:00] [NOTICE] redirecting logging output to "/home/pg10/conf/repmgr.log"


pg10@data02 ~]$ repmgrd -f ~/conf/repmgr.conf -d 

[2021-01-06 10:53:20] [NOTICE] redirecting logging output to "/home/pg10/conf/repmgr.log"



[pg10@data03 conf]$ repmgrd -f ~/conf/repmgr.conf -d 

[2021-01-06 10:53:26] [NOTICE] redirecting logging output to "/home/pg10/conf/repmgr.log"




任意节点查看集群状态

[pg10@data01 ~]$ repmgr -f ~/conf/repmgr.conf cluster show

 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                                        

----+--------------+---------+-----------+--------------+----------+----------+----------+---------------------------------------------------------------------------

 1  | 10.10.10.101 | standby |   running | 10.10.10.102 | default  | 100      | 2        | host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 2  | 10.10.10.102 | primary | * running |              | default  | 100      | 2        | host=10.10.10.102 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 3  | 10.10.10.103 | standby |   running | 10.10.10.102 | default  | 100      | 2        | host=10.10.10.103 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 

查看服务状态

[pg10@data01 ~]$ repmgr -f ~/conf/repmgr.conf service status

 ID | Name         | Role    | Status    | Upstream     | repmgrd | PID   | Paused? | Upstream last seen

----+--------------+---------+-----------+--------------+---------+-------+---------+--------------------

 1  | 10.10.10.101 | standby |   running | 10.10.10.102 | running | 74894 | no      | 0 second(s) ago    

 2  | 10.10.10.102 | primary | * running |              | running | 74212 | no      | n/a                

 3  | 10.10.10.103 | standby |   running | 10.10.10.102 | running | 74615 | no      | 0 second(s) ago    


 

 

模拟故障停掉节点2

 [pg10@data02 ~]$ 

[pg10@data02 ~]$  pg_ctl stop -D $PGDATA -l /tmp/logfile 

waiting for server to shut down.... done

server stopped


 

持续观察下,发现节点1又转换成了主节点,持续时间长点。

 

 [pg10@data01 ~]$ repmgr -f ~/conf/repmgr.conf service status

 ID | Name         | Role    | Status        | Upstream       | repmgrd | PID   | Paused? | Upstream last seen

----+--------------+---------+---------------+----------------+---------+-------+---------+--------------------

 1  | 10.10.10.101 | standby |   running     | ? 10.10.10.102 | running | 74894 | no      | 7 second(s) ago    

 2  | 10.10.10.102 | primary | ? unreachable | ?              | n/a     | n/a   | n/a     | n/a                

 3  | 10.10.10.103 | standby |   running     | ? 10.10.10.102 | running | 74615 | no      | 7 second(s) ago    


WARNING: following issues were detected

  - unable to connect to node "10.10.10.101" (ID: 1)'s upstream node "10.10.10.102" (ID: 2)

  - unable to determine if node "10.10.10.101" (ID: 1) is attached to its upstream node "10.10.10.102" (ID: 2)

  - unable to  connect to node "10.10.10.102" (ID: 2)

  - node "10.10.10.102" (ID: 2) is registered as an active primary but is unreachable

  - unable to connect to node "10.10.10.103" (ID: 3)'s upstream node "10.10.10.102" (ID: 2)

  - unable to determine if node "10.10.10.103" (ID: 3) is attached to its upstream node "10.10.10.102" (ID: 2)


HINT: execute with --verbose option to see connection error messages

[pg10@data01 ~]$ repmgr -f ~/conf/repmgr.conf service status

 ID | Name         | Role    | Status        | Upstream       | repmgrd | PID   | Paused? | Upstream last seen

----+--------------+---------+---------------+----------------+---------+-------+---------+--------------------

 1  | 10.10.10.101 | standby |   running     | ? 10.10.10.102 | running | 74894 | no      | 10 second(s) ago   

 2  | 10.10.10.102 | primary | ? unreachable | ?              | n/a     | n/a   | n/a     | n/a                

 3  | 10.10.10.103 | standby |   running     | ? 10.10.10.102 | running | 74615 | no      | 10 second(s) ago   


WARNING: following issues were detected

  - unable to connect to node "10.10.10.101" (ID: 1)'s upstream node "10.10.10.102" (ID: 2)

  - unable to determine if node "10.10.10.101" (ID: 1) is attached to its upstream node "10.10.10.102" (ID: 2)

  - unable to  connect to node "10.10.10.102" (ID: 2)

  - node "10.10.10.102" (ID: 2) is registered as an active primary but is unreachable

  - unable to connect to node "10.10.10.103" (ID: 3)'s upstream node "10.10.10.102" (ID: 2)

  - unable to determine if node "10.10.10.103" (ID: 3) is attached to its upstream node "10.10.10.102" (ID: 2)


HINT: execute with --verbose option to see connection error messages

[pg10@data01 ~]$ repmgr -f ~/conf/repmgr.conf service status

 ID | Name         | Role    | Status        | Upstream       | repmgrd | PID   | Paused? | Upstream last seen

----+--------------+---------+---------------+----------------+---------+-------+---------+--------------------

 1  | 10.10.10.101 | standby |   running     | ? 10.10.10.102 | running | 74894 | no      | 13 second(s) ago   

 2  | 10.10.10.102 | primary | ? unreachable | ?              | n/a     | n/a   | n/a     | n/a                

 3  | 10.10.10.103 | standby |   running     | ? 10.10.10.102 | running | 74615 | no      | 13 second(s) ago   


WARNING: following issues were detected

  - unable to connect to node "10.10.10.101" (ID: 1)'s upstream node "10.10.10.102" (ID: 2)

  - unable to determine if node "10.10.10.101" (ID: 1) is attached to its upstream node "10.10.10.102" (ID: 2)

  - unable to  connect to node "10.10.10.102" (ID: 2)

  - node "10.10.10.102" (ID: 2) is registered as an active primary but is unreachable

  - unable to connect to node "10.10.10.103" (ID: 3)'s upstream node "10.10.10.102" (ID: 2)

  - unable to determine if node "10.10.10.103" (ID: 3) is attached to its upstream node "10.10.10.102" (ID: 2)


HINT: execute with --verbose option to see connection error messages

[pg10@data01 ~]$ repmgr -f ~/conf/repmgr.conf service status

 ID | Name         | Role    | Status        | Upstream       | repmgrd | PID   | Paused? | Upstream last seen

----+--------------+---------+---------------+----------------+---------+-------+---------+--------------------

 1  | 10.10.10.101 | standby |   running     | ? 10.10.10.102 | running | 74894 | no      | 40 second(s) ago   

 2  | 10.10.10.102 | primary | ? unreachable | ?              | n/a     | n/a   | n/a     | n/a                

 3  | 10.10.10.103 | standby |   running     | ? 10.10.10.102 | running | 74615 | no      | 40 second(s) ago   


WARNING: following issues were detected

  - unable to connect to node "10.10.10.101" (ID: 1)'s upstream node "10.10.10.102" (ID: 2)

  - unable to determine if node "10.10.10.101" (ID: 1) is attached to its upstream node "10.10.10.102" (ID: 2)

  - unable to  connect to node "10.10.10.102" (ID: 2)

  - node "10.10.10.102" (ID: 2) is registered as an active primary but is unreachable

  - unable to connect to node "10.10.10.103" (ID: 3)'s upstream node "10.10.10.102" (ID: 2)

  - unable to determine if node "10.10.10.103" (ID: 3) is attached to its upstream node "10.10.10.102" (ID: 2)


HINT: execute with --verbose option to see connection error messages

[pg10@data01 ~]$ repmgr -f ~/conf/repmgr.conf service status

 ID | Name         | Role    | Status    | Upstream     | repmgrd | PID   | Paused? | Upstream last seen

----+--------------+---------+-----------+--------------+---------+-------+---------+--------------------

 1  | 10.10.10.101 | primary | * running |              | running | 74894 | no      | n/a                

 2  | 10.10.10.102 | primary | - failed  | ?            | n/a     | n/a   | n/a     | n/a                

 3  | 10.10.10.103 | standby |   running | 10.10.10.101 | running | 74615 | no      | 1 second(s) ago    


WARNING: following issues were detected

  - unable to  connect to node "10.10.10.102" (ID: 2)


HINT: execute with --verbose option to see connection error messages

[pg10@data01 ~]$ 

[pg10@data01 ~]$ repmgr -f ~/conf/repmgr.conf cluster show

 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                                        

----+--------------+---------+-----------+--------------+----------+----------+----------+---------------------------------------------------------------------------

 1  | 10.10.10.101 | primary | * running |              | default  | 100      | 3        | host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 2  | 10.10.10.102 | primary | - failed  | ?            | default  | 100      |          | host=10.10.10.102 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 3  | 10.10.10.103 | standby |   running | 10.10.10.101 | default  | 100      | 2        | host=10.10.10.103 user=repmgr dbname=postgres password=1qaz@WSX port=5666


WARNING: following issues were detected

  - unable to connect to node "10.10.10.102" (ID: 2)


HINT: execute with --verbose option to see connection error messages

[pg10@data01 ~]$ 


故障节点需要停止,删除 $PGDATA/* /data/appdb_tbs/*   /data/pglog

                重新执行克隆

                打开节点数据库

[pg10@data02 ~]$ cd $PGDATA

[pg10@data02 data]$ ll

total 72

-rw-------. 1 pg10 pg10   199 Jan  6 09:41 backup_label.old

drwx------. 5 pg10 pg10    41 Jan  6 09:41 base

-rw-------. 1 pg10 pg10    37 Jan  6 09:42 current_logfiles

drwx------. 2 pg10 pg10  4096 Jan  6 09:46 global

drwx------. 2 pg10 pg10     6 Jan  6 09:41 pg_commit_ts

drwx------. 2 pg10 pg10     6 Jan  6 09:41 pg_dynshmem

-rw-------. 1 pg10 pg10  4883 Jan  6 09:41 pg_hba.conf

-rw-------. 1 pg10 pg10  1636 Jan  6 09:41 pg_ident.conf

drwx------. 4 pg10 pg10    68 Jan  6 10:56 pg_logical

drwx------. 4 pg10 pg10    36 Jan  6 09:41 pg_multixact

drwx------. 2 pg10 pg10    18 Jan  6 09:42 pg_notify

drwx------. 2 pg10 pg10     6 Jan  6 09:41 pg_replslot

drwx------. 2 pg10 pg10     6 Jan  6 09:41 pg_serial

drwx------. 2 pg10 pg10     6 Jan  6 09:41 pg_snapshots

drwx------. 2 pg10 pg10    63 Jan  6 10:56 pg_stat

drwx------. 2 pg10 pg10     6 Jan  6 10:56 pg_stat_tmp

drwx------. 2 pg10 pg10    18 Jan  6 09:47 pg_subtrans

drwx------. 2 pg10 pg10    19 Jan  6 09:41 pg_tblspc

drwx------. 2 pg10 pg10     6 Jan  6 09:41 pg_twophase

-rw-------. 1 pg10 pg10     3 Jan  6 09:41 PG_VERSION

drwx------. 3 pg10 pg10   220 Jan  6 10:52 pg_wal

drwx------. 2 pg10 pg10    18 Jan  6 09:41 pg_xact

-rw-------. 1 pg10 pg10   526 Jan  6 09:41 postgresql.auto.conf

-rw-------. 1 pg10 pg10 23039 Jan  6 09:41 postgresql.conf

-rw-------. 1 pg10 pg10    52 Jan  6 09:42 postmaster.opts

-rw-------. 1 pg10 pg10   169 Jan  6 09:41 recovery.done

-rw-rw-r--. 1 pg10 pg10   969 Jan  6 09:41 server.crt

-rw-------. 1 pg10 pg10  1704 Jan  6 09:41 server.key

[pg10@data02 data]$ rm -rf *

[pg10@data02 data]$ cd /data/

[pg10@data02 data]$ ls

appdb_tbs  pglog

[pg10@data02 data]$ cd appdb_tbs/

[pg10@data02 appdb_tbs]$ ll

total 0

drwx------. 3 pg10 pg10 19 Jan  6 09:41 PG_10_201707211

[pg10@data02 appdb_tbs]$ rm -rf PG_10_201707211/

[pg10@data02 appdb_tbs]$ cd ..

[pg10@data02 data]$ ll

total 0

drwxrwxr-x. 2 pg10 pg10  6 Jan  6 11:01 appdb_tbs

drwxrwxr-x. 2 pg10 pg10 52 Jan  6 09:42 pglog

[pg10@data02 data]$ cd pglog/

[pg10@data02 pglog]$ ll

total 12

-rw-------. 1 pg10 pg10  166 Jan  6 09:42 postgresql.06

-rw-------. 1 pg10 pg10 6199 Jan  6 10:56 postgresql.06.csv

[pg10@data02 pglog]$ rm *

[pg10@data02 pglog]$ 


[pg10@data02 ~]$ repmgr -h 10.10.10.101 -U repmgr -d postgres -f ~/conf/repmgr.conf standby clone --dry-run -p5666

NOTICE: destination directory "/home/pg10/data" provided

INFO: connecting to source node

DETAIL: connection string is: host=10.10.10.101 user=repmgr port=5666 dbname=postgres

DETAIL: current installation size is 30 MB

INFO: "repmgr" extension is installed in database "postgres"

INFO: parameter "max_wal_senders" set to 10

NOTICE: checking for available walsenders on the source node (2 required)

INFO: sufficient walsenders available on the source node

DETAIL: 2 required, 9 available

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: required number of replication connections could be made to the source server

DETAIL: 2 replication connections required

NOTICE: standby will attach to upstream node 1

HINT: consider using the -c/--fast-checkpoint option

INFO: all prerequisites for "standby clone" are met

[pg10@data02 ~]$ repmgr -h 10.10.10.101 -U repmgr -d postgres -f ~/conf/repmgr.conf standby clone  -p5666

NOTICE: destination directory "/home/pg10/data" provided

INFO: connecting to source node

DETAIL: connection string is: host=10.10.10.101 user=repmgr port=5666 dbname=postgres

DETAIL: current installation size is 30 MB

NOTICE: checking for available walsenders on the source node (2 required)

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: checking and correcting permissions on existing directory "/home/pg10/data"

NOTICE: starting backup (using pg_basebackup)...

HINT: this may take some time; consider using the -c/--fast-checkpoint option

INFO: executing:

  pg_basebackup -l "repmgr base backup"  -D /home/pg10/data -h 10.10.10.101 -p 5666 -U repmgr -X stream 

NOTICE: standby clone (using pg_basebackup) complete

NOTICE: you can now start your PostgreSQL server

HINT: for example: pg_ctl -D /home/pg10/data start

HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record

[pg10@data02 ~]$  pg_ctl start -D $PGDATA -l /tmp/logfile

waiting for server to start.... done

server started

[pg10@data02 ~]$ repmgrd -f ~/conf/repmgr.conf -d 

[2021-01-06 11:03:44] [NOTICE] redirecting logging output to "/home/pg10/conf/repmgr.log"


[pg10@data02 ~]$ repmgr -f ~/conf/repmgr.conf cluster show

 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                                        

----+--------------+---------+-----------+--------------+----------+----------+----------+---------------------------------------------------------------------------

 1  | 10.10.10.101 | primary | * running |              | default  | 100      | 3        | host=10.10.10.101 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 2  | 10.10.10.102 | standby |   running |              | default  | 100      | 3        | host=10.10.10.102 user=repmgr dbname=postgres password=1qaz@WSX port=5666

 3  | 10.10.10.103 | standby |   running | 10.10.10.101 | default  | 100      | 3        | host=10.10.10.103 user=repmgr dbname=postgres password=1qaz@WSX port=5666

[pg10@data02 ~]$ 




上一篇:全国最完整的Repmgr高可用Postgresql


下一篇:Postgresql10.14配置repmgr5.1.0执行步骤