Postgresql+Keepalived高可用方案

2022-05-28 13:50:42

一、变更计划与方案

1.1 概述
现有postgresql数据库为单一数据库，没有备份数据，为保证数据安全正常运行，现将给定的3台服务器，组成一个1主2从的高可用架构。
1.2 基本信息

主机名	IP地址	服务器角色	系统信息	数据库目录
72_123	192.168.247.123	Postgresql主节点	Red Hat7.7	/data
72_124	192.168.247.124	Postgresql从节点	Red Hat7.7	/data
72_125	192.168.247.125	Postgresql从节点	Red Hat7.7	/data

1.3实施计划
先将247.124和247.125设置为247.123的从库，然后安装Keepalived 高可用组件,搭建高可用架构

二、实施方案

2.1 各节点安装postgres (root用户)
准备postgresql-11-10.tar.gz软件包
https://www.postgresql.org/ftp/source/v11.10/
安装相关依赖包

yum -y install readline readline-devel zlib zlib-devel

useradd  postgres
password　postgres

mkdir /opt/pg11/
mkdir /data/
chown -R postgres:postgres /opt/pg11/
chown -R postgres:postgres /data/

解压并编译安装

tar -xvf postgresql-11.10.tar.gz 
cd postgresql-11.10/
./configure --prefix=/opt/pg11/
make world
make install-word

设置环境变量

su - postgres
vim .bashrc
export LD_LIBRARY_PATH=/opt/pg11/lib
export PGDATA=/data／
export PGHOME=/opt/pg11/
export PATH=$PATH:/opt/pg11/bin/

初始化数据库

/opt/pg11/bin/initdb -D /data/

修改配置文件(参数值根据服务器配置填写）

listen_addresses = '*' 
port = 5432 
max_connections = 100 
shared_buffers = 128MB
dynamic_shared_memory_type = posix 
wal_level = replica 
max_wal_size = 1GB  
min_wal_size = 80MB
track_commit_timestamp = on    
log_timezone = 'PRC'
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'en_US.UTF-8'                
lc_monetary = 'en_US.UTF-8'    
lc_numeric = 'en_US.UTF-8'    
lc_time = 'en_US.UTF-8' 
default_text_search_config = 'pg_catalog.english'

启动数据库

pg_ctl -D /data/ -l logfile start

如果是新建数据库可以将以下主从同步一同配置后再启动数据库

2.2 主库配置（postgres用户配置）
创建replicator 用户进行主从同步

create role replicator with login replication password 'replicator';

修改pg_hba.conf，允许replicator 用户来同步

host    all             all             192.168.247.0/24        md5  
host    replication     replicator      192.168.247.0/24        md5

修改postgresql.conf,修改或添加以下内容

wal_level = replica  #热备模式
max_wal_senders= 10  #设置最达流复制链接，控制主库最多可以有多少个并发standby数据库）
wal_keep_segments = 512   #重要配置（配置保存wal日志大小） 
wal_sender_timeout = 60s

重启数据库生效

pg_ctl restart

2.3 从库配置（postgres用户配置）
将主库数据拷贝至从库(247.124，247.125)

pg_basebackup -h 192.168.247.123 -D /home/postgres/data/ -F p -X stream -v -P -U replicator 
输入密码：replicator

修改或添加postgresql.conf以下内容

max_connections = 1100 #一般从库的最大连接数要大于主库
hot_standby = on #说明这台机器不仅仅是用于数据归档，也用于数据查询
max_standby_streaming_delay = 30s #数据流备份的最大延迟时间
wal_receiver_status_interval = 10s #多久向主报告一次从的状态，当然从每次数据复制都会向主报告状态，这里只是设置最长的间隔时间

配置recovery.conf

standby_mode = on    
primary_conninfo = ‘host=192.168.247.123 port=5432 user=replicator password=replicator’
recovery_target_timeline = ‘latest’

启动数据库(247.124，247.125)

pg_ctl start

2.4 安装keepalived(root用户配置）
主节点247.123，从节点247.124，247.125

yum -y install keepalived
touch /etc/keepalived/pg_check.sh
chmod +x /etc/keepalived/pg_check.sh

pg_check.sh脚本

#!/bin/bash
#PG is alive or not?
A=`ps -ef | grep postgres | grep "/opt/pg11/bin/postgres*" |grep -v grep| wc -l`
# where the VIP floats to
B=`ip a | grep 192.168.247.121 | wc -l`
# the slave  is in a waiting state?
C=`ps -ef | grep postgres | grep 'startup process' |grep -v grep| wc -l`
#the secondary  is positive for linking to the master ?
D=`ps -ef | grep postgres | grep 'receiver' |grep -v grep | wc -l`
#he connection between the master and the slave library is normal?
E=`ps -ef | grep postgres | grep 'sender'  |grep -v grep| wc -l`
#If pg dies, log the message and turn off keepalived
if [ $A -eq 0 ];then
    echo "`date "+%Y-%m-%d--%H:%M:%S"` postgresql stop so vip stop " >> /etc/keepalived/log/check_pg.log
    systemctl stop keepalived
else
        #When the master is dead, the VIP floats to slave, elevating slave's status to make him read and write
        if [ $B -eq 1 -a $C -eq 1 -a $D -eq 0 ];then
                su - postgres -c "/opt/pg11/bin/pg_ctl promote -D /data/"
                echo "`date "+%Y-%m-%d--%H:%M:%S"` standby promote " >> /etc/keepalived/log/check_pg.log
        fi

        #Judge yourself as Lord and lose contact with him
        if [ $B -eq 1 -a $C -eq 0 -a $D -eq 0 -a $E -eq 0 ];then
                                sleep 10
                echo "`date "+%Y-%m-%d--%H:%M:%S"` can't find standby " >> /etc/keepalived/log/check_pg.log
        fi
fi

配置Keepalived日志

vim /etc/sysconfig/keepalived
# Options for keepalived. See `keepalived --help' output and keepalived(8) and
# keepalived.conf(5) man pages for a list of all options. Here are the most
# common ones :
#
# --vrrp               -P    Only run with VRRP subsystem.
# --check              -C    Only run with Health-checker subsystem.
# --dont-release-vrrp  -V    Dont remove VRRP VIPs & VROUTEs on daemon stop.
# --dont-release-ipvs  -I    Dont remove IPVS topology on daemon stop.
# --dump-conf          -d    Dump the configuration data.
# --log-detail         -D    Detailed log messages.
# --log-facility       -S    0-7 Set local syslog facility (default=LOG_DAEMON)

KEEPALIVED_OPTIONS="-D -d -S 0"

修改/etc/rsyslog.conf文件，在末尾添加如下配置

local0.*    /var/log/keepalived.log

重启日志记录服务

systemctl restart rsyslog

查看Keepalived日志

tail -f /var/log/keepalived.log

修改Keepalived配置文件
(主节点247.123)

mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak

vim /etc/keepalived/keepalived.conf

! Configuration File for keepalived
global_defs {
   notification_email {
     **@qq.com
   }
   notification_email_from **@163.com
   smtp_server smtp.163.com
   smtp_connect_timeout 30
   router_id postgres_master
}
vrrp_script check_pg_alived {
    script "/etc/keepalived/pg_check.sh"
    interval 10
    weight 2
}
vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 66
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
      track_script {
        check_pg_alived
     }
    virtual_ipaddress {
        192.168.247.121
    }
}

(从节点247.124,247.125)

! Configuration File for keepalived

global_defs {
   notification_email {
     **@qq.com
   }
   notification_email_from **@163.com
   smtp_server smtp.163.com
   smtp_connect_timeout 30
   router_id postgres_master
}

vrrp_script check_pg_alived {
    script "/etc/keepalived/pg_check.sh"
    interval 10
    fall 5
}

vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface ens33
    virtual_router_id 66
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
      track_script {
        check_pg_alived
     }
    virtual_ipaddress {
        192.168.247.121
    }
}

所有节点启动keepalived服务，并查看状态（root用户）

systemctl restart keepalived
systemctl enable keepalived
systemctl status keepalived

3 高可用验证（测试环境）

3.1各节点postgres进程正常，keepalived状态正常，查看VIP节点

[root@72-123 keepalived]# ip a |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.247.123/24 brd 192.168.247.255 scope global noprefixroute ens33
inet 192.168.247.121/32 scope global ens33

3.2 主节点宕机模拟

[postgres@72-123 ~]$ pg_ctl stop
waiting for server to shut down....2021-10-23 01:29:37.950 PDT [59281] LOG:  received fast shutdown request
2021-10-23 01:29:37.952 PDT [59281] LOG:  aborting any active transactions
2021-10-23 01:29:37.953 PDT [59281] LOG:  background worker "logical replication launcher" (PID 59288) exited with exit code 1
2021-10-23 01:29:37.953 PDT [59283] LOG:  shutting down
2021-10-23 01:29:37.972 PDT [59281] LOG:  database system is shut down
 done
server stopped

3.3 VIP迁移到从节点

[postgres@72-125 ~]$2021-10-23 01:29:42.846 PDT [62019] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.247.121" and accepting
                TCP/IP connections on port 5432?
2021-10-23 01:29:47.855 PDT [62052] LOG:  connection received: host=192.168.247.121 port=53342
2021-10-23 01:29:47.856 PDT [62053] LOG:  connection received: host=192.168.247.124 port=59440
2021-10-23 01:29:47.856 PDT [62052] LOG:  replication connection authorized: user=replicator
2021-10-23 01:29:47.857 PDT [62053] LOG:  replication connection authorized: user=replicator
2021-10-23 01:29:47.857 PDT [62051] LOG:  started streaming WAL from primary at 0/A000000 on timeline 1

查看VIP

[postgres@72-125 ~]$ ip a |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.247.125/24 brd 192.168.247.255 scope global noprefixroute ens33
    inet 192.168.247.121/32 scope global ens33

3.4连接测试

[postgres@72-124 ~]$ psql -h 192.168.247.121 -U postres -p 
psql (11.10)
Type "help" for help.
postgres=#

3.5恢复节点，启动数据库，并启动keepalived服务

[postgres@72-123 ~]$ pg_ctl start
waiting for server to start....2021-10-23 01:37:13.120 PDT [60997] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-10-23 01:37:13.120 PDT [60997] LOG:  listening on IPv6 address "::", port 5432
2021-10-23 01:37:13.122 PDT [60997] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-10-23 01:37:13.132 PDT [60998] LOG:  database system was shut down at 2021-10-23 01:29:37 PDT
2021-10-23 01:37:13.134 PDT [60997] LOG:  database system is ready to accept connections
 done
server started

[root@72-123 keepalived]# systemctl start keepalived
[root@72-123 keepalived]# ip a |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.247.123/24 brd 192.168.247.255 scope global noprefixroute ens33
    inet 192.168.247.121/32 scope global ens33

码农公寓

一、变更计划与方案

二、实施方案

3 高可用验证（测试环境）

相关文章