KingbaseES R3 Cluster Manual Deployment Case Study

System environment:

Operating system:
[kingbase@node3 bin]$ cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core) 

Database version:
TEST=# select version();
                                                         VERSION
------------------------------------------------------------------------------------------------------------------------
 Kingbase V008R003C002B0270 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)


Host environment:
[kingbase@node3 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.7.248   node1
192.168.7.249   node2
192.168.7.243   node3

I. Preparation before creating the cluster

1. Set up passwordless SSH trust between nodes (configure it manually, or use the script below)

[kingbase@node3 R3_cluster_install]$ cat trust_cluster.sh 
#!/bin/bash

# You may need to change two parameters: general_user and all_node_ip.
# general_user is the OS user for which passwordless SSH will be configured.
# all_node_ip lists the hosts to configure passwordless SSH between.

shell_folder=$(dirname $(readlink -f "$0"))
install_conf="${shell_folder}/install.conf"
primary_host=""

curren_user=`whoami`

if [ -f $install_conf ]
then
    source $install_conf
else
    echo "[ERROR] there is no [install.conf] found in current path"
    exit 1
fi

general_user=$cluster_user
[ "${ssh_port}"x = ""x ] && ssh_port=22
[ "${all_node_ip}"x = ""x ] && echo "[ERROR] [all_node_ip] is empty, please check your [install.conf] file" && exit 1
[ "${general_user}"x = ""x ] && general_user="kingbase"

[ "${primary_host}"x = ""x ] && primary_host="${all_node_ip[0]}"

if [ "$curren_user"x != "root"x ]
then
    echo "[ERROR] this script must be run as root"
    exit 1;
fi

[ ! -d /home/$general_user ] && /usr/sbin/adduser $general_user
echo "$general_user:123" | chpasswd
[ ! -d /home/$general_user/.ssh ] && mkdir -p /home/$general_user/.ssh

[ ! -f ~/.ssh/id_rsa.pub ] && ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa
[ ! -f  ~/.ssh/authorized_keys ] && cat ~/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

cp /root/.ssh/* /home/$general_user/.ssh

for ips in ${all_node_ip[@]}
do
    [ "${primary_host}"x != ""x -a "${primary_host}"x = "${ips}"x ] && continue
    ssh -p ${ssh_port} root@$ips "test ! -f ~/.ssh/id_rsa.pub" && ssh -p ${ssh_port} root@$ips "ssh-keygen -t rsa -P \"\" -f /root/.ssh/id_rsa"
    scp -P ${ssh_port} -o StrictHostKeyChecking=no -r /root/.ssh/* root@$ips:/root/.ssh/
    ssh -p ${ssh_port} root@$ips "test ! -d /home/$general_user" && ssh -p ${ssh_port} root@$ips "/usr/sbin/adduser $general_user" && ssh -p ${ssh_port} root@$ips "echo \"$general_user:123\" | chpasswd"
done

for ips in ${all_node_ip[@]}
do
    ssh -p ${ssh_port} root@$ips "cp -r /root/.ssh /home/$general_user/"
    ssh -p ${ssh_port} root@$ips "chmod 700 /home/$general_user/.ssh/"
    ssh -p ${ssh_port} root@$ips "chown -R $general_user:$general_user /home/$general_user/.ssh/"
done



Run the script (as root):
[root@node3 R3_cluster_install]# sh trust_cluster.sh 
authorized_keys                                                               100%   10KB  10.0KB/s   00:00    
id_rsa                                                                        100% 1679     1.6KB/s   00:00    
id_rsa.pub                                                                    100%  392     0.4KB/s   00:00    
known_hosts                                                                   100%  543     0.5KB/s   00:00 



Test the SSH trust:
[kingbase@node3 ~]$ ssh root@node1
Last failed login: Mon Mar  1 14:58:26 CST 2021 from 192.168.7.116 on ssh:notty
There were 7 failed login attempts since the last successful login.
Last login: Mon Mar  1 14:15:23 2021 from 192.168.7.116
ABRT has detected 1 problem(s). For more info run: abrt-cli list --since 1614579323
[root@node1 ~]# exit
logout
Connection to node1 closed.

[kingbase@node3 ~]$ ssh node1
Last failed login: Mon Mar  1 18:50:15 CST 2021 from :0 on :0
There was 1 failed login attempt since the last successful login.
Last login: Mon Mar  1 14:15:27 2021
[kingbase@node1 ~]$ exit
logout
Connection to node1 closed.

=== In this case, SSH trust was configured manually before the cluster deployment was run ===
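For reference, the manual trust configuration used in this case can be sketched as follows. This is a minimal sketch rather than the exact commands used: the node names come from the /etc/hosts above, and `ssh-copy-id` is assumed to be available on the hosts.

```shell
# Run as the kingbase user on each node; node names from /etc/hosts above.
# Generate a key pair if one does not exist yet.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Push the public key to every node (including this one) so that
# kingbase on any node can reach kingbase on every node without a password.
for host in node1 node2 node3; do
    ssh-copy-id "kingbase@$host"
done

# SSH refuses keys with loose permissions, so tighten them explicitly.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```

After this, `ssh node1` from any node should log in without a password prompt, which is exactly what the test session below verifies.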

2. Prepare the cluster deployment files

=== Place all of the following files in the same directory ===

[kingbase@node3 R3_cluster_install]$ ls -lh
total 33M
-rw-rw-r--. 1 kingbase kingbase  29M Mar  1 14:21 db.zip
-rwxr-xr-x. 1 kingbase kingbase 5.2K Mar  1 14:05 install.conf
-rw-rw-r--. 1 kingbase kingbase 4.3M Mar  1 14:21 kingbasecluster.zip
-rwxrw-r--. 1 kingbase kingbase 3.2K Mar  1 14:05 license.dat
-rwxr-xr-x. 1 kingbase kingbase 2.1K Mar  1 14:04 trust_cluster.sh
-rwxr-xr-x. 1 kingbase kingbase  81K Mar  1 14:04 V8R3_cluster_install.sh

3. Configure install.conf (according to your environment)

[kingbase@node3 R3_cluster_install]$ cat install.conf | grep -v ^$ | grep -v ^#
on_bmj=0
all_node_ip=(192.168.7.243 192.168.7.248)
cluster_path="/home/kingbase/cluster/kha01"
db_package="/home/kingbase/R3_cluster_install/db.zip"
cluster_package="/home/kingbase/R3_cluster_install/kingbasecluster.zip"
license_file=(license.dat)
db_user="SYSTEM"                 # the user name of the database
db_password="123456"             # the database password; R3 does not allow passwords to be stored in clear text, so delete it from this file once the cluster deployment completes
db_port="54321"                  # the database port, default is 54321
trust_ip="192.168.7.1"
db_vip="192.168.7.245"
cluster_vip="192.168.7.244"
net_device=(enp0s3 enp0s3)
kb_data="/home/kingbase/cluster/kha01/db/data"
ipaddr_path="/sbin"
arping_path="/home/kingbase/cluster/kha01/db/bin"
super_user="root"
cluster_user="kingbase"
wd_deadtime="30"                 # cluster heartbeat timeout, in seconds
check_retries="6"                # number of detection retries when the database fails
check_delay="10"                 # interval between detection retries when the database fails
connect_timeout="10000"          # timeout in milliseconds before giving up connecting to the backend
auto_primary_recovery="0"        # automatic recovery of the cluster primary host, default is 0
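Before running the one-click deployment, it can be worth confirming by hand that the configured VIPs and database port are actually free; the deployment script performs similar checks, but running them yourself narrows down failures faster. A sketch, with the IPs and port taken from the install.conf above (`check_vip_free` and `check_port_free` are hypothetical helper names, not part of the toolkit):

```shell
# Pre-flight checks for the install.conf above (a sketch; adjust IPs/port).
check_vip_free() {
    # A VIP must not answer ping before deployment; prints "free" or "in-use".
    ping -c 1 -W 1 "$1" >/dev/null 2>&1 && echo "in-use" || echo "free"
}
check_port_free() {
    # The database port must not already be listening on this host.
    ss -ltn 2>/dev/null | grep -q ":$1 " && echo "in-use" || echo "free"
}

for vip in 192.168.7.244 192.168.7.245; do
    echo "VIP $vip: $(check_vip_free "$vip")"
done
echo "port 54321: $(check_port_free 54321)"
```

If either check reports "in-use", fix the conflict first; the deployment script aborts when a VIP or the database port is already occupied.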

II. Run the one-click cluster deployment

[kingbase@node3 R3_cluster_install]$ sh V8R3_cluster_install.sh 
[CONFIG_CHECK] file format is correct ... OK
[CONFIG_CHECK] file format is correct ... OK
[INFO]-Check if the cluster_vip "192.168.7.244" is already exist ...
[INFO] There is no "192.168.7.244" on any host, OK
[INFO]-Check if the db_vip "192.168.7.245" is already exist ...
[INFO] There is no "192.168.7.245" on any host, OK
[CONFIG_CHECK] the number of net_device matches the length of all_node_ip or the number of net_device is 1 ... OK
[CONFIG_CHECK] the number of license_file matches the length of all_node_ip or the number of license_file is 1 ... OK
[Mon Mar  1 14:32:02 CST 2021] [INFO] change ulimit on 192.168.7.243 ...
[Mon Mar  1 14:32:03 CST 2021] [INFO] change ulimit on 192.168.7.243 ... Done
[Mon Mar  1 14:32:03 CST 2021] [INFO] change kernel.sem on 192.168.7.243 ...
[Mon Mar  1 14:32:03 CST 2021] [INFO] change kernel.sem on 192.168.7.243 ... Done
[Mon Mar  1 14:32:03 CST 2021] [INFO] stop selinuxi on 192.168.7.243 ...
[Mon Mar  1 14:32:05 CST 2021] [INFO] stop selinux on 192.168.7.243 ... Done
[Mon Mar  1 14:32:05 CST 2021] [INFO] change RemoveIPC on 192.168.7.243 ...
[Mon Mar  1 14:32:07 CST 2021] [INFO] change RemoveIPC on 192.168.7.243 ... Done
[Mon Mar  1 14:32:07 CST 2021] [INFO] change DefaultTasksAccounting on 192.168.7.243 ...
[Mon Mar  1 14:32:08 CST 2021] [INFO] change DefaultTasksAccounting on 192.168.7.243 ... Done
[Mon Mar  1 14:32:08 CST 2021] [INFO] change sshd_config on 192.168.7.243 ...
[Mon Mar  1 14:32:09 CST 2021] [INFO] change sshd_config on 192.168.7.243 ... Done
[Mon Mar  1 14:32:09 CST 2021] [INFO] configuration to take effect on 192.168.7.243 ...
[Mon Mar  1 14:32:11 CST 2021] [INFO] configuration to take effect on 192.168.7.243 ... Done
[Mon Mar  1 14:32:11 CST 2021] [INFO] change ulimit on 192.168.7.248 ...
[Mon Mar  1 14:32:12 CST 2021] [INFO] change ulimit on 192.168.7.248 ... Done
[Mon Mar  1 14:32:12 CST 2021] [INFO] change kernel.sem on 192.168.7.248 ...
[Mon Mar  1 14:32:12 CST 2021] [INFO] change kernel.sem on 192.168.7.248 ... Done
[Mon Mar  1 14:32:12 CST 2021] [INFO] stop selinuxi on 192.168.7.248 ...
[Mon Mar  1 14:32:13 CST 2021] [INFO] stop selinux on 192.168.7.248 ... Done
[Mon Mar  1 14:32:13 CST 2021] [INFO] change RemoveIPC on 192.168.7.248 ...
[Mon Mar  1 14:32:13 CST 2021] [INFO] change RemoveIPC on 192.168.7.248 ... Done
[Mon Mar  1 14:32:13 CST 2021] [INFO] change DefaultTasksAccounting on 192.168.7.248 ...
[Mon Mar  1 14:32:13 CST 2021] [INFO] change DefaultTasksAccounting on 192.168.7.248 ... Done
[Mon Mar  1 14:32:13 CST 2021] [INFO] change sshd_config on 192.168.7.248 ...
[Mon Mar  1 14:32:14 CST 2021] [INFO] change sshd_config on 192.168.7.248 ... Done
[Mon Mar  1 14:32:14 CST 2021] [INFO] configuration to take effect on 192.168.7.248 ...
[Mon Mar  1 14:32:16 CST 2021] [INFO] configuration to take effect on 192.168.7.248 ... Done
[RUNNING] check if the host can be reached ...
[RUNNING] success connect to the target "192.168.7.243" ..... OK
[RUNNING] success connect to the target "192.168.7.248" ..... OK
[RUNNING] check the port is already in use or not...
[RUNNING] the port is not in use on "192.168.7.243:54321" ..... OK
[RUNNING] the port is not in use on "192.168.7.248:54321" ..... OK
[RUNNING] check if the cluster_path is already exist ...
[RUNNING] the cluster_path is not exist on "192.168.7.243" ..... OK
[RUNNING] the cluster_path is not exist on "192.168.7.248" ..... OK
[INSTALL] create the cluster_path "/home/kingbase/cluster/kha01" on every host ...
[INSTALL] success to create the cluster_path "/home/kingbase/cluster/kha01" on "192.168.7.243" ..... OK
[INSTALL] success to create the cluster_path "/home/kingbase/cluster/kha01" on "192.168.7.248" ..... OK
[INSTALL] decompress the "/home/kingbase/R3_cluster_install/db.zip" and "/home/kingbase/R3_cluster_install/kingbasecluster.zip" to "/home/kingbase/cluster/kha01"

[INSTALL] success to decompress the "/home/kingbase/R3_cluster_install/db.zip" to "/home/kingbase/cluster/kha01" on "192.168.7.243"..... OK
[INSTALL] success to decompress the "/home/kingbase/R3_cluster_install/kingbasecluster.zip" to "/home/kingbase/cluster/kha01" on "192.168.7.243"..... OK
[INSTALL] copy /home/kingbase/cluster/kha01/kingbasecluster/bin/pcp_* and /home/kingbase/cluster/kha01/kingbasecluster/lib/libpcp.* to /home/kingbase/cluster/kha01/db/bin and /home/kingbase/cluster/kha01/db/lib
[INSTALL] copy /home/kingbase/cluster/kha01/kingbasecluster/binpcp_* and /home/kingbase/cluster/kha01/kingbasecluster/lib/libpcp.* to /home/kingbase/cluster/kha01/db/bin and /home/kingbase/cluster/kha01/db/lib .... OK
[INSTALL] create the dir "/home/kingbase/cluster/kha01/db/etc" and  "/home/kingbase/cluster/kha01/kingbasecluster/etc" on all host
[INSTALL] scp the dir "/home/kingbase/cluster/kha01" to other host
[INSTALL] try to copy the cluster_path "/home/kingbase/cluster/kha01" to "192.168.7.248" .....
[INSTALL] success to scp the cluster_path "/home/kingbase/cluster/kha01" to "192.168.7.248" ..... OK
[INSTALL] try to copy the cluster_path "/home/kingbase/cluster/kha01/kingbasecluster" to "192.168.7.248" .....
[RUNNING] chmod u+x for "/sbin" and "/home/kingbase/cluster/kha01/db/bin"
[RUNNING] chmod u+x /sbin/ip on "192.168.7.243" ..... OK
[RUNNING] chmod u+x /home/kingbase/cluster/kha01/db/bin/arping on "192.168.7.243" ..... OK
[RUNNING] chmod u+x /sbin/ip on "192.168.7.248" ..... OK
[RUNNING] chmod u+x /home/kingbase/cluster/kha01/db/bin/arping on "192.168.7.248" ..... OK
[INSTALL] check license_file "/home/kingbase/R3_cluster_install/license.dat"
[INSTALL] success to access license_file: /home/kingbase/R3_cluster_install/license.dat
[INSTALL] Copy license.dat to /home/kingbase/cluster/kha01: /home/kingbase/R3_cluster_install/license.dat
[INSTALL] success to copy license.dat to /home/kingbase/cluster/kha01/db/bin/../../ on 192.168.7.243
[INSTALL] check license_file "/home/kingbase/R3_cluster_install/license.dat"
[INSTALL] success to access license_file: /home/kingbase/R3_cluster_install/license.dat
[INSTALL] Copy license.dat to /home/kingbase/cluster/kha01: /home/kingbase/R3_cluster_install/license.dat
[INSTALL] success to copy license.dat to /home/kingbase/cluster/kha01/db/bin/../../ on 192.168.7.248
[INSTALL] begin to init the database on "192.168.7.243" ...
The files belonging to this database system will be owned by user "kingbase".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

The comparision of strings is case-sensitive.
creating directory /home/kingbase/cluster/kha01/db/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
create samples database ... ok
loading samples database ... ok
loading template2 database ... ok
create security database ... ok
load security database ... ok
syncing data to disk ...

Success. You can now start the database server using:

    /home/kingbase/cluster/kha01/db/bin/sys_ctl -D /home/kingbase/cluster/kha01/db/data -l logfile start

[INSTALL] end to init the database on "192.168.7.243" ... OK
[INSTALL] alter /home/kingbase/cluster/kha01/db/data/kingbase.conf
[INSTALL] Alter /home/kingbase/cluster/kha01/db/data/kingbase.conf ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/db/data/sys_hba.conf
[INSTALL] Alter /home/kingbase/cluster/kha01/db/data/sys_hba.conf ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/db/etc/HAmodule.conf on 192.168.7.243
[INSTALL] success to alter /home/kingbase/cluster/kha01/db/etc/HAmodule.conf on 192.168.7.243
[INSTALL] Alter /home/kingbase/cluster/kha01/db/etc/HAmodule.conf on 192.168.7.248

[INSTALL] success to alter /home/kingbase/cluster/kha01/db/etc/HAmodule.conf on 192.168.7.248
[INSTALL] Alter /home/kingbase/cluster/kha01/db/etc/recovery.done on 192.168.7.243
[INSTALL] Alter /home/kingbase/cluster/kha01/db/etc/recovery.done on 192.168.7.243 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/db/etc/recovery.done on 192.168.7.248
[INSTALL] Alter /home/kingbase/cluster/kha01/db/etc/recovery.done on 192.168.7.248 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf on 192.168.7.243
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf of CLUSTER on 192.168.7.243 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf on 192.168.7.248
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf of CLUSTER on 192.168.7.248 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf on 192.168.7.243 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf on 192.168.7.248 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf of node_num on 192.168.7.243 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf of node_num on 192.168.7.243 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf of node_num on 192.168.7.248 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/kingbasecluster.conf of node_num on 192.168.7.248 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/cluster_hba.conf on 192.168.7.243
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/cluster_hba.conf on 192.168.7.243 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/pcp.conf on 192.168.7.243
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/pcp.conf on 192.168.7.243 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/cluster_passwd on 192.168.7.243
[INSTALL] config /home/kingbase/cluster/kha01/kingbasecluster/etc/cluster_passwd on 192.168.7.243 ... OK
[INSTALL] copy /home/kingbase/cluster/kha01/db/etc/HAmodule.conf to /home/kingbase/cluster/kha01/kingbasecluster/etc on 192.168.7.243
[INSTALL] copy /home/kingbase/cluster/kha01/db/etc/HAmodule.conf to /home/kingbase/cluster/kha01/kingbasecluster/etc on 192.168.7.243 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/cluster_hba.conf on 192.168.7.248
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/cluster_hba.conf on 192.168.7.248 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/pcp.conf on 192.168.7.248
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/pcp.conf on 192.168.7.248 ... OK
[INSTALL] Alter /home/kingbase/cluster/kha01/kingbasecluster/etc/cluster_passwd on 192.168.7.248
[INSTALL] config /home/kingbase/cluster/kha01/kingbasecluster/etc/cluster_passwd on 192.168.7.248 ... OK
[INSTALL] copy /home/kingbase/cluster/kha01/db/etc/HAmodule.conf to /home/kingbase/cluster/kha01/kingbasecluster/etc on 192.168.7.248
[INSTALL] copy /home/kingbase/cluster/kha01/db/etc/HAmodule.conf to /home/kingbase/cluster/kha01/kingbasecluster/etc on 192.168.7.248 ... OK
[INSTALL] copy /home/kingbase/cluster/kha01/db/data/kingbase.conf to /home/kingbase/cluster/kha01/db/etc
[INSTALL] copy /home/kingbase/cluster/kha01/db/data/kingbase.conf to /home/kingbase/cluster/kha01/db/etc ... OK
[INSTALL] start up the database on "192.168.7.243" ...
[INSTALL] sys_ctl -D /home/kingbase/cluster/kha01/db/data start -w -t 90 -l /home/kingbase/cluster/kha01/log/kingbase.log
waiting for server to start.... done
server started
[INSTALL] start up the database on "192.168.7.243" ... OK
[INSTALL] clone and start up the slave ...
[INSTALL] Basebackup the slave on "192.168.7.248" ...
[INSTALL] /home/kingbase/cluster/kha01/db/bin/sys_basebackup -h 192.168.7.243 -U SYSTEM -W '*****' -p 54321 -D /home/kingbase/cluster/kha01/db/data -F p -X stream
[INSTALL] Basebackup the slave on "192.168.7.248" ... OK
[INSTALL] Copy /home/kingbase/cluster/kha01/db/etc/recovery.done to /home/kingbase/cluster/kha01/db/data/recovery.conf on 192.168.7.248
[INSTALL] Copy /home/kingbase/cluster/kha01/db/etc/recovery.done to /home/kingbase/cluster/kha01/db/data/recovery.conf on 192.168.7.248 ... OK
[INSTALL] Copy /home/kingbase/cluster/kha01/db/data/kingbase.conf to /home/kingbase/cluster/kha01/db/etc/ on 192.168.7.248
[INSTALL] Copy /home/kingbase/cluster/kha01/db/data/kingbase.conf to /home/kingbase/cluster/kha01/db/etc on 192.168.7.248 ... OK
[INSTALL] start up the slave on "192.168.7.248" ...
[INSTALL] /home/kingbase/cluster/kha01/db/bin/sys_ctl -w -t 60 -l /home/kingbase/cluster/kha01/logfile -D /home/kingbase/cluster/kha01/db/data start
waiting for server to start............................................................... stopped waiting
server is still starting up
[INSTALL] start up the slave on "192.168.7.248" ... OK
[INSTALL] Create physical_replication_slot on 192.168.7.243
 SYS_CREATE_PHYSICAL_REPLICATION_SLOT 
--------------------------------------
 (slot_node1,)
(1 row)

[INSTALL] Create physical_replication_slot on 192.168.7.243 ... OK
[INSTALL] Create physical_replication_slot on 192.168.7.243
 SYS_CREATE_PHYSICAL_REPLICATION_SLOT 
--------------------------------------
 (slot_node2,)
(1 row)

[INSTALL] Create physical_replication_slot on 192.168.7.243 ... OK
[INSTALL] Create physical_replication_slot on 192.168.7.248
ksql: FATAL:  the database system is starting up
[ERROR] Failed to create slot "node1" on 192.168.7.248

=== The output above shows that an error occurred during the slot-creation step ===
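The FATAL message means the standby had not finished WAL replay when the script tried to create the slot there. One way to recover by hand, sketched below under this case's paths and port, is to wait until the standby accepts connections and then create the slot, reusing the slot name from the log output above (adjust if your naming differs):

```shell
# Run on the standby (192.168.7.248) as the kingbase user.
BIN=/home/kingbase/cluster/kha01/db/bin

# Poll until the standby is out of recovery and accepts connections.
until "$BIN"/ksql -U SYSTEM -p 54321 -d TEST -c 'select 1;' >/dev/null 2>&1; do
    echo "standby still starting up, retrying in 5s ..."
    sleep 5
done

# Create the slot the deployment script failed to create.
"$BIN"/ksql -U SYSTEM -p 54321 -d TEST \
    -c "select sys_create_physical_replication_slot('slot_node1');"
```

This mirrors what the script did successfully on the primary (the two `SYS_CREATE_PHYSICAL_REPLICATION_SLOT` calls logged above), just delayed until the standby is ready.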

Check the database processes on the primary and standby:

[kingbase@node3 R3_cluster_install]$ ps -ef |grep kingbase

kingbase 25439     1  0 14:36 ?        00:00:00 /home/kingbase/cluster/kha01/db/bin/kingbase -D /home/kingbase/cluster/kha01/db/data
kingbase 25449 25439  0 14:36 ?        00:00:00 kingbase: logger process   
kingbase 25455 25439  0 14:36 ?        00:00:00 kingbase: checkpointer process   
kingbase 25456 25439  0 14:36 ?        00:00:00 kingbase: writer process   
kingbase 25457 25439  0 14:36 ?        00:00:00 kingbase: wal writer process   
kingbase 25458 25439  0 14:36 ?        00:00:00 kingbase: autovacuum launcher process   
kingbase 25459 25439  0 14:36 ?        00:00:00 kingbase: archiver process   last was 000000010000000000000002
kingbase 25460 25439  0 14:36 ?        00:00:00 kingbase: stats collector process   
kingbase 25461 25439  0 14:36 ?        00:00:00 kingbase: bgworker: syslogical supervisor   
kingbase 25723 25439  0 14:38 ?        00:00:00 kingbase: wal sender process SYSTEM 192.168.7.248(27448) streaming 0/3000060

[kingbase@node1 Lin64]$ ps -ef |grep kingbase

kingbase 17362     1  0 14:35 ?        00:00:00 /home/kingbase/cluster/kha01/db/bin/kingbase -D /home/kingbase/cluster/kha01/db/data
kingbase 17390 17362  0 14:35 ?        00:00:00 kingbase: logger process   
kingbase 17391 17362  0 14:35 ?        00:00:00 kingbase: startup process   recovering 000000010000000000000003
kingbase 18021 17362  0 14:37 ?        00:00:00 kingbase: checkpointer process   
kingbase 18022 17362  0 14:37 ?        00:00:00 kingbase: writer process   
kingbase 18024 17362  0 14:37 ?        00:00:00 kingbase: stats collector process   
kingbase 18025 17362  0 14:37 ?        00:00:00 kingbase: wal receiver process   streaming 0/3000060

Check the primary/standby streaming replication status:

TEST=# select * from sys_replication_slots;
 SLOT_NAME  | PLUGIN | SLOT_TYPE | DATOID | DATABASE | ACTIVE | ACTIVE_PID | XMIN | CATALOG_XMIN | RESTART_LSN | CONFIRMED_FLUSH_LSN 
------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 slot_node1 |        | physical  |        |          | f      |            |      |              |             | 
 slot_node2 |        | physical  |        |          | t      |      25723 | 2076 |              | 0/3000060   | 
(2 rows)

TEST=# select * from sys_stat_replication; 
  PID  | USESYSID | USENAME | APPLICATION_NAME |  CLIENT_ADDR  | CLIENT_HOSTNAME | CLIENT_PORT |         BACKEND_START         | BACKEND_XMIN |   STATE   | SENT_LOCATION | WRITE_LOCATION | FLUSH_LOCATION | REPLAY_LOCATION | SYNC_PRIORITY | SYNC_STATE 
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 25723 |       10 | SYSTEM  | node2            | 192.168.7.248 |                 |       27448 | 2021-03-01 14:38:07.750100+08 |              | streaming | 0/3000060     | 0/3000060      | 0/3000060      | 0/3000060       |             2 | sync
(1 row)

III. Restart the cluster and verify database access

1. Restart the cluster

[kingbase@node3 bin]$ ./kingbase_monitor.sh restart
-----------------------------------------------------------------------
2021-03-01 14:41:49 KingbaseES automation beging...
2021-03-01 14:41:49 stop kingbasecluster [192.168.7.243] ...
DEL VIP NOW AT 2021-03-01 14:41:50 ON enp0s3
No VIP on my dev, nothing to do.
2021-03-01 14:41:51 Done...
2021-03-01 14:41:51 stop kingbasecluster [192.168.7.248] ...
DEL VIP NOW AT 2021-03-01 14:41:20 ON enp0s3
No VIP on my dev, nothing to do.
2021-03-01 14:41:52 Done...
2021-03-01 14:41:52 stop kingbase [192.168.7.243] ...
set /home/kingbase/cluster/kha01/db/data down now...
2021-03-01 14:41:58 Done...
2021-03-01 14:41:59 Del kingbase VIP [192.168.7.245/24] ...
DEL VIP NOW AT 2021-03-01 14:41:59 ON enp0s3
No VIP on my dev, nothing to do.
2021-03-01 14:42:00 Done...
2021-03-01 14:42:00 stop kingbase [192.168.7.248] ...
set /home/kingbase/cluster/kha01/db/data down now...
2021-03-01 14:42:07 Done...
2021-03-01 14:42:08 Del kingbase VIP [192.168.7.245/24] ...
DEL VIP NOW AT 2021-03-01 14:41:37 ON enp0s3
No VIP on my dev, nothing to do.
2021-03-01 14:42:09 Done...
......................
all stop..
ping trust ip 192.168.7.1 success ping times :[3], success times:[2]
ping trust ip 192.168.7.1 success ping times :[3], success times:[2]
Redirecting to /bin/systemctl restart  crond.service
ADD VIP NOW AT 2021-03-01 14:42:22 ON enp0s3
execute: [/sbin/ip addr add 192.168.7.245/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/kha01/db/bin/arping -U 192.168.7.245 -I enp0s3 -w 1
ARPING 192.168.7.245 from 192.168.7.245 enp0s3
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
Redirecting to /bin/systemctl restart  crond.service
ping vip 192.168.7.245 success ping times :[3], success times:[3]
ping vip 192.168.7.245 success ping times :[3], success times:[2]
now,there is a synchronous standby.
wait kingbase recovery 5 sec...
Redirecting to /bin/systemctl restart  crond.service
Redirecting to /bin/systemctl restart  crond.service
......................
all started..
...
now we check again
=======================================================================
|             ip |                       program|              [status] 
[  192.168.7.243]|             [kingbasecluster]|              [active]
[  192.168.7.248]|             [kingbasecluster]|              [active]
[  192.168.7.243]|                    [kingbase]|              [active]
[  192.168.7.248]|                    [kingbase]|              [active]
=======================================================================

2. Log in to the database and test access (through the kingbasecluster port 9999)

[kingbase@node3 bin]$ ./ksql -U SYSTEM -W 123456 TEST -p 9999
ksql (V008R003C002B0270)
Type "help" for help.

TEST=# SHOW pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.7.243 | 54321 | up     | 0.500000  | primary | 0          | false             | 0
 1       | 192.168.7.248 | 54321 | up     | 0.500000  | standby | 0          | true              | 0
(2 rows)

TEST=# select * from sys_stat_replication;
  PID  | USESYSID | USENAME | APPLICATION_NAME |  CLIENT_ADDR  | CLIENT_HOSTNAME | CLIENT_PORT |         BACKEND_START         | BACKEND_XMIN |   STATE   | SENT_LOCATION | WRITE_LOCATION | FLUSH_LOCATION | REPLAY_LOCATION | SYNC_PRIORITY | SYNC_STATE 
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 27406 |       10 | SYSTEM  | node2            | 192.168.7.248 |                 |       27473 | 2021-03-01 14:42:25.350073+08 |              | streaming | 0/40000D0     | 0/40000D0      | 0/40000D0      | 0/40000D0       |             2 | sync
(1 row)


Primary/standby synchronization test:


TEST=# create database prod;
CREATE DATABASE

TEST=# \l
                               List of databases
   Name    | Owner  | Encoding |   Collate   |    Ctype    | Access privileges  
-----------+--------+----------+-------------+-------------+--------------------
 PROD      | SYSTEM | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 SAMPLES   | SYSTEM | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 SECURITY  | SYSTEM | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 TEMPLATE0 | SYSTEM | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/SYSTEM         +
           |        |          |             |             | SYSTEM=CTcb/SYSTEM
 TEMPLATE1 | SYSTEM | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/SYSTEM         +
           |        |          |             |             | SYSTEM=CTcb/SYSTEM
 TEMPLATE2 | SYSTEM | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/SYSTEM        +
           |        |          |             |             | SYSTEM=CTcb/SYSTEM
 TEST      | SYSTEM | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
(7 rows)

TEST=# \c PROD SYSTEM
You are now connected to database "PROD" as user "SYSTEM".
PROD=# create table t1 (id int ,name varchar(10));  
CREATE TABLE
PROD=# insert into t1 values (10,'tom'),(20,'jerry'),(30,'rose');
INSERT 0 3
PROD=# select * from t1;
 ID | NAME  
----+-------
 10 | tom
 20 | jerry
 30 | rose
(3 rows)

On the standby:

PROD=# select * from t1;
 ID | NAME  
----+-------
 10 | tom
 20 | jerry
 30 | rose
(3 rows)

IV. Deployment summary

The deployment script hit an error during execution, but the configuration was completed by restarting the cluster manually. In a later test, with SSH trust configured by the script instead of by hand, running the deployment script failed with an "unable to start the standby" error.
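For the "unable to start the standby" failure, one manual recovery path is to re-clone the standby from the primary, reusing the same commands the deployment log above shows for the basebackup and startup steps. A sketch under this case's paths (run as kingbase on the standby host, after verifying the primary is healthy; the password prompt for sys_basebackup is omitted):

```shell
# Re-clone and restart a standby that failed to start (a sketch, not a
# verified procedure; paths and primary IP come from this case's install.conf).
CLUSTER=/home/kingbase/cluster/kha01

# Stop any half-started instance; ignore the error if nothing is running.
"$CLUSTER"/db/bin/sys_ctl -D "$CLUSTER"/db/data stop -m fast || true

# Discard the broken data directory and re-clone from the primary.
rm -rf "$CLUSTER"/db/data
"$CLUSTER"/db/bin/sys_basebackup -h 192.168.7.243 -U SYSTEM -p 54321 \
    -D "$CLUSTER"/db/data -F p -X stream

# Mark the instance as a standby, as the deployment script does.
cp "$CLUSTER"/db/etc/recovery.done "$CLUSTER"/db/data/recovery.conf

# Start the standby and wait up to 60 seconds for it to come up.
"$CLUSTER"/db/bin/sys_ctl -w -t 60 -D "$CLUSTER"/db/data \
    -l "$CLUSTER"/logfile start
```

Once the standby is streaming again (check `sys_stat_replication` on the primary), `kingbase_monitor.sh restart` can bring the whole cluster back as shown above.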