openGauss on CentOS 7.8: installation test with one primary and two standbys
-
CentOS needs Python 3.6
openEuler uses 3.7
tar -zxvf Python-3.6.11.tgz
cd Python-3.6.11
./configure --prefix=/usr/python3.6.11 --enable-optimizations --enable-shared CFLAGS=-fPIC
make
make install
## Environment setup (do not overwrite python2, otherwise the yum command may stop working)
ln -s /usr/python3.6.11/bin/python3.6 /usr/bin/python3
ln -s /usr/python3.6.11/bin/pip3 /usr/bin/pip3
ln -s /usr/python3.6.11/lib/libpython3.6m.so.1.0 /usr/lib64/
export LD_LIBRARY_PATH=/usr/python3.6.11/lib:$LD_LIBRARY_PATH
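The symlinks above can be sanity-checked before moving on. A minimal sketch, assuming `python3 --version` prints a string of the form `Python X.Y.Z`; the helper name `python_minor` is made up for this example:

```shell
# python_minor: extract "X.Y" from a version string such as "Python 3.6.11".
# Prints nothing if the string does not look like a Python version.
python_minor() {
    printf '%s\n' "$1" | sed -n 's/^Python \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Typical use on the installed system (commented out so the sketch has no side effects):
# [ "$(python_minor "$(python3 --version 2>&1)")" = "3.6" ] || echo "python3 is not 3.6" >&2
# python --version    # should still report the system Python 2 that yum relies on
```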
Manually install the required packages
Install the python3 rpm packages (they depend on one another, so run the commands in this order)
rpm -ivh python-srpm-macros-3-32.el7.noarch.rpm
rpm -ivh python-rpm-macros-3-32.el7.noarch.rpm
rpm -ivh python3-rpm-macros-3-32.el7.noarch.rpm
## The following 4 packages (python3-pip/python3/python3-libs/python3-setuptools) depend on each other and must be installed together
rpm -ivh python3-pip-9.0.3-7.el7_7.noarch.rpm python3-3.6.8-13.el7.x86_64.rpm python3-libs-3.6.8-13.el7.x86_64.rpm python3-setuptools-39.2.0-10.el7.noarch.rpm
rpm -ivh python3-rpm-generators-6-2.el7.noarch.rpm
rpm -ivh python3-devel-3.6.8-13.el7.x86_64.rpm
3. Check the installed packages
[root@db3 ~]# rpm -qa|grep python3
python3-rpm-macros-3-32.el7.noarch
python3-setuptools-39.2.0-10.el7.noarch
python3-pip-9.0.3-7.el7_7.noarch
python3-devel-3.6.8-13.el7.x86_64
python3-libs-3.6.8-13.el7.x86_64
python3-3.6.8-13.el7.x86_64
python3-rpm-generators-6-2.el7.noarch
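Comparing the `rpm -qa` listing against the expected package set by eye is easy to get wrong. A minimal sketch that prints whichever required packages are absent; the function name `missing_packages` is hypothetical, and it assumes package lines have the form `<name>-<version>-<release>.<arch>` with a version that starts with a digit:

```shell
# missing_packages: read "rpm -qa"-style lines on stdin and print every
# required package name (given as arguments) that has no matching line.
missing_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        # Anchor on "<name>-<digit>" so that python3 does not match python3-libs.
        printf '%s\n' "$installed" | grep -q "^${pkg}-[0-9]" || printf '%s\n' "$pkg"
    done
}

# Typical use (commented out so the sketch has no side effects):
# rpm -qa | missing_packages python3 python3-libs python3-pip python3-setuptools python3-devel
```

No output means every listed package is installed.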
-
Environment preparation
hostnamectl set-hostname gaussdb1
cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.138 gaussdb1
192.168.1.139 gaussdb2
192.168.1.140 gaussdb3
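Since the same three entries have to exist on every node, it helps to add them in a way that can be re-run safely. A minimal sketch; `add_host` is a hypothetical helper, not part of any openGauss tool:

```shell
# add_host FILE IP NAME: append "IP NAME" to FILE unless NAME is already
# mapped on some non-comment line of FILE.
add_host() {
    file=$1 ip=$2 name=$3
    if ! awk -v n="$name" '
            !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# As root on every node (commented out so the sketch has no side effects):
# add_host /etc/hosts 192.168.1.138 gaussdb1
# add_host /etc/hosts 192.168.1.139 gaussdb2
# add_host /etc/hosts 192.168.1.140 gaussdb3
```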
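On every machine, run the setup scripts and create the users; see the Linux template in 01.
Mutual trust among the three machines must be in place
(I later asked the instructor: pre-installation runs the keyssh script and sets up mutual trust by itself)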
ssh gaussdb1 date
ssh gaussdb2 date
ssh gaussdb3 date
ssh 192.168.1.138 date
ssh 192.168.1.139 date
ssh 192.168.1.140 date
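The six checks above can be wrapped in a loop that reports only the hosts where passwordless login fails. A minimal sketch; `check_trust` and `batch_ssh` are hypothetical names, and the ssh options `BatchMode` and `ConnectTimeout` are used so that a missing key produces a failure instead of a password prompt:

```shell
# check_trust RUNNER HOST...: run "RUNNER HOST date" for each host, print the
# hosts for which it fails, and return non-zero if any of them failed.
check_trust() {
    runner=$1
    shift
    failed=0
    for host in "$@"; do
        if ! "$runner" "$host" date >/dev/null 2>&1; then
            printf '%s\n' "$host"
            failed=1
        fi
    done
    return "$failed"
}

# batch_ssh: ssh that fails instead of asking for a password.
batch_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$@"
}

# Typical use (commented out so the sketch has no side effects):
# check_trust batch_ssh gaussdb1 gaussdb2 gaussdb3 192.168.1.138 192.168.1.139 192.168.1.140
```

Run it once as root and once as omm on each node, since trust is per user.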
cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)
uname -a
Linux node2 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Edit the release string so that it passes the openGauss version check
cat /etc/centos-release
CentOS Linux release 7.6.2003 (Core)
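The edit itself is a one-line substitution. A minimal sketch that keeps a backup copy so the real release string can be restored later; the helper names are hypothetical, and restoring the file afterwards is my own assumption rather than something these notes did:

```shell
# spoof_release FILE: rewrite "release 7.8." to "release 7.6." in FILE,
# keeping the original content as FILE.bak.
spoof_release() {
    sed -i.bak 's/release 7\.8\./release 7.6./' "$1"
}

# restore_release FILE: put the saved original back.
restore_release() {
    mv "$1.bak" "$1"
}

# As root on every node (commented out so the sketch has no side effects):
# spoof_release /etc/centos-release      # before the pre-installation check
# restore_release /etc/centos-release    # optionally, once the installation is done
```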
On the two standbys, extract the installation package and put the XML configuration file in place
-
Node file
The following steps only need to be run on the primary server
[omm@node1 ~]$ cat /tmp/cluster.xml
<?xml version="1.1" encoding="UTF-8"?>
<ROOT>
  <!-- Overall openGauss information -->
  <CLUSTER>
    <!-- Database (cluster) name -->
    <PARAM name="clusterName" value="gscluster" />
    <!-- Database node names (hostnames) -->
    <PARAM name="nodeNames" value="gaussdb1,gaussdb2,gaussdb3" />
    <!-- Database installation directory -->
    <PARAM name="gaussdbAppPath" value="/app/gaussdb/app" />
    <!-- Log directory -->
    <PARAM name="gaussdbLogPath" value="/data/gaussdb/log" />
    <!-- Temporary file directory -->
    <PARAM name="tmpMppdbPath" value="/data/gaussdb/tmp" />
    <!-- Database tool directory -->
    <PARAM name="gaussdbToolPath" value="/app/gaussdb/om" />
    <!-- Database core file directory -->
    <PARAM name="corePath" value="/data/gaussdb/corefile" />
    <!-- Node IPs, in the same order as the node name list -->
    <PARAM name="backIp1s" value="192.168.1.138,192.168.1.139,192.168.1.140"/>
  </CLUSTER>
  <!-- Deployment information for each server -->
  <DEVICELIST>
    <!-- Deployment information for node 1 -->
    <DEVICE sn="1000001">
      <!-- Hostname of node 1 -->
      <PARAM name="name" value="gaussdb1"/>
      <!-- AZ of node 1 and the AZ priority -->
      <PARAM name="azName" value="AZ1"/>
      <PARAM name="azPriority" value="1"/>
      <!-- IP of node 1; if the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
      <PARAM name="backIp1" value="192.168.1.138"/>
      <PARAM name="sshIp1" value="192.168.1.138"/>
      <!-- dbnode -->
      <PARAM name="dataNum" value="1"/>
      <PARAM name="dataPortBase" value="26000"/>
      <PARAM name="dataNode1" value="/data/gaussdb/data/db1,gaussdb2,/data/gaussdb/data/db1,gaussdb3,/data/gaussdb/data/db1"/>
    </DEVICE>
    <!-- Deployment information for node 2 -->
    <DEVICE sn="1000002">
      <!-- Hostname of node 2 -->
      <PARAM name="name" value="gaussdb2"/>
      <!-- AZ of node 2 and the AZ priority -->
      <PARAM name="azName" value="AZ1"/>
      <PARAM name="azPriority" value="1"/>
      <!-- IP of node 2; if the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
      <PARAM name="backIp1" value="192.168.1.139"/>
      <PARAM name="sshIp1" value="192.168.1.139"/>
    </DEVICE>
    <!-- Deployment information for node 3 -->
    <DEVICE sn="1000003">
      <!-- Hostname of node 3 -->
      <PARAM name="name" value="gaussdb3"/>
      <!-- AZ of node 3 and the AZ priority -->
      <PARAM name="azName" value="AZ1"/>
      <PARAM name="azPriority" value="1"/>
      <!-- IP of node 3; if the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
      <PARAM name="backIp1" value="192.168.1.140"/>
      <PARAM name="sshIp1" value="192.168.1.140"/>
    </DEVICE>
  </DEVICELIST>
</ROOT>
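The node name list, the IP list and the DEVICE blocks in this file have to describe the same number of nodes, and a mismatch is easy to introduce when copying blocks. A minimal sketch of a consistency check using only sed, tr and grep; all three helper names are hypothetical, and it assumes each PARAM element sits on its own line as in the file above:

```shell
# param_value FILE NAME: print the value of the first <PARAM name="NAME" .../>.
param_value() {
    sed -n "s/.*<PARAM name=\"$2\" value=\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# csv_count STRING: number of non-empty comma-separated items in STRING.
csv_count() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# check_cluster_xml FILE: succeed only if nodeNames, backIp1s and the
# DEVICE blocks all describe the same number of nodes.
check_cluster_xml() {
    names=$(csv_count "$(param_value "$1" nodeNames)")
    ips=$(csv_count "$(param_value "$1" backIp1s)")
    devices=$(grep -c '<DEVICE ' "$1")
    [ "$names" -eq "$ips" ] && [ "$names" -eq "$devices" ]
}

# Typical use (commented out so the sketch has no side effects):
# check_cluster_xml /tmp/cluster.xml || echo "cluster.xml is inconsistent" >&2
```

This does not validate the XML itself; it only catches count mismatches.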
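Pre-installation check
<directory containing the package>/script/gs_preinstall -U omm -G dbgrp -X /tmp/cluster.xml
If this step reports an SCTP problem, it can be ignored: SCTP is only used by distributed deployments and does not affect this setup
Pre-installation copies some of the OM files and the database application to the standbys, and also sets up mutual trust
After pre-installation, reboot
Check the details: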
/app/software/openGauss/script/gs_checkos -i A -h gaussdb1,gaussdb2,gaussdb3 --detail
-
Start the installation
chmod 755 -R /app/software /app/gaussdb /data/gaussdb
chown omm.dbgrp -R /app/software /app/gaussdb /data/gaussdb
The mutual trust for root can be removed at the end
su - omm
/app/software/openGauss/script/gs_install -X /app/software/openGauss/cluster.xml
Status after installation
[omm@node1 ~]$ gs_om -t status --detail
or: gs_om -t status --all
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node      node_ip        instance                           state
--------------------------------------------------------------------------------------------
1  node1  192.168.56.11  6001 /opt/huawei/install/data/db1  S Standby Need repair(Connecting)
2  node2  192.168.56.12  6002 /opt/huawei/install/data/db1  P Primary Normal
3  node3  192.168.56.13  6003 /opt/huawei/install/data/db1  S Standby Normal
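For scripted checks, the only field that usually matters in this output is cluster_state. A minimal sketch that extracts it; `cluster_state` is a hypothetical helper name, and it assumes the field is printed as `cluster_state : <value>` as shown above:

```shell
# cluster_state: read "gs_om -t status" output on stdin and print the value
# of the cluster_state field (for example Normal, Degraded or Unavailable).
cluster_state() {
    awk -F: '/cluster_state/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Typical use as omm (commented out so the sketch has no side effects):
# state=$(gs_om -t status --detail | cluster_state)
# [ "$state" = "Normal" ] || echo "cluster is $state" >&2
```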
6. Log in to the database
[omm@node1 script]$ gsql -d postgres -p 26000
gsql ((openGauss 1.0.0 build d85d4a6b) compiled at 2020-07-06 15:53:59 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
postgres=# CREATE USER sugon PASSWORD 'Sugon2020';
CREATE ROLE
postgres=# ALTER USER sugon SYSADMIN;
ALTER ROLE
postgres=# create tablespace sugondb relative location 'sugondb';
CREATE TABLESPACE
postgres=# create database sugondb with tablespace=sugondb DBCOMPATIBILITY 'A'; -- Oracle-compatible mode
CREATE DATABASE
postgres=# \q
Create a table as user sugon
[omm@node1 script]$ gsql -U sugon -WSugon2020 -d sugondb -p 26000
gsql ((openGauss 1.0.0 build d85d4a6b) compiled at 2020-07-06 15:53:59 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
sugondb=> create table test(a int,comment varchar2(20));
CREATE TABLE
sugondb=> insert into test values(1,'iam new to opengauss');
INSERT 0 1
sugondb=> select * from test;
a | comment
---+----------------------
1 | iam new to opengauss
(1 row)
sugondb=> \q
-
Back up the cluster information (only the binaries and parameter files are backed up)
[omm@node2 ~]$ gs_backup -t backup --backup-dir=/tmp/gauss --all
Parsing configuration files.
Successfully parsed the configuration file.
Performing remote backup.
Remote backup succeeded.
Successfully backed up cluster files.
[omm@node2 ~]$ ll /tmp/gauss/
total 261124
-rw------- 1 omm dbgrp 267253760 Jul 22 11:39 binary.tar
-rw------- 1 omm dbgrp 133120 Jul 22 11:39 parameter.tar
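Because the later recovery depends on these two archives, it is worth confirming they exist and are not empty right after the backup. A minimal sketch; `backup_ok` is a hypothetical helper, and the two file names are taken from the listing above:

```shell
# backup_ok DIR: succeed only if DIR holds non-empty binary.tar and parameter.tar.
backup_ok() {
    for f in binary.tar parameter.tar; do
        [ -s "$1/$f" ] || { echo "missing or empty: $1/$f" >&2; return 1; }
    done
}

# Typical use as omm (commented out so the sketch has no side effects):
# backup_ok /tmp/gauss && echo "backup files look complete"
```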
-
Failure test
1) Stop node 1
[omm@node2 ~]$ gs_om -t stop -h node1
Stopping node.
=========================================
Successfully stopped node.
=========================================
End stop node.
[omm@node2 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node      node_ip        instance                           state
--------------------------------------------------------------------------------------------
1  node1  192.168.56.11  6001 /opt/huawei/install/data/db1  P Down Manually stopped
2  node2  192.168.56.12  6002 /opt/huawei/install/data/db1  S Standby Need repair(Connecting)
3  node3  192.168.56.13  6003 /opt/huawei/install/data/db1  S Standby Need repair(Disconnected)
2) Promote node 2 to primary
[omm@node2 ~]$ gs_ctl failover -D /opt/huawei/install/data/db1 # the primary is completely down, so failover is used; while the primary is still running, use switchover instead
[2020-07-22 11:45:39.281][2657][][gs_ctl]: gs_ctl failover ,datadir is -D "/opt/huawei/install/data/db1"
[2020-07-22 11:45:39.281][2657][][gs_ctl]: failover term (1)
[2020-07-22 11:45:39.297][2657][][gs_ctl]: waiting for server to failover...
.[2020-07-22 11:45:40.337][2657][][gs_ctl]: done
[2020-07-22 11:45:40.337][2657][][gs_ctl]: failover completed (/opt/huawei/install/data/db1)
[omm@node2 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node      node_ip        instance                           state
--------------------------------------------------------------------------------------------
1  node1  192.168.56.11  6001 /opt/huawei/install/data/db1  P Down Manually stopped
2  node2  192.168.56.12  6002 /opt/huawei/install/data/db1  P Primary Normal
3  node3  192.168.56.13  6003 /opt/huawei/install/data/db1  S Standby Normal
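After a failover it is useful to confirm that exactly one instance reports itself as primary. A minimal sketch that pulls the primary's node name out of the Datanode State rows, whether the rows arrive separated by `|` or on separate lines; `primary_nodes` is a hypothetical helper, and it assumes the node name is the second field of each row as in the output above:

```shell
# primary_nodes: read Datanode State rows on stdin and print the name of
# every node whose instance state is "P Primary".
primary_nodes() {
    tr '|' '\n' | awk '/[ \t]P[ \t]+Primary[ \t]/ { print $2 }'
}

# Typical use as omm (commented out so the sketch has no side effects):
# gs_om -t status --detail | primary_nodes
```

A node shown as "P Down Manually stopped" is deliberately not reported, because its state is not Primary.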
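3) Reinstall the operating system on node 1, with all software packages identical to the other two nodes.
Re-establish mutual trust: delete authorized_keys and known_hosts for both root and omm,
then run ssh-copy-id root(omm)@node{1,2,3}
Create the corresponding directories and grant permissions.
On node 1: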
su - omm
scp node2:/tmp/gauss/* /tmp
cd /tmp
tar xvf parameter.tar
tar xvf parameter_node1.tar
gs_backup -t restore --backup-dir=/tmp/backup -h node1 --all
Restore the binary files
cd /opt/huawei/install
scp -r node2:/opt/huawei/corefiles /opt/huawei #corepath
scp -r node2:/opt/huawei/install/om /opt/huawei/install #omtoolpath
scp -r node2:/opt/huawei/install/data /opt/huawei/install #datanode
cd /opt/huawei/install/data/db1
rm -fr postgresql.conf
rm -fr pg_hba.conf
cp /tmp/parameter_node1/6001_postgresql.conf postgresql.conf
cp /tmp/parameter_node1/6001_pg_hba.conf pg_hba.conf
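The delete-then-copy steps above follow one pattern: a backed-up file named `<instance id>_<name>` replaces `<name>` in the data directory. A minimal sketch of that pattern as a function; `restore_conf` is a hypothetical helper, and the file names to restore are passed in by the caller rather than assumed:

```shell
# restore_conf SRC_DIR INSTANCE_ID DATA_DIR NAME...: for each NAME, copy
# SRC_DIR/INSTANCE_ID_NAME over DATA_DIR/NAME, stopping at the first failure.
restore_conf() {
    src=$1 id=$2 dest=$3
    shift 3
    for name in "$@"; do
        cp -f "$src/${id}_${name}" "$dest/$name" || return 1
    done
}

# Typical use as omm on node 1 (commented out so the sketch has no side effects):
# restore_conf /tmp/parameter_node1 6001 /opt/huawei/install/data/db1 postgresql.conf
```

Failing on a missing source file is deliberate: a mistyped file name is reported instead of leaving the data directory without its configuration.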
4) Start node 1 as a standby
gs_ctl start -D /opt/huawei/install/data/db1 -M standby
[omm@node1 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node      node_ip        instance                           state
--------------------------------------------------------------------------------------------
1  node1  192.168.56.11  6001 /opt/huawei/install/data/db1  S Standby Need repair(Connecting)
2  node2  192.168.56.12  6002 /opt/huawei/install/data/db1  P Primary Normal
3  node3  192.168.56.13  6003 /opt/huawei/install/data/db1  S Standby Normal
5) Rebuild
gs_ctl build -D /opt/huawei/install/data/db1
Status after the rebuild completes
[omm@node1 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node      node_ip        instance                           state
--------------------------------------------------------------------------------------------
1  node1  192.168.56.11  6001 /opt/huawei/install/data/db1  S Standby Normal
2  node2  192.168.56.12  6002 /opt/huawei/install/data/db1  P Primary Normal
3  node3  192.168.56.13  6003 /opt/huawei/install/data/db1  S Standby Normal
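The cluster does not necessarily report Normal the moment the rebuild command returns, so polling the status is more reliable than a single check. A minimal sketch of such a loop; `wait_for_state` and the `POLL_INTERVAL` variable are hypothetical, and the retry count in the usage line is an arbitrary choice:

```shell
# wait_for_state WANT TRIES CMD...: run CMD up to TRIES times, pausing
# POLL_INTERVAL seconds (default 5) after each failed attempt, until its
# output contains "cluster_state : WANT". Returns non-zero otherwise.
wait_for_state() {
    want=$1 tries=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" 2>/dev/null | grep -Eq "cluster_state[[:space:]]*:[[:space:]]*${want}"; then
            return 0
        fi
        i=$((i + 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    return 1
}

# Typical use as omm (commented out so the sketch has no side effects):
# wait_for_state Normal 60 gs_om -t status --detail || echo "cluster did not reach Normal" >&2
```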
6) Check the original data
[omm@node1 script]$ gsql -U sugon -WSugon2020 -d sugondb -p 26000
gsql ((openGauss 1.0.0 build d85d4a6b) compiled at 2020-07-06 15:53:59 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
sugondb=> create table test(a int,comment varchar2(20));
CREATE TABLE
sugondb=> insert into test values(1,'iam new to opengauss');
INSERT 0 1
sugondb=> select * from test;
a | comment
---+----------------------
1 | iam new to opengauss
(1 row)
sugondb=> \q
Problems:
1. [GAUSS-51400] : Failed to execute the command: python3 '/app/software/openGauss/script/local/PreInstallUtility.py' -t change_tool_env -u omm -l /data/gaussdb/log/omm/om/gs_local.log -X '/app/software/openGauss/cluster_config192.168.1.138
192.168.1.139
192.168.1.140
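Mutual trust had not been set up. Fix: python3 gs_sshexkey -f ip_huxin -W 123456
ip_huxin lists the IPs of the cluster nodes.
-W establishes mutual trust when the user password is the same on every host; 123456 is the user password.
Repeat the same operation after su - omm.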
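The host file passed with -f is just one IP per line. A minimal sketch for generating it; `write_hostfile` is a hypothetical helper, and the file name ip_huxin is the one used in the command above:

```shell
# write_hostfile FILE IP...: write one IP per line to FILE, replacing any
# previous content. Fails if no IP is given.
write_hostfile() {
    file=$1
    shift
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" > "$file"
}

# Typical use (commented out so the sketch has no side effects):
# write_hostfile ip_huxin 192.168.1.138 192.168.1.139 192.168.1.140
```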