How to analyze a failed KingbaseES database startup

 

Keywords:

   KingbaseES, sys_ctl, startup log

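1. Starting the KingbaseES database service

1.1 Database startup mechanism

1) The database service process, kingbase, is started manually with the sys_ctl tool.

2) sys_ctl needs the -D option to specify the database data directory.

3) At startup the database reads kingbase.conf to obtain the initialization parameters of the database instance.

4) Log messages produced during startup can be written to a specified log file or printed to standard output.

5) The startup log can be used to determine and analyze the cause of a startup failure.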
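Point 5 can be sketched as a small shell helper that reduces a saved startup log to the lines that usually explain a failure. This is a minimal sketch, not part of KingbaseES: the function name startup_errors and the log path in the usage line are hypothetical, and the severity list is an assumption based on the log lines shown in this article.

```shell
# Print only the lines of a server log that usually explain a failed start.
# Reads the files given as arguments, or standard input when none are given.
# The severity list is an assumption taken from the sample logs in this article.
startup_errors() {
    grep -E '(FATAL|PANIC|ERROR|WARNING|HINT):' -- "$@"
}

# Usage (hypothetical log path):
#   startup_errors /path/to/startup.log
```

Some causes are logged at LOG level (for example the "could not bind" lines in case 2.1), so the lines just before the first match are worth reading as well.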

1.2 The database service startup tool sys_ctl


Figure 1-1 sys_ctl help output

 

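2. Analyzing database service startup failures

2.1 Case: database port already in use

Case description:

At startup the log reports 'could not bind IPv4 address "0.0.0.0": Address already in use'. Checking the database service port (default: 54321) shows that it is already in the LISTEN state on the system, held by another database service. When several database instances are started on the same host, change port so that their service ports do not conflict.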

 

Symptom:

[kingbase@node1 data]$ /opt/Kingbase/ES/V8R6_021/Server/bin/sys_ctl start -D /data/kingbase/v8r6_021/data

 

waiting for server to start....2021-03-01 12:52:31.989 CST [15825] LOG:  sepapower extension initialized

2021-03-01 12:52:31.991 CST [15825] LOG:  starting KingbaseES V008R006C004B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

2021-03-01 12:52:31.991 CST [15825] LOG:  could not bind IPv4 address "0.0.0.0": Address already in use

2021-03-01 12:52:31.991 CST [15825] HINT:  Is another kingbase already running on port 54321? If not, wait a few seconds and retry.

2021-03-01 12:52:31.991 CST [15825] LOG:  could not bind IPv6 address "::": Address already in use

2021-03-01 12:52:31.991 CST [15825] HINT:  Is another kingbase already running on port 54321? If not, wait a few seconds and retry.

2021-03-01 12:52:31.991 CST [15825] WARNING:  could not create listen socket for "*"

2021-03-01 12:52:31.991 CST [15825] FATAL:  could not create any TCP/IP sockets

2021-03-01 12:52:31.991 CST [15825] LOG:  database system is shut down

 stopped waiting

sys_ctl: could not start server

Examine the log output.

 

Analysis:

Check how port 54321 is being used. The output shows that it is already taken:

 

[kingbase@node1 data]$ netstat -antlp|grep -i listen|grep :54321

 

tcp        0      0 0.0.0.0:54321           0.0.0.0:*               LISTEN      14665/kingbase     

tcp6       0      0 :::54321                :::*                    LISTEN      14665/kingbase  

 

Check the processes of the database service that holds the port:

[kingbase@node1 data]$ ps -ef |grep 14665

 

kingbase 14665     1  0 12:51 ?        00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/KHA/kingbase/data

kingbase 14669 14665  0 12:51 ?        00:00:00 kingbase: logger  

kingbase 14671 14665  0 12:51 ?        00:00:00 kingbase: startup   recovering 000000070000000200000086

kingbase 14672 14665  0 12:51 ?        00:00:00 kingbase: checkpointer  

kingbase 14673 14665  0 12:51 ?        00:00:00 kingbase: background writer  

kingbase 14674 14665  0 12:51 ?        00:00:00 kingbase: stats collector  

kingbase 14676 14665  0 12:51 ?        00:00:02 kingbase: walreceiver   streaming 2/860023B0

kingbase 15088 14665  0 12:52 ?        00:00:01 kingbase: esrep esrep 192.168.7.248(26056) idle

kingbase 15769 14665  0 12:52 ?        00:00:00 kingbase: system test ::1(26355) idle
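The two checks above can be combined into one helper that reads netstat output and prints the owner of a listening port. This is a minimal sketch under assumptions: port_listeners is a hypothetical name, and the column layout (local address in column 4, state in column 6, PID/program in column 7) is the one shown in the netstat output above.

```shell
# Print the PID/program column for every LISTEN socket on the given port.
# Expects `netstat -antlp` style output on standard input.
# $1 = port number to look for.
port_listeners() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # Local address is "addr:port"; the port is the last ":" field,
            # which also works for IPv6 forms such as ":::54321".
            n = split($4, parts, ":")
            if (parts[n] == port) print $7
        }
    ' | sort -u
}

# Usage:
#   netstat -antlp 2>/dev/null | port_listeners 54321
```

Unlike a plain grep for ":54321", this compares the whole port field, so a listener on a longer port number that merely begins with the same digits is not reported.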

 

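Resolution:

Change the database service port number in kingbase.conf to an unused value, then start the instance again. The setting after the change: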

 [kingbase@node1 data]$ cat kingbase.conf |grep port

port = 54322                            # (change requires restart)
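The command above only displays the setting. One way to make the change itself from the shell is sketched below. It is an illustration, not a KingbaseES tool: set_conf_param is a hypothetical helper, it assumes the "name = value" layout shown above, and it discards any trailing comment on the line it rewrites. Back up kingbase.conf before trying it.

```shell
# Set "name = value" in a kingbase.conf-style file.
# Rewrites an existing assignment (also a commented-out one) or appends a new
# line when the parameter is absent.
# $1 = file, $2 = parameter name, $3 = value (must not contain "|" or "&").
set_conf_param() (
    file=$1 name=$2 value=$3
    pattern="^[[:space:]]*#?[[:space:]]*${name}[[:space:]]*="
    if grep -Eq "$pattern" "$file"; then
        tmp=$(mktemp) || exit 1
        # Write through a temporary copy so the original file keeps its
        # owner and permissions.
        sed -E "s|${pattern}.*|${name} = ${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$name" "$value" >> "$file"
    fi
)

# Usage:
#   set_conf_param kingbase.conf port 54322
```

As the comment in kingbase.conf notes, a change to port takes effect only after a restart.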

 

 

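2.2 Case: memory allocation error at startup

Case description:

   When the database instance starts, the log reports "could not map anonymous shared memory: Cannot allocate memory". The database service cannot obtain the memory for its buffers, so the instance fails to start. The problem is resolved either by reconfiguring the kernel so that more shared memory is available or by reducing the database's shared buffer size (shared_buffers).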

 

Symptom:

[kingbase@node1 data]$ /opt/Kingbase/ES/V8R6_021/Server/bin/sys_ctl start -D /data/kingbase/v8r6_021/data

 

waiting for server to start....2021-03-01 13:01:46.176 CST [20183] LOG:  sepapower extension initialized

2021-03-01 13:01:46.179 CST [20183] LOG:  starting KingbaseES V008R006C004B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

2021-03-01 13:01:46.179 CST [20183] LOG:  listening on IPv4 address "0.0.0.0", port 54322

2021-03-01 13:01:46.179 CST [20183] LOG:  listening on IPv6 address "::", port 54322

2021-03-01 13:01:46.316 CST [20183] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54322"

2021-03-01 13:01:46.383 CST [20183] FATAL:  could not map anonymous shared memory: Cannot allocate memory

2021-03-01 13:01:46.383 CST [20183] HINT:  This error usually means that Kingbase's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 8850808832 bytes), reduce Kingbase's shared memory usage, perhaps by reducing shared_buffers or max_connections.

2021-03-01 13:01:46.383 CST [20183] LOG:  database system is shut down

 stopped waiting

sys_ctl: could not start server

Examine the log output.

 

Analysis:

 

Check the buffer setting in kingbase.conf:

[kingbase@node1 data]$ cat kingbase.conf |grep buffer

shared_buffers = 8192MB                 # min 128kB

 

Check system memory usage:

[kingbase@node1 data]$ free -m

              total        used        free      shared  buff/cache   available

Mem:           3381         435        2060          70         885        1833

Swap:          2815           0        2815

 

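===The shared_buffers setting in kingbase.conf (8192 MB) exceeds the sum of the system's physical memory and swap (3381 + 2815 = 6196 MB), so the instance cannot obtain the requested buffers and fails to start.===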
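The comparison above can be sketched as two shell helpers that work on `free -m` output. The names mem_plus_swap_mb and buffers_fit are hypothetical. The check is a necessary condition only: the HINT in the log shows that the actual request (8850808832 bytes) is larger than shared_buffers alone, and other processes need memory too.

```shell
# Sum the "total" column of the Mem: and Swap: rows of `free -m` output (MB).
mem_plus_swap_mb() {
    awk '$1 == "Mem:" || $1 == "Swap:" { total += $2 } END { print total + 0 }'
}

# Succeed (exit status 0) when a shared_buffers value fits under the limit.
# $1 = shared_buffers in MB, $2 = physical memory + swap in MB.
buffers_fit() {
    [ "$1" -le "$2" ]
}

# Usage:
#   limit=$(free -m | mem_plus_swap_mb)
#   buffers_fit 8192 "$limit" || echo "shared_buffers exceeds memory + swap"
```

With the figures from this case, the limit is 6196 MB, so 8192 MB fails the check and 1024 MB passes it.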

 

Resolution:

 

Edit kingbase.conf to reduce the buffer size, then start the instance again. The setting after the change:

[kingbase@node1 data]$ cat kingbase.conf |grep -i shared_buffer

shared_buffers = 1024MB                 # min 128kB

 

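3. Summary

When the database service fails to start, the startup log is the basis for analyzing and identifying the cause. Most startup failures are related to the database configuration parameters (kingbase.conf), so when analyzing and resolving a problem, check the settings in the configuration file together with the system environment configuration.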

References:

[Installation and Upgrade] Linux-based Database Software Installation Guide (Standalone Edition)
