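Today a developer came to me and said they could no longer connect to their database, possibly because there were too many connections. I logged in to the server and tried to connect to the database myself, and got the same error: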
psql: FATAL: sorry, too many clients already
Clearly the database had run out of connection slots. So I looked at the database backend processes:
[postgres@ec2s-autodenalicontentpoi-01 ~]$ ps -ef |grep postgres
postgres 3406 18212 0 00:35 ? 00:01:00 postgres: denaliadmin region_na 172.16.60.16(51976) idle
postgres 4221 18212 0 01:09 ? 00:00:03 postgres: denaliadmin region_anz 10.66.40.44(61006) idle
postgres 4223 18212 0 01:09 ? 00:00:00 postgres: denaliadmin region_anz 10.66.40.44(61009) idle
postgres 4390 18212 0 01:16 ? 00:00:00 postgres: denaliadmin region_sa 10.66.40.46(63779) idle
postgres 4391 18212 0 01:16 ? 00:00:00 postgres: denaliadmin region_sa 10.66.40.46(63784) idle
postgres 5587 18212 0 02:04 ? 00:00:00 postgres: denaliadmin postgres 172.16.60.16(53018) idle
postgres 5782 18212 2 02:13 ? 00:01:29 postgres: denaliadmin region_sa 10.189.101.98(40704) idle
postgres 5793 18212 1 02:13 ? 00:01:06 postgres: denaliadmin region_sa 10.189.101.98(40705) idle
postgres 5794 18212 1 02:13 ? 00:01:10 postgres: denaliadmin region_sa 10.189.101.98(40706) idle
......
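When the list is long, it helps to summarize which user, database, and client host holds the most idle connections. The following is a minimal sketch, assuming the process-title layout shown above (postgres: user database client(port) idle); the function name count_idle_by_client is hypothetical and not part of PostgreSQL.

```shell
# Summarize idle PostgreSQL backends from `ps -ef` style lines read on stdin.
# Assumes field 8 is "postgres:", followed by user, database, client(port), state.
count_idle_by_client() {
    awk '
        # Keep only backends whose title ends in plain "idle"
        # (this skips "idle in transaction" and the grep process itself).
        $8 == "postgres:" && $NF == "idle" {
            host = $11
            sub(/\([0-9]+\)$/, "", host)   # strip the "(port)" suffix
            n[$9 " " $10 " " host]++
        }
        END {
            for (k in n) print n[k], k
        }
    ' | LC_ALL=C sort -k1,1nr -k2          # highest count first
}
```

Usage: ps -ef | count_idle_by_client. Each output line is a count followed by user, database, and client host.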
To be able to log in at all, the only option was to kill a few idle backends and then log in as a superuser:
$ kill 4223
Once inside the database, you can run select * from pg_stat_activity where state='idle'; to find which sessions are idle, and then terminate them in bulk.
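The bulk termination can be done in one statement by combining pg_stat_activity with pg_terminate_backend. The sketch below only prints the SQL so that it can be reviewed before it is run. The function name build_terminate_idle_sql and the 10-minute default are assumptions for illustration. It also assumes a PostgreSQL version whose pg_stat_activity has the state and state_change columns, and that the statement is run as a superuser.

```shell
# Print a SQL statement that terminates sessions idle for more than N minutes.
# N defaults to 10 (an arbitrary choice for this sketch).
build_terminate_idle_sql() {
    minutes=${1:-10}
    case $minutes in
        ''|*[!0-9]*)
            echo "minutes must be a non-negative integer" >&2
            return 1
            ;;
    esac
    cat <<EOF
SELECT pid, usename, datname, pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE state = 'idle'
  AND pid <> pg_backend_pid()   -- never terminate the current session
  AND state_change < now() - interval '${minutes} minutes';
EOF
}
```

Usage: build_terminate_idle_sql 10 | psql -U postgres -d postgres. Filtering on state_change spares sessions that only just became idle, which is gentler on connection pools than terminating every idle session.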
Below are my own notes on connection control in PostgreSQL:
max_connections
# Maximum number of concurrent connections to the database server
superuser_reserved_connections
# Number of connection slots reserved for superusers
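Note:
Suppose max_connections=8 and superuser_reserved_connections=3.
The first 5 connections draw from the general pool no matter which user opens them. For example, if I connect 5 times as the superuser postgres and keep those sessions open, a 6th connection as a regular user is refused, but a superuser can still connect.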
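The rule in this note can be written down as a small decision function. This is a simplified model of the behavior described here, not PostgreSQL's actual code: it ignores replication connections and background workers, and the name can_connect and its "yes"/"no" argument are hypothetical.

```shell
# can_connect USED MAX RESERVED IS_SUPERUSER
#   USED          connections already open
#   MAX           max_connections
#   RESERVED      superuser_reserved_connections
#   IS_SUPERUSER  "yes" or "no"
# Prints "ok", "reserved", or "full".
can_connect() {
    used=$1 max=$2 reserved=$3 is_superuser=$4
    if [ "$used" -ge "$max" ]; then
        # Corresponds to: FATAL: sorry, too many clients already
        echo "full"
    elif [ "$is_superuser" != "yes" ] && [ "$used" -ge $((max - reserved)) ]; then
        # Corresponds to: FATAL: remaining connection slots are reserved ...
        echo "reserved"
    else
        echo "ok"
    fi
}
```

With the values from the note, can_connect 5 8 3 no models the 6th connection attempt by a regular user, and can_connect 5 8 3 yes models the same attempt by a superuser.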
Test procedure
# Parameter values used for the test
postgres=# show max_connections ;
max_connections
-----------------
8
postgres=# show superuser_reserved_connections;
superuser_reserved_connections
--------------------------------
3
# Connect as the regular users cdhu1 and cdhu2 in several sessions; the connection count is exactly 5
testdb1=> select datid,datname,pid,usesysid,usename,application_name,client_addr,client_port,state,query from pg_stat_activity;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_port | state | query
-------+---------+-------+----------+---------+------------------+-----------------+-------------+--------+------------------------------------------------------------------------------------------------------------------------
16615 | testdb1 | 60240 | 16642 | cdhu2 | psql | | | | <insufficient privilege>
16615 | testdb1 | 60165 | 16638 | cdhu1 | psql | 192.168.163.102 | 58292 | active | select datid,datname,pid,usesysid,usename,application_name,client_addr,client_port,state,query from pg_stat_activity;
16615 | testdb1 | 60180 | 16638 | cdhu1 | psql | 192.168.163.102 | 58293 | idle | select current_database();
16615 | testdb1 | 60194 | 16638 | cdhu1 | psql | 192.168.163.102 | 58294 | idle | select current_database();
16615 | testdb1 | 60196 | 16642 | cdhu2 | psql | | | | <insufficient privilege>
# Connecting again as a regular user now fails, which shows that regular users can use at most 5 connections, with 3 slots reserved for superusers:
Darren2:postgres:/usr/local/pgsql/data:>psql -U cdhu2 -d testdb1 -h 192.168.163.101
Password for user cdhu2:
psql: FATAL: remaining connection slots are reserved for non-replication superuser connections
# The superuser postgres can still connect; only the 3 reserved slots are left, and only superusers can use them
Darren1:postgres:/usr/local/pgsql/data:>psql -h192.168.163.101 -Upostgres -d postgres
postgres=# select datid,datname,pid,usesysid,usename,application_name,client_addr,client_port,state,query from pg_stat_activity;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_port | state | query
-------+----------+-------+----------+----------+------------------+-----------------+-------------+--------+------------------------------------------------------------------------------------------------------------------------
16615 | testdb1 | 60240 | 16642 | cdhu2 | psql | 192.168.163.102 | 58299 | idle |
16615 | testdb1 | 60165 | 16638 | cdhu1 | psql | 192.168.163.102 | 58292 | idle | select current_user;
16615 | testdb1 | 60180 | 16638 | cdhu1 | psql | 192.168.163.102 | 58293 | idle | select current_database();
16615 | testdb1 | 60194 | 16638 | cdhu1 | psql | 192.168.163.102 | 58294 | idle | select current_database();
16615 | testdb1 | 60196 | 16642 | cdhu2 | psql | 192.168.163.102 | 58295 | idle | select current_database();
13269 | postgres | 60467 | 10 | postgres | psql | 192.168.163.101 | 53674 | active | select datid,datname,pid,usesysid,usename,application_name,client_addr,client_port,state,query from pg_stat_activity;
# Once every slot is in use, no user can connect, superuser or not, and the error is:
Darren2:postgres:/usr/local/pgsql/data:>psql -U postgres -d testdb1 -h 192.168.163.101
psql: FATAL: sorry, too many clients already
# The connections are also visible at the OS level, 8 in total; each PostgreSQL session is served by its own server process
Darren1:postgres:/usr/local/pgsql:>ps -ef|grep postgres
......
postgres 60165 60127 0 18:53 ? 00:00:00 postgres: cdhu1 testdb1 192.168.163.102(58292) idle
postgres 60180 60127 0 18:53 ? 00:00:00 postgres: cdhu1 testdb1 192.168.163.102(58293) idle in transaction
postgres 60194 60127 0 18:54 ? 00:00:00 postgres: cdhu1 testdb1 192.168.163.102(58294) idle
postgres 60196 60127 0 18:54 ? 00:00:00 postgres: cdhu2 testdb1 192.168.163.102(58295) idle
postgres 60240 60127 0 18:55 ? 00:00:00 postgres: cdhu2 testdb1 192.168.163.102(58299) idle
postgres 60467 60127 0 19:00 ? 00:00:00 postgres: postgres postgres 192.168.163.101(53674) idle
postgres 60568 60127 0 19:02 ? 00:00:00 postgres: postgres postgres [local] idle
postgres 60583 60127 0 19:02 ? 00:00:00 postgres: postgres postgres [local] idle
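What can you do when all connections are used up and even a superuser cannot log in to the database?
(1) At the OS level, kill one of the idle backends with a plain kill (SIGTERM); then log in as a superuser and use pg_terminate_backend(pid) to disconnect other sessions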
Darren1:postgres:/usr/local/pgsql:>kill 60467
postgres=# select pg_terminate_backend(61825);
pg_terminate_backend
----------------------
t
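(2) Running kill -9 on a backend at the OS level disconnects every connection, because the postmaster treats it as a crash and resets all server processes, so use it with great caution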
Darren1:postgres:/usr/local/pgsql:>kill -9 60240
postgres=# select datid,datname,pid,usesysid,usename,application_name,client_addr,client_port,state,query from pg_stat_activity;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
(3) Restart the database
Darren1:postgres:/usr/local/pgsql:>pg_ctl restart