若数据库出现连ORA-00020连接数满,会导致业务连接不上进而导致业务中断,若本身数据库的连接数上限设置的不高,可以通过更改processes参数来处理(aler system set processes=xxx scope=spfile),该参数为静态参数,需要重启数据库生效。若本身数据库连接数设置的就非常高了,而是由于业务问题导致部分主机异常超高连接,则需要定位主机、重启主机、查杀会话等方式处理,这里介绍下若是后者的原因的处理思路。
- 检查alert日志
ORA-00020: maximum number of processes (4500) exceeded
ORA-20 errors will not be written to the alert log for
the next minute. Please look at trace files to see all
the ORA-20 errors.
Fri Nov 05 09:35:48 2021
Thread 1 advanced to log sequence 191580 (LGWR switch)
Current log# 1 seq# 191580 mem# 0: +SSD/hospital/onlinelog/redo01.log
Fri Nov 05 09:35:48 2021
LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_4
Fri Nov 05 09:35:48 2021
LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_3
Fri Nov 05 09:35:48 2021
LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_2
Fri Nov 05 09:35:48 2021
Archived Log entry 904316 added for thread 1 sequence 191579 ID 0x431aef8a dest 1:
Fri Nov 05 09:37:30 2021
ORA-00020: maximum number of processes (4500) exceeded
ORA-20 errors will not be written to the alert log for
the next minute. Please look at trace files to see all
the ORA-20 errors.
- 查看数据库当前连接数
SQL> show parameter process;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 1
cell_offload_processing boolean TRUE
db_writer_processes integer 10
gcs_server_processes integer 4
global_txn_processes integer 1
job_queue_processes integer 1000
log_archive_max_processes integer 4
processes integer 4500
processor_group_name string
SQL> select inst_id,count(*) from gv$session group by inst_id;
INST_ID COUNT(*)
---------- ----------
1 4496
2 715
- 定位问题主机
SQL> select program,machine,count(*)
2 from gv$session group by program,machine,sql_id order by 3;
…
PROGRAM MACHINE COUNT(*)
-------------------------------- -------------------------- ----------
A.exe WORKGROUP\SRSERBER 10
B.exe WORKGROUP\QDQ 10
C.exe WORKGROUP\DQ 10
D.exe WORKGROUP\DQDR 10
E.exe WORKGROUP\DGQHU 10
...
dqgeq.exe WORKGROUP\234 3845
- 根据主机名和程序名称定位主机IP
select b.spid from v$session a,v$process b where a.paddr=b.addr and a.program='程序名' and a.machine like '%主机名%';
netstat -anp|grep OSPID1
netstat -anp|grep OSPID2
netstat -anp|grep OSPID3
测试多个OSPID确保IP一致,将获取到的IP提供给客户或业务部门,重启该主机上的应用程序或直接重启该主机。
- 查杀会话
重启主机或应用程序可能时间较长,此时可以通过kill ospid批量杀除会话的方法,先杀除一部分会话,保证业务可用。
select b.spid from v$session a,v$process b where a.paddr=b.addr and a.program='程序名' and a.machine like '%主机名%';
kill -9 OSPID