The nginx error.log contains a large number of error messages like the following:
[root@zayhu01-fk nginx]# grep -aP '^20.* \[crit\]' error.log
2017/03/14 12:06:31 [crit] 3549#0: accept4() failed (24: Too many open files)
[root@zayhu01-fk nginx]# grep -aP '^20.* \[alert\]' error.log
2017/03/14 16:04:27 [alert] 3551#0: *84168270 socket() failed (24: Too many open files) while connecting to upstream, client: 1.1.1.1, server:...
These errors are caused by the system's open files limit, so check the current settings:
[root@zayhu01-fk ~]# tail -4 /etc/security/limits.conf
* - nofile 500000
push - nproc 65536
push - nofile 320000
work - nproc 10000
[root@zayhu01-fk ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128630
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 500000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128630
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[root@zayhu01-fk ~]#
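However, this only shows that the system open files limit is set to 500000. A running nginx process keeps the limit it was started with, so it picks up the system setting only after the nginx process is restarted. The limit actually in effect for nginx can be checked with the following command: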
[root@zayhu01-fk ~]# cat /proc/`ps -ef | grep nginx|grep -v grep|head -1|awk '{print $2}'`/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             128630               128630               processes
Max open files            500000               500000               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       128630               128630               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
[root@zayhu01-fk ~]#
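Summary: the open files values reported by ulimit -a and by cat /proc/`ps -ef | grep nginx|grep -v grep| head -1 | awk '{print $2}'`/limits must match. If they do not, large numbers of connections will fail, and the nginx process must be restarted.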
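The comparison described in the summary can be scripted. The following is a minimal sketch and not part of the original post: the function names soft_nofile_limit and check_nginx_nofile are hypothetical, and it assumes that pgrep is available and that the oldest process named nginx is the master process.

```shell
#!/usr/bin/env bash

# Print the soft "Max open files" value from a /proc/<pid>/limits style file.
# $1: path to the limits file.
soft_nofile_limit() {
    # Fields on that line: "Max" "open" "files" <soft> <hard> <units>
    awk '/^Max open files/ {print $4}' "$1"
}

# Compare the current shell's soft limit with that of the running nginx.
# Assumption: the oldest process named "nginx" is the master process.
check_nginx_nofile() {
    local pid shell_limit nginx_limit
    pid=$(pgrep -o -x nginx) || { echo "nginx is not running" >&2; return 2; }
    shell_limit=$(ulimit -Sn)
    nginx_limit=$(soft_nofile_limit "/proc/$pid/limits")
    if [ "$shell_limit" != "$nginx_limit" ]; then
        echo "mismatch: shell=$shell_limit nginx=$nginx_limit (restart nginx)"
        return 1
    fi
    echo "ok: open files limit is $nginx_limit"
}
```

Calling check_nginx_nofile as root after editing /etc/security/limits.conf should report a mismatch until nginx has been restarted from a session that already has the new limit.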
The following function can be used to monitor the [alert] entries in error.log:
function nginx_alert_error(){
    local user=$1
    # Take the [alert] lines from the last 50 MB of the log (at most 500 KB of them),
    # then keep only those logged within the last 600 seconds.
    sudo runuser - "$user" -c "[ -e /var/log/nginx/error.log ] && tail -c 50m /var/log/nginx/error.log | grep -aP '^20.*:\d{2} \[alert\]' | tail -c500k" |
        awk -F '[' 'BEGIN{"date -d \"-600 seconds\" \"+%Y/%m/%d %T\"" | getline cTS}{if($1>cTS) print $0}' |
        grep "^20"
}
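The time-window filter in this function works because nginx log timestamps have the fixed-width form YYYY/MM/DD HH:MM:SS, so a plain string comparison orders them chronologically. A minimal sketch of that step in isolation follows; the helper name filter_since is hypothetical, and the cutoff is passed in as an argument instead of being computed inside awk.

```shell
#!/usr/bin/env bash

# Print only the log lines whose leading timestamp is later than the cutoff.
# $1: cutoff in the form "YYYY/MM/DD HH:MM:SS"; log lines are read from stdin.
filter_since() {
    # The first 19 characters of an nginx error.log line are the timestamp.
    awk -v cutoff="$1" 'substr($0, 1, 19) > cutoff'
}

# Example use (assumes GNU date, as in the function above):
#   cutoff=$(date -d '-600 seconds' '+%Y/%m/%d %T')
#   grep -a '\[alert\]' /var/log/nginx/error.log | filter_since "$cutoff"
```

Passing the cutoff as an argument makes the filter easy to check against saved log lines with a fixed timestamp.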
This article is reposted from meteor_hy's 51CTO blog. Original link: http://blog.51cto.com/caiyuanji/1906565. To repost it, please contact the original author.