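On a recent project I ran into a strange problem: after the Spring Boot application started, the first page visit always left a large number of AJAX requests pending, and refreshing the page then produced a flood of IOException errors: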
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:356)
at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:815)
at org.apache.catalina.connector.OutputBuffer.append(OutputBuffer.java:720)
at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:391)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:369)
at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:96)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2039)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator._writeBytes(UTF8JsonGenerator.java:1127)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator.writeFieldName(UTF8JsonGenerator.java:253)
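The problem only showed up on the server (CentOS 7) and could not be reproduced on developer machines (Windows, Mac), which made it rather puzzling. The first thing I tried was reading the logs. I set the log level to debug and went through the whole startup sequence without finding any obvious anomaly, except that just before the IOException there were several consecutive log lines like these: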
[12-05 13:54:32.912| WARN|1-exec-3|o.a.catalina.util.SessionIdGeneratorBase.log:179] Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [67,400] milliseconds.
[12-05 13:54:32.915| WARN|1-exec-2|o.a.catalina.util.SessionIdGeneratorBase.log:179] Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [67,401] milliseconds.
[12-05 13:54:32.912| WARN|-exec-12|o.a.catalina.util.SessionIdGeneratorBase.log:179] Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [7,738] milliseconds.
[12-05 13:54:32.917| WARN|1-exec-9|o.a.catalina.util.SessionIdGeneratorBase.log:179] Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [7,755] milliseconds.
[12-05 13:54:32.917| WARN|1-exec-7|o.a.catalina.util.SessionIdGeneratorBase.log:179] Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [7,755] milliseconds.
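Clearly, far too much time was being spent on session ID generation, specifically on creating the SecureRandom instance behind it. To confirm this, I ran jstack ${pid} while the page's AJAX requests were hanging, to dump the thread stacks at that moment. The stacks looked like this (abridged):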
"http-nio-7001-exec-4" #47 daemon prio=5 os_prio=0 tid=0x00007f851609b000 nid=0x34c7 in Object.wait() [0x00007f84befed000]
java.lang.Thread.State: RUNNABLE
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:221)
- locked <0x00000000f517be18> (a sun.security.provider.SecureRandom)
at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
at java.security.SecureRandom.next(SecureRandom.java:491)
at java.util.Random.nextInt(Random.java:329)
at org.apache.catalina.util.SessionIdGeneratorBase.createSecureRandom(SessionIdGeneratorBase.java:269)
at org.apache.catalina.util.SessionIdGeneratorBase.getRandomBytes(SessionIdGeneratorBase.java:203)
at org.apache.catalina.util.StandardSessionIdGenerator.generateSessionId(StandardSessionIdGenerator.java:34)
at org.apache.catalina.util.SessionIdGeneratorBase.generateSessionId(SessionIdGeneratorBase.java:195)
at org.apache.catalina.session.ManagerBase.generateSessionId(ManagerBase.java:831)
at org.apache.catalina.session.ManagerBase.createSession(ManagerBase.java:663)
at org.apache.catalina.connector.Request.doGetSession(Request.java:3039)
at org.apache.catalina.connector.Request.getSession(Request.java:2429)
at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:896)
at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:908)
......
"http-nio-7001-exec-3" #46 daemon prio=5 os_prio=0 tid=0x00007f8515910800 nid=0x34c6 in Object.wait() [0x00007f84bf0ee000]
java.lang.Thread.State: RUNNABLE
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:221)
- locked <0x00000000f5096b80> (a sun.security.provider.SecureRandom)
at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
at java.security.SecureRandom.next(SecureRandom.java:491)
at java.util.Random.nextInt(Random.java:329)
at org.apache.catalina.util.SessionIdGeneratorBase.createSecureRandom(SessionIdGeneratorBase.java:269)
at org.apache.catalina.util.SessionIdGeneratorBase.getRandomBytes(SessionIdGeneratorBase.java:203)
at org.apache.catalina.util.StandardSessionIdGenerator.generateSessionId(StandardSessionIdGenerator.java:34)
at org.apache.catalina.util.SessionIdGeneratorBase.generateSessionId(SessionIdGeneratorBase.java:195)
at org.apache.catalina.session.ManagerBase.generateSessionId(ManagerBase.java:831)
at org.apache.catalina.session.ManagerBase.createSession(ManagerBase.java:663)
at org.apache.catalina.connector.Request.doGetSession(Request.java:3039)
at org.apache.catalina.connector.Request.getSession(Request.java:2429)
at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:896)
at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:908)
......
"http-nio-7001-exec-1" #44 daemon prio=5 os_prio=0 tid=0x00007f8514970800 nid=0x34c4 runnable [0x00007f84bf2ef000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:255)
at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:539)
at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:144)
at sun.security.provider.SecureRandom$SeederHolder.<clinit>(SecureRandom.java:203)
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:221)
- locked <0x00000000f7173220> (a sun.security.provider.SecureRandom)
at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
at java.security.SecureRandom.next(SecureRandom.java:491)
at java.util.Random.nextInt(Random.java:329)
at org.apache.catalina.util.SessionIdGeneratorBase.createSecureRandom(SessionIdGeneratorBase.java:269)
at org.apache.catalina.util.SessionIdGeneratorBase.getRandomBytes(SessionIdGeneratorBase.java:203)
at org.apache.catalina.util.StandardSessionIdGenerator.generateSessionId(StandardSessionIdGenerator.java:34)
at org.apache.catalina.util.SessionIdGeneratorBase.generateSessionId(SessionIdGeneratorBase.java:195)
at org.apache.catalina.session.ManagerBase.generateSessionId(ManagerBase.java:831)
at org.apache.catalina.session.ManagerBase.createSession(ManagerBase.java:663)
at org.apache.catalina.connector.Request.doGetSession(Request.java:3039)
at org.apache.catalina.connector.Request.getSession(Request.java:2429)
at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:896)
at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:908)
......
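A large number of threads were stuck at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:221). In the dump above, only http-nio-7001-exec-1 is actually doing work: it is reading seed bytes from a file inside the static initializer of SecureRandom$SeederHolder, and the other threads appear to be waiting for that initialization to finish.
Searching for similar reports (search in English, the results are much better) showed that this is not an isolated case. The JDK bug list even has similar entries, such as JDK-6521844: SecureRandom hangs on Linux Systems, but they are all marked as fixed. Evidently the problem had not been fixed completely. Digging further, I found two articles, "Avoiding JVM Delays Caused by Random Number Generation" and "How do I make Tomcat startup faster?", which document exactly why this random number generation is slow and how to solve it.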
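It turns out that seeding Java's SecureRandom depends on an entropy source. The default source, /dev/random, is blocking and can stall when the system does not have enough entropy available, whereas switching to the non-blocking /dev/urandom avoids the stall.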
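To check whether a given machine is affected, one option is to time the seeding step directly, outside of Tomcat. The following is a minimal sketch, not the code Tomcat itself runs: the class name SeedTiming and its method are hypothetical, and it only assumes that the SHA1PRNG algorithm named in the log lines is available in the JDK. It also prints the two settings that control the seed source, so the result can be compared with and without a changed configuration.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.Security;

public class SeedTiming {

    /**
     * Creates a SecureRandom for the given algorithm, forces it to seed
     * itself by drawing one value, and returns the elapsed time in ms.
     */
    static long timeSeedingMillis(String algorithm) {
        long start = System.nanoTime();
        try {
            SecureRandom random = SecureRandom.getInstance(algorithm);
            random.nextInt(); // the first draw is what triggers seeding
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Algorithm not available: " + algorithm, e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Seed source from the java.security configuration file.
        System.out.println("securerandom.source = "
                + Security.getProperty("securerandom.source"));
        // Override passed on the command line, or null if none was given.
        System.out.println("java.security.egd   = "
                + System.getProperty("java.security.egd"));
        System.out.println("SHA1PRNG seeding took "
                + timeSeedingMillis("SHA1PRNG") + " ms");
    }
}
```

On an affected server the last line would be expected to show a delay of the same order as the Tomcat warnings, and a much smaller number once the seed source is non-blocking.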
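In practice there are two ways to do this: edit the Java configuration file (see "Avoiding JVM Delays Caused by Random Number Generation"), or change the application's startup script. For an application that is deployed as multiple instances in many places, changing the startup script is the cheapest and most controllable option. I added the property -Djava.security.egd=file:/dev/./urandom to the startup script, restarted, and the problem was gone. Note that with Spring Boot this option has to come before the -jar option. If it is placed after -jar and the jar file name, it is handed to the application as a program argument rather than to the JVM, so it may have no effect.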
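A quick way to verify the placement is to have the application report what it actually received. The sketch below is only an illustration under assumptions: the class name EgdCheck and the helper misplacedJvmOptions are hypothetical, and a real Spring Boot application would run this kind of check from its own main method. It prints the java.security.egd system property as seen by the JVM and flags any -D options that arrived as ordinary program arguments, which is what happens when they are written after -jar and the jar file name.

```java
import java.util.ArrayList;
import java.util.List;

public class EgdCheck {

    /**
     * Returns the program arguments that look like JVM system property
     * options (-Dname=value) but reached main() as plain arguments.
     */
    static List<String> misplacedJvmOptions(String[] args) {
        List<String> misplaced = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                misplaced.add(arg);
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        // null here means the JVM never received the option.
        System.out.println("java.security.egd = "
                + System.getProperty("java.security.egd"));
        for (String arg : misplacedJvmOptions(args)) {
            System.out.println("Warning: '" + arg
                    + "' was passed as a program argument; move it before -jar");
        }
    }
}
```

Started with the option before -jar, the property is printed with its value and no warning appears. Started with the option after the jar file name, the property is null and the warning shows the misplaced argument.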