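I. Symptoms
A Spring Boot application packaged as a jar runs for a fairly long time. Partway through a run the process hung: no more log output appeared, and the database showed no SQL statement still executing.
II. Troubleshooting
A thread dump of the Java process, exported with jstack, looks like this: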
2021-02-22 18:46:38
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode):
"Attach Listener" #99 daemon prio=9 os_prio=0 tid=0x00007f4478001000 nid=0x18f waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"HikariPool-1 housekeeper" #24 daemon prio=5 os_prio=0 tid=0x00007f451e5b8000 nid=0x98 waiting for monitor entry [0x00007f449481b000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:360)
- waiting to lock <0x000000008031aea8> (a org.apache.logging.log4j.core.appender.OutputStreamManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:220)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:58)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:177)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:170)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:161)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2163)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2118)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2101)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2006)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1875)
at org.apache.logging.slf4j.Log4jLogger.debug(Log4jLogger.java:134)
at com.zaxxer.hikari.pool.HikariPool.logPoolState(HikariPool.java:404)
at com.zaxxer.hikari.pool.HikariPool$HouseKeeper.run(HikariPool.java:776)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"Abandoned connection cleanup thread" #22 daemon prio=5 os_prio=0 tid=0x00007f451c93e800 nid=0x97 in Object.wait() [0x00007f4495bb1000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000000816e8be0> (a java.lang.ref.ReferenceQueue$Lock)
at com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"Service Thread" #17 daemon prio=9 os_prio=0 tid=0x00007f451c2ca800 nid=0x93 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread11" #16 daemon prio=9 os_prio=0 tid=0x00007f451c2c7800 nid=0x92 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread10" #15 daemon prio=9 os_prio=0 tid=0x00007f451c2c6000 nid=0x91 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread9" #14 daemon prio=9 os_prio=0 tid=0x00007f451c2c4000 nid=0x90 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread8" #13 daemon prio=9 os_prio=0 tid=0x00007f451c2c2000 nid=0x8f waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread7" #12 daemon prio=9 os_prio=0 tid=0x00007f451c2c0000 nid=0x8e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread6" #11 daemon prio=9 os_prio=0 tid=0x00007f451c2be000 nid=0x8d waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread5" #10 daemon prio=9 os_prio=0 tid=0x00007f451c2bc000 nid=0x8c waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread4" #9 daemon prio=9 os_prio=0 tid=0x00007f451c2ba000 nid=0x8b waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007f451c2b8000 nid=0x8a waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f451c2b6000 nid=0x89 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f451c2b4000 nid=0x88 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f451c2b1000 nid=0x87 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f451c2af800 nid=0x86 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f451c27d800 nid=0x85 in Object.wait() [0x00007f44973f2000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000000802ca438> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f451c278800 nid=0x84 in Object.wait() [0x00007f44974f3000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000000802ca668> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00007f451c009800 nid=0x75 runnable [0x00007f4523eb2000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000802ba940> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000000802ba920> (a java.io.PrintStream)
at org.apache.logging.log4j.core.appender.ConsoleAppender$SystemOutStream.write(ConsoleAppender.java:338)
at java.io.PrintStream.write(PrintStream.java:480)
- locked <0x000000008031cf78> (a java.io.PrintStream)
at org.apache.logging.log4j.core.util.CloseShieldOutputStream.write(CloseShieldOutputStream.java:53)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:262)
- eliminated <0x000000008031aea8> (a org.apache.logging.log4j.core.appender.OutputStreamManager)
at org.apache.logging.log4j.core.appender.OutputStreamManager.flushBuffer(OutputStreamManager.java:294)
- eliminated <0x000000008031aea8> (a org.apache.logging.log4j.core.appender.OutputStreamManager)
at org.apache.logging.log4j.core.appender.OutputStreamManager.flush(OutputStreamManager.java:303)
- locked <0x000000008031aea8> (a org.apache.logging.log4j.core.appender.OutputStreamManager)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:179)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:170)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:161)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2163)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2118)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2101)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2000)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1859)
at org.apache.logging.slf4j.Log4jLogger.debug(Log4jLogger.java:119)
at com.ll.hiws.warning.service.impl.WarningResultServiceImpl.InfectCalculateFromResult(WarningResultServiceImpl.java:410)
at com.ll.hiws.warning.service.impl.WarningResultServiceImpl.warningCalculate(WarningResultServiceImpl.java:179)
at com.ll.hiws.warning.service.impl.WarningResultServiceImpl.InfectCalculate(WarningResultServiceImpl.java:103)
at com.ll.hiws.warning.service.impl.WarningResultServiceImpl$$FastClassBySpringCGLIB$$3f8499af.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:747)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.transaction.interceptor.TransactionInterceptor$$Lambda$165/1989184704.proceedWithInvocation(Unknown Source)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:294)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689)
at com.ll.hiws.warning.service.impl.WarningResultServiceImpl$$EnhancerBySpringCGLIB$$57158dfd.InfectCalculate(<generated>)
at com.ll.hiws.HiwsApplicationRunner.run(HiwsApplicationRunner.java:70)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:781)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:771)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:335)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1246)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1234)
at com.ll.hiws.HiwsWarningApplication.main(HiwsWarningApplication.java:11)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
"VM Thread" os_prio=0 tid=0x00007f451c271000 nid=0x83 runnable
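The first thing to notice: the housekeeper thread of HikariPool, the database connection pool Spring Boot uses, is BLOCKED, waiting for the lock <0x000000008031aea8> to be released.
That lock is the monitor of log4j's OutputStreamManager.
Further down the dump, the main thread is the one that has locked it, inside OutputStreamManager.flush.
So main holds the lock and the connection pool thread waits for it. Once main finishes writing its log line it releases the lock, and the connection pool thread can carry on. The main thread is RUNNABLE, so at first glance it is simply in the middle of a write. Everything looks normal.
The catch: a while later I exported another thread dump and it was identical. The program had not moved forward at all. So the writeBytes call in main was stuck and never returned. (The JVM reports a thread that is inside a native method as RUNNABLE even when the operating system has it blocked on I/O.)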
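The two facts read off the dump above (one thread BLOCKED on a monitor, and the thread that owns it) can also be collected from inside a JVM. Below is a minimal sketch using the JDK's ThreadMXBean. It assumes the code runs in the same JVM as the stuck threads, for example from a watchdog thread; the class name BlockedThreadReport is made up for illustration and is not part of the original application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original application.
public class BlockedThreadReport {

    /** Returns one line per thread that is BLOCKED on a monitor, naming the lock and its owner. */
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // Arguments are (lockedMonitors, lockedSynchronizers); the extra detail is not needed here.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        blockedThreads().forEach(System.out::println);
    }
}
```

Taking two such reports some time apart and comparing them serves the same purpose as comparing two jstack dumps: if the same thread is still waiting for the same owner, nothing has progressed.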
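I found this report on JIRA: ConsoleAppender hangs when writing to System.out in a spawned JVM
It goes fairly deep and I did not fully follow it, but the gist is that setting the ConsoleAppender's follow attribute, which defaults to false, to true resolves the problem.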
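I later went through a lot of material. Here is a rough summary of my understanding:
When logs are printed to the console, the log output is written into a buffer and the console reads from that buffer. If the console stops reading, the buffer is never drained. For example, when running in IDEA, selecting text in the console with the mouse pauses the console output. The logging code keeps writing until the buffer is full, and then the writing thread stops and waits for space to become available. In the program this shows up as writeBytes never returning, the lock it holds never being released, and the program hanging.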
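The buffer-full behaviour described above can be reproduced in miniature with JDK pipes. This is only an analogy under stated assumptions: a PipedInputStream that nobody reads stands in for the paused console, a PipedOutputStream stands in for System.out, and the class name StalledConsumerDemo and its parameters are invented for the illustration.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, not part of the original application.
public class StalledConsumerDemo {

    /**
     * Writes into a pipe of the given capacity that nobody reads and returns how many
     * bytes were accepted by the time waitMillis has elapsed.
     */
    public static int bytesWrittenBeforeStall(int capacity, long waitMillis)
            throws IOException, InterruptedException {
        PipedInputStream consumer = new PipedInputStream(capacity);   // stands in for the console
        PipedOutputStream producer = new PipedOutputStream(consumer); // stands in for System.out
        AtomicInteger written = new AtomicInteger();

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    producer.write('x');       // blocks once the buffer is full
                    written.incrementAndGet();
                }
            } catch (IOException e) {
                // The pipe was closed: stop writing.
            }
        }, "log-writer");
        writer.setDaemon(true);
        writer.start();

        writer.join(waitMillis);   // give the writer time to fill the buffer and stall
        int count = written.get();
        consumer.close();          // release the stalled writer
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes accepted before the writer stalled: "
                + bytesWrittenBeforeStall(1024, 500));
    }
}
```

The writer gets as many bytes in as the buffer holds and then makes no further progress until something reads from or closes the other end. One difference from the real case: a thread stalled in this pipe shows up as TIMED_WAITING, whereas the main thread in the dump is stuck in a native write and is therefore reported as RUNNABLE.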
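The same thing can happen when the application is deployed in Docker. Docker continuously reads the container's standard output to keep its own logs, but when a lot accumulates in the buffer, for example when a log line is extremely long, Docker cannot drain the buffer in time and log4j runs into the same problem. According to the material I found, though, this can be resolved by upgrading Docker.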
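But wait: the problem affects logs written to the console. What if no logs are written to the console at all?
So here are the solutions I have put together: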
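1. When using ConsoleAppender, set its follow attribute to true.
2. (Not verified by the author) Use a different component: under the log4j2 framework, additionally bring in disruptor. See: Cnblogs (博客园), "Performance problems when log4j writes to the console"
3. At runtime, configure log4j to write all logs to files and nothing to the console. With this change my program ran normally.
4. (Not verified by the author) Upgrade Docker to version 18.06. See: Logging long lines breaks container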
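Option 2 works by moving the console write off the application thread. The general idea can be sketched with plain JDK classes. This is not Log4j2's disruptor-based implementation: the class NonBlockingConsoleWriter, its bounded queue, and its drop-when-full policy are all assumptions made for illustration.

```java
import java.io.PrintStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of asynchronous console output, not Log4j2 code.
public class NonBlockingConsoleWriter {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    public NonBlockingConsoleWriter(int capacity, PrintStream sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    sink.println(queue.take()); // only this thread can stall on the sink
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "console-drainer");
        drainer.setDaemon(true);
        drainer.start();
    }

    /** Never blocks: returns false and counts the line as dropped when the queue is full. */
    public boolean log(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    public long droppedCount() {
        return dropped.get();
    }
}
```

With this shape, a console that stops reading stalls only the drainer thread. Application threads keep running and lose log lines instead of hanging, which is a trade-off rather than a free fix. Whether Log4j2's own asynchronous loggers wait or discard when their buffer fills depends on configuration, which this sketch does not model.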
Related links I found:
1. ConsoleAppender hangs when writing to System.out in a spawned JVM
2. OutputStreamManager in ConsoleAppender leaking managers
5. Logging long lines breaks container
6. A RUNNABLE state thread hangs on the java.io.FileOutputStream.writeBytes method