背景:
最近在做项目系统的异常测试,项目依赖于nkv,需要模拟依赖组件nkv异常时系统的响应及性能情况。通过tc工具模拟当服务器发送到nkv的请求超时时系统的响应。发现接口返回错误率100%,查看服务器日志报大量线程池已满,想要通过工具看下线程情况,由此简单学习了jstack。
WARN AbortPolicyWithReport - [DUBBO] Thread pool is EXHAUSTED! Thread Name: DubboServerHandler-10.165.185.170:22021, Pool Size: 200 (active: 200, core: 200, max: 200, largest: 200), Task: 182763 (completed: 182563), Executor status:(isShutdown:false, isTerminated:false, isTerminating:false), in dubbo://10.165.185.170:22021!, dubbo version: 2.8.4-nisp, current host: 127.0.0.1
2016-11-21 13:55:33,801 WARN AbortPolicyWithReport - [DUBBO] Thread pool is EXHAUSTED! Thread Name: DubboServerHandler-10.165.185.170:22021, Pool Size: 200 (active: 200, core: 200, max: 200, largest: 200), Task: 182763 (completed: 182563), Executor status:(isShutdown:false, isTerminated:false, isTerminating:false), in dubbo://10.165.185.170:22021!, dubbo version: 2.8.4-nisp, current host: 127.0.0.1
jstack
基本介绍
jstack是用于生成java虚拟机当前时刻线程快照的工具。线程快照是当前java虚拟机内每一条线程正在执行的方法堆栈的集合,生成线程快照的主要目的是定位线程出现长时间停顿的原因,如线程间死锁、死循环、请求外部资源导致的长时间等待等。线程出现停顿的时候通过jstack来查看各个线程的调用堆栈,就可以知道没有响应的线程到底在后台做什么事情,或者等待什么资源。命令语法格式
jstack <pid>
- 可以通过top查看java进程的pid
- 可通过jstack > jstack.log将快照打印至文件jstack.log
- 在实际运行中往往一次dump信息不足以确认问题,建议dump三次以便确认问题。
线程分析
- 线程示例
在线程中有一些JVM后台线程用于执行垃圾回收等任务,这些线程在JVM初始化时就存在,我们主要查看用户线程。在本次异常测试中,模拟异常后,系统响应慢大量报错,查看服务器日志显示线程池已满,通过jstack工具dump线程快照如下:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.60-b09 mixed mode):
"MySQL Statement Cancellation Timer" daemon prio=10 tid=0x00000000019dc000 nid=0x3fab in Object.wait() [0x00007f035fe36000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x00000000c45a8680> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"pool-5-thread-257" prio=10 tid=0x00000000014e7800 nid=0x3f8a waiting on condition [0x00007f035f129000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c45e9ea0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"pool-5-thread-256" prio=10 tid=0x00000000013c2800 nid=0x3f89 waiting on condition [0x00007f035f42c000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c45e9ea0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"pool-5-thread-255" prio=10 tid=0x00000000019fb000 nid=0x3f88 waiting on condition [0x00007f035f22a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c45e9ea0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"pool-5-thread-254" prio=10 tid=0x00000000019d0000 nid=0x3f87 waiting on condition [0x00007f035f52d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c45e9ea0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"pool-5-thread-253" prio=10 tid=0x00000000014ee000 nid=0x3f86 waiting on condition [0x00007f035f62e000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c45eabb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"AMQP Connection 10.165.124.207:5672" prio=10 tid=0x0000000001862800 nid=0x3f85 runnable [0x00007f0360038000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
- locked <0x00000000c45ea970> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139)
- locked <0x00000000c45ea950> (a java.io.DataInputStream)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:536)
at java.lang.Thread.run(Thread.java:745)
"MySQL Statement Cancellation Timer" daemon prio=10 tid=0x000000000152b800 nid=0x395a in Object.wait() [0x00007f0360139000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at java.util.TimerThread.mainLoop(Timer.java:526)
- locked <0x00000000c45f2b18> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"Attach Listener" daemon prio=10 tid=0x00000000017a4000 nid=0x2168 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"pool-8-thread-1" prio=10 tid=0x00000000022e1800 nid=0x2d0 waiting on condition [0x00007f035e71f000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c05f7478> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"DubboResponseTimeoutScanTimer" daemon prio=10 tid=0x0000000002359800 nid=0x3972 waiting on condition [0x00007f035d40c000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.alibaba.dubbo.remoting.exchange.support.DefaultFuture$RemotingInvocationTimeoutScan.run(DefaultFuture.java:300)
at java.lang.Thread.run(Thread.java:745)
"pool-7-thread-1" prio=10 tid=0x00007f036ccf4800 nid=0x3825 waiting on condition [0x00007f0360b43000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c05f7218> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"DubboServerHandler-10.165.185.170:22021-thread-200" daemon prio=10 tid=0x00007f036c04e000 nid=0x381b waiting on condition [0x00007f0360c43000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000ffcff858> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2176)
at com.netease.backend.nkv.client.rpc.net.NkvFuture.innerWait(NkvFuture.java:162)
at com.netease.backend.nkv.client.rpc.net.NkvFuture.get(NkvFuture.java:180)
at com.netease.backend.nkv.client.rpc.future.NkvResultFutureImpl.innerGet(NkvResultFutureImpl.java:144)
at com.netease.backend.nkv.client.rpc.future.NkvResultFutureImpl.get(NkvResultFutureImpl.java:81)
at com.netease.backend.nkv.client.impl.DefaultNkvClient.futureGet(DefaultNkvClient.java:530)
at com.netease.backend.nkv.client.impl.DefaultNkvClient.get(DefaultNkvClient.java:70)
at com.netease.is.ec.rpc.nkv.NkvOpService.getValue(NkvOpService.java:78)
at com.netease.is.ec.rpc.nkv.NkvOpService.setValue(NkvOpService.java:108)
at com.netease.is.ec.rpc.cacheservice.KVSqlService.setValue(KVSqlService.java:61)
at com.netease.is.ec.drag.create.CaptchaGetControl.getCaptcha(CaptchaGetControl.java:85)
at com.netease.is.ec.drag.provider.DraggingCaptchaImpl.getCaptchaImg(DraggingCaptchaImpl.java:45)
at com.alibaba.dubbo.common.bytecode.Wrapper1.invokeMethod(Wrapper1.java)
at com.alibaba.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:46)
at com.alibaba.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:72)
at com.alibaba.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:53)
at com.alibaba.dubbo.rpc.filter.ExceptionFilter.invoke(ExceptionFilter.java:64)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.TimeoutFilter.invoke(TimeoutFilter.java:42)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:75)
- 线程状态
- Runnable
该状态表示线程具备所有运行条件,在运行队列中准备操作系统的调度,或者正在运行。 - Wait on condition
该状态出现在线程等待某个条件的发生。具体是什么原因,可以结合 stacktrace来分析。最常见的情况是线程在等待网络的读写,比如当网络数据没有准备好读时,线程处于这种等待状态,而一旦有数据准备好读之后,线程会重新激活,读取并处理数据。在 Java引入 NewIO之前,对于每个网络连接,都有一个对应的线程来处理网络的读写操作,即使没有可读写的数据,线程仍然阻塞在读写操作上,这样有可能造成资源浪费,而且给操作系统的线程调度也带来压力。在 NewIO里采用了新的机制,编写的服务器程序的性能和可扩展性都得到提高。
如果发现有大量的线程都在处在 Wait on condition,从线程 stack看, 正等待网络读写,这可能是一个网络瓶颈的征兆。因为网络阻塞导致线程无法执行。一种情况是网络非常忙,几 乎消耗了所有的带宽,仍然有大量数据等待网络读 写;另一种情况也可能是网络空闲,但由于路由等问题,导致包无法正常的到达。所以要结合系统的一些性能观察工具来综合分析,比如 netstat统计单位时间的发送包的数目,如果很明显超过了所在网络带宽的限制 ; 观察 cpu的利用率,如果系统态的 CPU时间,相对于用户态的 CPU时间比例较高;如果程序运行在 Solaris 10平台上,可以用 dtrace工具看系统调用的情况,如果观察到 read/write的系统调用的次数或者运行时间遥遥领先;这些都指向由于网络带宽所限导致的网络瓶颈。另外一种出现 Wait on condition的常见情况是该线程在 sleep,等待 sleep的时间到了时候,将被唤醒。 - Waiting for monitor entry 和 in Object.wait()
在多线程的 JAVA程序中,实现线程之间的同步,就要说说 Monitor。 Monitor是 Java中用以实现线程之间的互斥与协作的主要手段,它可以看成是对象或者 Class的锁。每一个对象都有,也仅有一个 monitor。每个 Monitor在某个时刻,只能被一个线程拥有,该线程就是 “Active Thread”,而其它线程都是 “Waiting Thread”,分别在两个队列 “ Entry Set”和 “Wait Set”里面等候。在 “Entry Set”中等待的线程状态是 “Waiting for monitor entry”,而在 “Wait Set”中等待的线程状态是 “in Object.wait()”。
先看 “Entry Set”里面的线程。我们称被 synchronized保护起来的代码段为临界区。当一个线程申请进入临界区时,它就进入了 “Entry Set”队列。对应的 code就像:
synchronized(obj) {
.........
}
这时有两种可能性:
该 monitor不被其它线程拥有, Entry Set里面也没有其它等待线程。本线程即成为相应类或者对象的 Monitor的 Owner,执行临界区的代码
该 monitor被其它线程拥有,本线程在 Entry Set队列中等待。
在第一种情况下,线程将处于 “Runnable”的状态,而第二种情况下,线程 DUMP会显示处于 “waiting for monitor entry”。
临界区的设置,是为了保证其内部的代码执行的原子性和完整性。但是因为临界区在任何时间只允许线程串行通过,这 和我们多线程的程序的初衷是相反的。 如果在多线程的程序中,大量使用 synchronized,或者不适当的使用了它,会造成大量线程在临界区的入口等待,造成系统的性能大幅下降。如果在线程 DUMP中发现了这个情况,应该审查源码,改进程序。
现在我们再来看现在线程为什么会进入 “Wait Set”。当线程获得了 Monitor,进入了临界区之后,如果发现线程继续运行的条件没有满足,它则调用对象(一般就是被 synchronized 的对象)的 wait() 方法,放弃了 Monitor,进入 “Wait Set”队列。只有当别的线程在该对象上调用了 notify() 或者 notifyAll() , “ Wait Set”队列中线程才得到机会去竞争,但是只有一个线程获得对象的 Monitor,恢复到运行态。在 “Wait Set”中的线程, DUMP中表现为: in Object.wait()。往往在程序中,会出现多个类似的线程,他们都有相似的 DUMP信息。这也可能是正常的。比如,在程序中,有多个服务线程,设计成从一个队列里面读取请求数据。这个队列就是 lock以及 waiting on的对象。当队列为空的时候,这些线程都会在这个队列上等待,直到队列有了数据,这些线程被 Notify,当然只有一个线程获得了 lock,继续执行,而其它线程继续等待。
参考链接:
http://jameswxx.iteye.com/blog/1041173
http://flysnowxf.iteye.com/blog/1162691