最近一直在捣鼓HBase的项目,之前写了一些代码从数据库加载数据到HBase,所有的代码都跑得好好地,然而今天尝试着换了一个数据库,就跑不通了。通过数据工具,可以发现连接没有问题,而且有部分逻辑很顺利通过了,然而有一些就是卡主了,通过jstack打印出来的信息可以找到这样的堆栈:
"runner{object-loader#292}-objecthandler" #323 prio=5 os_prio=0 tid=0x00002aaadc5ec800 nid=0x7f62 in Object.wait() [0x0000000056ce4000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1104)
- locked <0x00000007736013e8> (a org.apache.commons.pool.impl.GenericObjectPool$Latch)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
所以开始我怀疑是连接的问题,从网上也找到了一个类型的现象,有人怀疑是DBPC的一个bug导致死锁:http://*.com/questions/5714511/deadlock-issue-in-dbcp-deployed-on-tomcat,所以我升级了DBCP版本1.4,然而和这人一样的结果,升级DBCP版本并没有解决问题。简单的看DBCP的代码,都开始怀疑是不是因为没有Spring JdbcTemplate没有正确的把Connection返回回去引起泄漏了,然而也有点感觉不太可能,因为这段代码在其他数据库都跑得好好地,但是我们的数据库版本都是一致的,然而其他配置上也被假设一致了(被忽略的一个重要的点)。java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1104)
- locked <0x00000007736013e8> (a org.apache.commons.pool.impl.GenericObjectPool$Latch)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
后来开始调配置,减少连接数,减少线程数,经过各种组合,发现当把DB读的batch降到1的时候就可以work了,非常诡异的一个问题。从数据工具中查到,如果用batch,得到的SQL是:
SELECT <column>, <column> FROM <table> where iid in (@p0, @p1)
如果是batch是1的话:SELECT <column>, <column> FROM <table> where iid in (@p0)
这段SQL语句是这么产生的:
DataSource dataSource = ....
this.jdbc = new NamedParameterJdbcTemplate(dataSource);
...
....
})
this.jdbc = new NamedParameterJdbcTemplate(dataSource);
...
MapSqlParameterSource parameters = new MapSqlParameterSource();
parameters.addValue("params", paramsMap.keySet());
jdbc.query("SELECT <columns> FROM <table> where <column> in (:params)";, parameters, new ResultSetExtractor<Void>() {....
})
如果是一个batch的话,在jstack堆栈中可以看到它一直在等数据库的返回结果:
"runner{object-loader#16}-objecthandler" #47 prio=5 os_prio=0 tid=0x0000000006ddd800 nid=0x2694 runnable [0x0000000045434000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.sybase.jdbc3.timedio.RawDbio.reallyRead(Unknown Source)
at com.sybase.jdbc3.timedio.Dbio.doRead(Unknown Source)
at com.sybase.jdbc3.timedio.InStreamMgr.a(Unknown Source)
at com.sybase.jdbc3.timedio.InStreamMgr.doRead(Unknown Source)
at com.sybase.jdbc3.tds.TdsProtocolContext.getChunk(Unknown Source)
这也解释了第一个堆栈一直停在borrowObject(getConnection)的阶段,因为之前所有的Connection都在数据库堵住没有返回,所以这个线程再拿Connection的时候超过了我设置的最大Connection数,所以就等着拿不到Connection。java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.sybase.jdbc3.timedio.RawDbio.reallyRead(Unknown Source)
at com.sybase.jdbc3.timedio.Dbio.doRead(Unknown Source)
at com.sybase.jdbc3.timedio.InStreamMgr.a(Unknown Source)
at com.sybase.jdbc3.timedio.InStreamMgr.doRead(Unknown Source)
at com.sybase.jdbc3.tds.TdsProtocolContext.getChunk(Unknown Source)
在后来查了一下不同数据库的JDBC Driver信息(sp_version):
jConnect (TM) for JDBC(TM)/7.07 ESD #4 (Build 26793)/P/EBF20302/JDK 1.6.0/jdbcmain/OPT/Thu Jul 5 22:08:44 PDT 2012
jConnect (TM) for JDBC(TM)/1000/Wed Mar 11 05:01:24 2015 PDT
jConnect (TM) for JDBC(TM)/1000/Wed Mar 11 05:01:24 2015 PDT
也就是说这种用法是因为旧的JDBC Driver对NamedParameterJdbcTemplate不完善引起的,这个坑花了我一整天的时间。。。。