今天早上集群空间满了,移动Ambari Metrics Collector日志目录后服务启动不了,日志如下:
java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) 2018-07-30 11:46:23,686 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server node2.bigdata/192.168.0.129:61181. Will not attempt to authenticate using SASL (unknown error) 2018-07-30 11:46:23,686 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) 2018-07-30 11:46:23,788 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node2.bigdata:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server 2018-07-30 11:46:23,788 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 4 attempts 2018-07-30 11:46:23,788 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x72cc7e6f0x0, quorum=node2.bigdata:61181, baseZNode=/ams-hbase-unsecure Unable to get data of znode /ams-hbase-unsecure/meta-region-server org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:624) at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:491) at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:172) at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:611) at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:592) at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:565) at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1209) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1176) at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:340) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:159) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:211) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:302) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:167) at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:162) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:799) at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602) at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366) at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:415) at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2363)
最后找到解决办法:
https://blog.csdn.net/zhouyuanlinli/article/details/79552347
把保存监控数据的/var/lib/ambari-metrics-collector/目录下的三个文件夹checkpoint 、hbase和hbase-tmp删了(可以先备份),重启Metrics Collector就ok了。
rm -rf /var/lib/ambari-metrics-collector/*