1 Check whether HMaster on the master host exits right after startup
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
2021-10-31 20:57:24,325 INFO [Thread-14] procedure2.ProcedureExecutor: Starting 16 core workers (bigger of cpus/4 or 16) with max (burst) worker count=160, start 1 urgent thread(s)
2021-10-31 20:57:24,334 INFO [Thread-14] util.FSHDFSUtils: Recover lease on dfs file hdfs://master:9000/hbase/MasterProcWALs/pv2-00000000000000000009.log
2021-10-31 20:57:24,337 INFO [Thread-14] util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://master:9000/hbase/MasterProcWALs/pv2-00000000000000000009.log after 3ms
2021-10-31 20:57:24,338 WARN [Thread-14] wal.WALProcedureStore: Remove uninitialized log: FileStatus{path=hdfs://master:9000/hbase/MasterProcWALs/pv2-00000000000000000009.log; isDirectory=false; length=0; replication=1; blocksize=134217728; modification_time=1635739004822; access_time=1635739001651; owner=csu; group=supergroup; permission=rw-r--r--; isSymlink=false}
2021-10-31 20:57:24,338 INFO [Thread-14] wal.ProcedureWALFile: Archiving hdfs://master:9000/hbase/MasterProcWALs/pv2-00000000000000000009.log to hdfs://master:9000/hbase/oldWALs/pv2-00000000000000000009.log
2021-10-31 20:57:24,356 ERROR [Thread-14] master.HMaster: Failed to become active master
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1083)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:421)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:611)
at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1407)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:859)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2234)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:567)
at java.lang.Thread.run(Thread.java:748)
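Solution 1:
Add the following configuration to hbase-site.xml to turn off the check that the underlying filesystem supports hflush/hsync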
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
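To confirm that the property actually landed in the file, a small script can read it back. This is a minimal sketch, not part of HBase: the helper name get_hbase_property and the inline sample document are assumptions made for illustration, and it only inspects the XML text, not the running master.

```python
import xml.etree.ElementTree as ET


def get_hbase_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical sample standing in for the real hbase-site.xml contents.
sample = """<configuration>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>"""

print(get_hbase_property(sample, "hbase.unsafe.stream.capability.enforce"))
```

In practice the text would come from the hbase-site.xml under the HBase conf directory, and HMaster has to be restarted before the change takes effect.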
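Solution 2:
If Hadoop has entered safe mode, leave it with the following command (on recent Hadoop releases the same subcommand is available as hdfs dfsadmin -safemode leave):
hadoop dfsadmin -safemode leave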
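Before forcing HDFS out of safe mode it can help to check whether it is in safe mode at all. The sketch below assumes the hdfs command is on the PATH and that `hdfs dfsadmin -safemode get` prints a line containing "Safe mode is ON" or "Safe mode is OFF"; the function names are made up for this example.

```python
import subprocess


def safemode_is_on(report):
    """Interpret the text printed by `hdfs dfsadmin -safemode get`."""
    return "safe mode is on" in report.lower()


def query_safemode():
    """Run the HDFS CLI and return its output (assumes `hdfs` is on PATH)."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Parsing demonstrated on sample strings, so no cluster is needed here.
print(safemode_is_on("Safe mode is ON"))
print(safemode_is_on("Safe mode is OFF"))
```

On a real cluster, safemode_is_on(query_safemode()) returning True is the case where the leave command applies.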
2 Check that the host names used in the ZooKeeper cluster match the names mapped to IPs in the hosts file
Log error:
2021-10-31 17:48:24,229 INFO [main-SendThread(master:2181)] zookeeper.ClientCnxn: Opening socket connection to server master/192.168.99.129:2181. Will not attempt to authenticate using SASL (unknown error)
2021-10-31 17:48:24,235 WARN [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
Another error:
2021-10-31 19:52:29,413 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper create failed after 4 attempts
2021-10-31 19:52:29,738 INFO [main] zookeeper.ZooKeeper: Session: 0x0 closed
2021-10-31 19:52:29,739 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
org.apache.hadoop.hbase.ZooKeeperConnectionException: master:160000x0, quorum=master:2181, baseZNode=/hbase Unexpected KeeperException creating base node
at org.apache.hadoop.hbase.zookeeper.ZKWatcher.createBaseZNodes(ZKWatcher.java:158)
at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:132)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:604)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:475)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3055)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3073)
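Cause: the problem lies in the coordination through ZooKeeper. When the master host registers with ZooKeeper, it registers under its own host name, so the master node is bound to the host name VMxxx. When my RegionServer asks ZooKeeper for the master's host name, it gets VMxxx rather than master. Mapping master to its IP in the RegionServer's hosts file and setting hbase.master.info.bindAddress to master in hbase-site.xml both make no difference, because the master's address is not read from the configuration file but from ZooKeeper, and the name ZooKeeper returns is then looked up in the hosts file.
Solution: change the host name so that it matches, and make the change permanent so that it does not revert after a reboot.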
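The mismatch described above can be checked mechanically. The sketch below is an illustration only: parse_hosts and hostname_matches are hypothetical helpers, and the sample hosts text reuses the master/192.168.99.129 pair from the log above. It tests whether the machine's own host name is the expected one and whether the hosts file maps that name.

```python
def parse_hosts(text):
    """Map each host name in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # first mapping wins
    return mapping


def hostname_matches(hostname, expected_name, hosts_text):
    """True when the host's own name is the expected one and hosts maps it."""
    return hostname == expected_name and expected_name in parse_hosts(hosts_text)


# Hypothetical hosts-file contents for illustration.
sample_hosts = """127.0.0.1 localhost
192.168.99.129 master  # HBase master
"""

print(hostname_matches("master", "master", sample_hosts))
print(hostname_matches("VMxxx", "master", sample_hosts))
```

On a real node the first argument would be socket.gethostname() and the last the contents of /etc/hosts; a False result on the master corresponds to the VMxxx situation described above.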