Building a personal big data cluster from scratch (5): HBase installation

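Building a personal big data cluster from scratch: environment preparation
Building a personal big data cluster from scratch (1): ZooKeeper
Building a personal big data cluster from scratch (2): HDFS
Building a personal big data cluster from scratch (3): YARN
Building a personal big data cluster from scratch (4): Hive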

Before you install

1. Install ZooKeeper and Hadoop first.
2. Have hbase-2.3.5-bin.tar.gz ready.

Unpack the tarball

cd /opt/packages
tar -zxf hbase-2.3.5-bin.tar.gz -C ../apps
# The tarball unpacks to hbase-2.3.5 (no -bin suffix), so create the link next to it
cd ../apps
ln -s hbase-2.3.5 hbase
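
The unpack-and-link step is easier to repeat (for example after an upgrade) if the link is created idempotently. The following is only a sketch under assumptions: `link_release` is a hypothetical helper name, and it assumes the release unpacks to a directory such as hbase-2.3.5 directly under the apps directory.

```shell
# Hypothetical helper: point <apps_dir>/<link_name> at <apps_dir>/<release_dir>.
link_release() {
  apps_dir=$1
  release_dir=$2
  link_name=$3
  if [ ! -d "$apps_dir/$release_dir" ]; then
    echo "missing directory: $apps_dir/$release_dir" >&2
    return 1
  fi
  # -n replaces an existing symlink instead of creating a link inside its target.
  # A relative target keeps the link valid when the apps directory is copied elsewhere.
  (cd "$apps_dir" && ln -sfn "$release_dir" "$link_name")
}

# Example with the paths used in this post:
# link_release /opt/apps hbase-2.3.5 hbase
```

Running it twice leaves a single hbase link, which is the main reason to prefer `ln -sfn` over a bare `ln -s` here.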

Configure HBase

cd /opt/apps/hbase/conf

Link Hadoop's configuration files

ln -s /opt/apps/hadoop/etc/hadoop/core-site.xml core-site.xml
ln -s /opt/apps/hadoop/etc/hadoop/hdfs-site.xml hdfs-site.xml
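
A dangling link here only shows up later as a confusing startup error, so a quick check can be worthwhile. This is a minimal sketch, not part of the original procedure; `check_conf_links` is a hypothetical name.

```shell
# Hypothetical check: every listed file in <conf_dir> must resolve to a readable file.
# [ -r ] follows symlinks, so a dangling link is reported as broken.
check_conf_links() {
  conf_dir=$1
  shift
  status=0
  for name in "$@"; do
    if [ -r "$conf_dir/$name" ]; then
      echo "ok      $name"
    else
      echo "broken  $name"
      status=1
    fi
  done
  return $status
}

# Example:
# check_conf_links /opt/apps/hbase/conf core-site.xml hdfs-site.xml
```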

regionservers

This file lists the hosts that run a RegionServer, one per line.

[hadoop@hd1 conf]$ cat regionservers
hd2
hd3
hd4
hd5

hbase-env.sh

Change the following settings:

export JAVA_HOME=/usr/local/jdk/
export HBASE_MANAGES_ZK=false

hbase-site.xml

<configuration>
<!--    Cluster -->
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://ns1/hbase</value>
                <description>Directory shared by the RegionServers, where HBase persists its data.</description>
        </property>

        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
                <description>HBase run mode: false for standalone, true for distributed.</description>
        </property>

        <property>
                <name>hbase.tmp.dir</name>
                <value>/data/hbase</value>
                <description>Temporary directory on the local file system.</description>
        </property>

        <property>
                <name>hfile.block.cache.size</name>
                <value>0.39</value>
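                <description>Fraction of the heap used for the StoreFile read cache (block cache). A larger cache helps reads: 0.4 to 0.5 is fine for read-heavy workloads, about 0.3 for balanced reads and writes, and the default for write-heavy workloads.</description>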
        </property>

        <property>
                <name>hbase.rpc.timeout</name>
                <value>900000</value>
                <description>Timeout, in milliseconds, for RPC requests issued by the HBase client.</description>
        </property>
        <!--    Master  -->
        <property>
                <name>hbase.master</name>
                <value>hd1:60000</value>
        </property>

        <property>
                <name>hbase.master.info.port</name>
                <value>60010</value>
                <description>Port of the HBase Master web UI. Set it to -1 to disable the UI.</description>
        </property>
        <!--    regionserver    -->
        <property>
                <name>hbase.regionserver.port</name>
                <value>60020</value>
                <description>Port the HBase RegionServer binds to.</description>
        </property>

        <property>
                <name>hbase.regionserver.info.port</name>
                <value>60030</value>
                <description>Port of the HBase RegionServer web UI. Set it to -1 to disable the UI.</description>
        </property>

        <property>
                <name>hbase.regionserver.lease.period</name>
                <value>180000</value>
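                <description>Client lease period on a RegionServer, that is, the timeout threshold in milliseconds. By default the client must send a message within this period or it is considered dead. This property name has been deprecated since HBase 0.96 in favor of hbase.client.scanner.timeout.period.</description>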
        </property>

        <property>
                <name>hbase.regionserver.restart.on.zk.expire</name>
                <value>true</value>
                <description>When the ZooKeeper session expires, the RegionServer restarts instead of aborting.</description>
        </property>

        <property>
                <name>hbase.regionserver.handler.count</name>
                <value>100</value>
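                <description>Number of threads a RegionServer uses to handle remote requests. Raise it if throughput (TPS) matters; the default in HBase 2.x is 30.
                1) A larger value means more memory overhead;
                2) if writes are bottlenecked on flush, compaction, or split because disk I/O cannot keep up, adding threads helps little.</description>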
        </property>

        <property>
                <name>hbase.regionserver.codecs</name>
                <value>snappy,gz</value>
                <description></description>
        </property>

        <property>
                <name>hbase.hregion.memstore.block.multiplier</name>
                <value>2</value>
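                <description>On each write, the RegionServer checks whether the region's memstore has grown beyond this multiple (here 2) of the memstore flush size (hbase.hregion.memstore.flush.size). If it has, new writes to that region are blocked and a flush is triggered, to avoid an OOM.</description>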
        </property>

        <property>
                <name>hbase.hregion.max.filesize</name>
                <value>256000000</value>
                <description>Maximum storage size, in bytes, of a single region on a RegionServer. A region that grows past this value is automatically split into smaller regions.</description>
        </property>
        <!--    Client parameters      -->
        <property>
                <name>hbase.client.scanner.caching</name>
                <value>10000</value>
                <description>Client-side parameter: the number of rows an HBase scanner fetches from the server per call.</description>
        </property>

        <property>
                <name>hbase.client.scanner.timeout.period</name>
                <value>900000</value>
                <description>Scanner timeout, in milliseconds.</description>
        </property>

        <!--    Zookeeper       -->
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>hd1:2181,hd2:2181,hd3:2181</value>
        </property>

        <property>
                <name>zookeeper.session.timeout</name>
                <value>1200000</value>
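                <description>Session timeout between a RegionServer and ZooKeeper, in milliseconds. Once it elapses, ZooKeeper removes the RegionServer from the list of live servers; the HMaster is notified and reassigns that server's regions to the surviving RegionServers.</description>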
        </property>

        <property>
                <name>hbase.zookeeper.property.dataDir</name>
                <value>/data/zookeeper</value>
                <description>Corresponds to dataDir in ZooKeeper's zoo.cfg: the directory where snapshots are stored.</description>
        </property>
</configuration>
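
With this many properties it is easy to mistype a port or a path, so reading a value back from the file is a cheap sanity check. The sketch below uses plain awk and assumes each `<name>` and its `<value>` sit on separate lines, as in the file above; `get_prop` is a hypothetical helper, not an HBase tool.

```shell
# Hypothetical helper: print the <value> that follows <name>KEY</name> in an
# hbase-site.xml laid out one element per line. Prints nothing if KEY is absent.
get_prop() {
  awk -v key="<name>$2</name>" '
    index($0, key) { found = 1; next }            # literal match, no regex on KEY
    found && match($0, /<value>[^<]*<\/value>/) {
      # strip the 7-character <value> and the 8-character </value>
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Example:
# get_prop /opt/apps/hbase/conf/hbase-site.xml hbase.rootdir
```

This only reads what is written in the file. HBase's own start scripts query the effective configuration through the `org.apache.hadoop.hbase.util.HBaseConfTool` class, which is an alternative once the `hbase` launcher is on the PATH.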

snappy

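hbase.regionserver.codecs above lists snappy, but HBase 2.3.5 does not ship the snappy native library, so it has to come from the Hadoop installation. Otherwise HRegionServer fails to start when start-hbase.sh runs, with the following error: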

2021-06-04 21:01:36,719 WARN  [main] util.CompressionTest: Can't instantiate codec: snappy
org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:103)
        at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:69)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkCodecs(HRegionServer.java:835)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:574)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:3096)
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63)
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3114)
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:136)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
        at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getCompressor(Compression.java:356)
        at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:98)
        ... 13 more
2021-06-04 21:01:36,723 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.io.IOException: Compression codec snappy not supported, aborting RS construction

The fix:

cd /opt/apps/hbase/lib
mkdir native
cd native
ln -s /opt/apps/hadoop/lib/native Linux-amd64-64
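
The UnsatisfiedLinkError on NativeCodeLoader.buildSupportsSnappy in the trace suggests that the JVM could not load Hadoop's native library, so after creating the link it is worth confirming that a libhadoop shared object is visible through it. A minimal sketch, with `has_native_lib` as a hypothetical helper name:

```shell
# Hypothetical check: succeed if <dir> holds a shared object named lib<name>.so*
# [ -e ] follows symlinks, so this also works through the Linux-amd64-64 link.
has_native_lib() {
  for f in "$1"/lib"$2".so*; do
    if [ -e "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  echo "lib$2.so not found in $1" >&2
  return 1
}

# Example:
# has_native_lib /opt/apps/hbase/lib/native/Linux-amd64-64 hadoop
```

For an end-to-end check, the `org.apache.hadoop.hbase.util.CompressionTest` class that appears in the trace can also be run by hand through the `hbase` launcher, passing a file path and the codec name snappy.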

Configure environment variables

The environment variables configured so far:

JAVA_HOME=/usr/local/jdk
ZOOKEEPER_HOME=/opt/apps/zookeeper
HADOOP_HOME=/opt/apps/hadoop
HADOOP_COMMON_HOME=${HADOOP_HOME}
HADOOP_HDFS_HOME=${HADOOP_HOME}
HADOOP_MAPRED_HOME=${HADOOP_HOME}
HADOOP_YARN_HOME=${HADOOP_HOME}
HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:/usr/lib64

HBASE_HOME=/opt/apps/hbase
HBASE_LIBRARY_PATH=${HBASE_HOME}/lib/native/Linux-amd64-64
HIVE_HOME=/opt/apps/hive
HIVE_CONF_DIR=${HIVE_HOME}/conf
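# Export the variables above so that child processes such as start-hbase.sh can see them
export JAVA_HOME ZOOKEEPER_HOME HADOOP_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_MAPRED_HOME HADOOP_YARN_HOME HADOOP_CONF_DIR LD_LIBRARY_PATH HBASE_HOME HBASE_LIBRARY_PATH HIVE_HOME HIVE_CONF_DIR
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin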

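Distribute the software

On hd1, use the batch distribution script to copy the whole HBase installation to the other machines.

# Distribute the release directory
rsall hbase-2.3.5
# Distribute the symlink
rsall hbase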
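
rsall is the author's own helper script and its source is not shown in this post. For readers without it, the sketch below prints equivalent rsync commands instead of running them, so they can be reviewed first. The function name `print_sync_commands` and the choice of rsync are assumptions, not the author's implementation.

```shell
# Hypothetical stand-in for rsall: print one rsync command per target host.
# rsync -a copies symlinks as symlinks, so the "hbase" link arrives as a link.
# No trailing slash on the source, so the directory itself is copied into the
# same parent directory on the remote side.
print_sync_commands() {
  src=$1
  shift
  for host in "$@"; do
    echo "rsync -a $src $host:$(dirname "$src")/"
  done
}

# Example:
# print_sync_commands /opt/apps/hbase-2.3.5 hd2 hd3 hd4 hd5
# print_sync_commands /opt/apps/hbase hd2 hd3 hd4 hd5
```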

Start HBase

First start ZooKeeper and HDFS yourself,
then start all HBase services in one go:

[hadoop@hd1 ~]$ start-hbase.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/hbase-2.3.5/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
running master, logging to /opt/apps/hbase/bin/../logs/hbase-hadoop-master-hd1.out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/hbase-2.3.5/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hd5: running regionserver, logging to /opt/apps/hbase/bin/../logs/hbase-hadoop-regionserver-hd5.out
hd3: running regionserver, logging to /opt/apps/hbase/bin/../logs/hbase-hadoop-regionserver-hd3.out
hd4: running regionserver, logging to /opt/apps/hbase/bin/../logs/hbase-hadoop-regionserver-hd4.out
hd2: running regionserver, logging to /opt/apps/hbase/bin/../logs/hbase-hadoop-regionserver-hd2.out

Check the processes on each server

[hadoop@hd1 apps]$ ~/op/ssh_all.sh jps
====================hadoop@hd1=================
ssh hadoop@hd1 "source /etc/profile;source ~/.bash_profile;jps"
1793 JournalNode
5267 HMaster
6131 Jps
1556 NameNode
2373 ResourceManager
1336 QuorumPeerMain
1979 DFSZKFailoverController
2589 WebAppProxyServer
OK
====================hadoop@hd2=================
ssh hadoop@hd2 "source /etc/profile;source ~/.bash_profile;jps"
3344 Jps
1457 JournalNode
2929 HRegionServer
1763 ResourceManager
1652 DFSZKFailoverController
1271 QuorumPeerMain
1372 NameNode
OK
====================hadoop@hd3=================
ssh hadoop@hd3 "source /etc/profile;source ~/.bash_profile;jps"
1362 DataNode
1267 QuorumPeerMain
2890 Jps
1484 NodeManager
2524 HRegionServer
OK
====================hadoop@hd4=================
ssh hadoop@hd4 "source /etc/profile;source ~/.bash_profile;jps"
1393 NodeManager
2355 HRegionServer
1274 DataNode
2717 Jps
OK
====================hadoop@hd5=================
ssh hadoop@hd5 "source /etc/profile;source ~/.bash_profile;jps"
2613 HRegionServer
2950 Jps
1239 DataNode
1358 NodeManager
OK
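
Scanning five jps listings by eye gets tedious, so the check can be scripted. A minimal sketch, assuming jps prints one "PID Name" pair per line as in the output above; `expect_procs` is a hypothetical helper.

```shell
# Hypothetical helper: read jps output on stdin and succeed only if every
# named process appears in the second column.
expect_procs() {
  listing=$(cat)
  missing=0
  for proc in "$@"; do
    if ! printf '%s\n' "$listing" |
        awk -v p="$proc" '$2 == p { ok = 1 } END { exit !ok }'; then
      echo "missing: $proc" >&2
      missing=1
    fi
  done
  return $missing
}

# Example, on hd1:           jps | expect_procs HMaster
# Example, on hd2 to hd5:    jps | expect_procs HRegionServer
```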

Open the web UI

http://hd1:60010/