Installing Atlas on Apache (open-source) big data components

A detailed record of installing Atlas 2.1.0 on Apache open-source big data components (test environment)

Note: this Atlas installation draws heavily on material found online. It is recorded here only for my own future reference; if anything in this article infringes on your rights, please contact me immediately.

Component versions

Component  Version
Hadoop 3.2.1
Hive 3.1.2
Hbase 2.3.4
Zookeeper 3.5.9
Kafka 2.6.2
Solr 7.4.0
Atlas 2.1.0
jdk 1.8
Maven 3.6.3

I. Building Atlas 2.1.0

Prerequisite: I did the build inside a virtual machine running CentOS 7.6.

1. Set up the virtual machine

2. Install the JDK

1) Remove the OpenJDK bundled with CentOS 7.6 (this must be removed, otherwise the build will run into problems)
rpm -qa | grep openjdk
rpm -e --nodeps <each package name returned by the query above>
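If several OpenJDK packages are installed, the query and removal can be combined into a single pass (a sketch, assuming GNU xargs is available; review the package list first with rpm -qa | grep openjdk):
rpm -qa | grep openjdk | xargs -r rpm -e --nodeps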

2) Install your own JDK 1.8
mkdir /app
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /app
mv jdk1.8.0_151 jdk

Configure environment variables
vim /etc/profile
Append the following at the end:
export JAVA_HOME=/app/jdk
export PATH=$PATH:$JAVA_HOME/bin
Save and exit

Apply the environment variables
source /etc/profile

Verify
java -version

3. Install Maven

The Maven version installed is 3.6.3

tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /app
mv apache-maven-3.6.3 maven

Configure environment variables
vim /etc/profile
Append the following at the end:
export MAVEN_HOME=/app/maven
export PATH=$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin
Save and exit

Apply the environment variables
source /etc/profile

Verify
mvn -version
Configure the Maven mirror repositories

vim /app/maven/conf/settings.xml

Add:
	<mirror>
    	<id>alimaven</id>
    	<name>aliyun maven</name>
    	<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    	<mirrorOf>central</mirrorOf>
	</mirror>
<!-- Mirror 1 -->
    <mirror>
        <id>repo1</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>https://repo1.maven.org/maven2/</url>
    </mirror>
<!-- Mirror 2 -->
    <mirror>
        <id>repo2</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>https://repo2.maven.org/maven2/</url>
    </mirror>

4. Build Atlas

tar -zxvf apache-atlas-2.1.0-sources.tar.gz -C /app
cd /app/apache-atlas-sources-2.1.0

Edit the project's pom.xml

vim pom.xml

Update the component versions; the main changes are:
<hadoop.version>3.2.1</hadoop.version>
<hbase.version>2.3.4</hbase.version>
<solr.version>7.5.0</solr.version>
<hive.version>3.1.2</hive.version>
<kafka.version>2.2.1</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<calcite.version>1.16.0</calcite.version>
<zookeeper.version>3.5.9</zookeeper.version>
<falcon.version>0.8</falcon.version>
<sqoop.version>1.4.6.2.3.99.0-195</sqoop.version>
<storm.version>1.2.0</storm.version>
<curator.version>4.0.1</curator.version>
<elasticsearch.version>5.6.4</elasticsearch.version>

Source code changes (online references say this code must be modified; I made the changes and the build ran successfully. So far only the Hive hook has been tested, with no problems; I have not tried building without the change.)

vim /app/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java

Line 577
Change:
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
to:
String catalogName = null;
vim /app/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java

Line 81
Change:
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
to:
this.metastoreHandler = null;

Run the build

cd /app/apache-atlas-sources-2.1.0

Package (built for external HBase and Solr; the embedded ones bundled with Atlas are not used here):
mvn clean -DskipTests package -Pdist -X

Note: the build may fail with errors, almost always caused by network problems; retrying usually resolves them. If retries still cannot download a JAR, download the missing JAR manually, place it in the local Maven repository, and rerun the package step.
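If a particular JAR repeatedly fails to download, it can be installed into the local repository by hand and the build rerun. A minimal sketch (the groupId/artifactId/version and the file path below are placeholders, not a real missing artifact):
mvn install:install-file -DgroupId=some.group -DartifactId=some-artifact -Dversion=1.0.0 -Dpackaging=jar -Dfile=/path/to/some-artifact-1.0.0.jar
cd /app/apache-atlas-sources-2.1.0
mvn clean -DskipTests package -Pdist -X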

Location of the build output

cd /app/apache-atlas-sources-2.1.0/distro/target

apache-atlas-2.1.0-bin.tar.gz is the package we need.

II. Component installation

Note: this Atlas installation uses external, standalone HBase and Solr, so Hadoop, Hive, ZooKeeper, Kafka, Solr, and HBase must be deployed separately. Three virtual machines are used for testing, as follows:

VM name   OS         IP
hadoop01 Centos7.6 192.168.190.15
hadoop02 Centos7.6 192.168.190.16
hadoop03 Centos7.6 192.168.190.17

The environment variables configured on all three machines are as follows (listed here up front):

vim /etc/profile

export JAVA_HOME=/app/jdk
export ZK_HOME=/app/zookeeper
export HIVE_HOME=/app/hive
export HADOOP_HOME=/app/hadoop
export HBASE_HOME=/app/hbase
export KAFKA_HOME=/app/kafka
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$KAFKA_HOME/bin

1. Install the JDK

1) Remove the OpenJDK bundled with CentOS 7.6
rpm -qa | grep openjdk
rpm -e --nodeps <each package name returned by the query above>

2) Install your own JDK 1.8
mkdir /app
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /app
mv jdk1.8.0_151 jdk

Configure environment variables
vim /etc/profile
Append the following at the end:
export JAVA_HOME=/app/jdk
export PATH=$PATH:$JAVA_HOME/bin
Save and exit

Apply the environment variables
source /etc/profile

Verify
java -version

Then copy the entire /app/jdk directory to hadoop02 and hadoop03 and configure the environment variables there
scp -r /app/jdk hadoop02:/app/
scp -r /app/jdk hadoop03:/app/

2. Install ZooKeeper

mkdir /app
tar -zxvf apache-zookeeper-3.5.9-bin.tar.gz -C /app
mv apache-zookeeper-3.5.9-bin zookeeper
cd /app/zookeeper/conf
Make a copy of zoo_sample.cfg
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/app/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
Create the data directory
mkdir /app/zookeeper/data
cd /app/zookeeper/data
touch myid && echo "1" > myid
Then copy the entire /app/zookeeper directory to hadoop02 and hadoop03 and configure the environment variables there
scp -r /app/zookeeper hadoop02:/app/
scp -r /app/zookeeper hadoop03:/app/

Then edit /app/zookeeper/data/myid on hadoop02 and hadoop03:
hadoop02   2
hadoop03   3
Start ZooKeeper on each of the three machines
zkServer.sh start
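To confirm the ensemble is healthy, check each node's role; one node should report leader and the other two follower:
zkServer.sh status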

3. Install Hadoop

tar -zxvf hadoop-3.2.1.tar.gz -C /app
mv hadoop-3.2.1 hadoop

All files to edit are under /app/hadoop/etc/hadoop

core-site.xml

vim core-site.xml

<configuration>
    <!-- HDFS entry point. "mycluster" is only the logical cluster name; it can be changed, but it must match dfs.nameservices in hdfs-site.xml -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>

    <!-- The default hadoop.tmp.dir points to /tmp, which would put NameNode and DataNode data in a volatile directory; change it here -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop</value>
    </property>

    <!-- Static web user; without this the web UI reports errors -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>

    <!-- ZooKeeper quorum; separate multiple hosts with commas -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
	</property>

	<property>
		<name>hadoop.proxyuser.root.hosts</name>
		<value>*</value>
	</property>

	<property>
		<name>hadoop.proxyuser.root.groups</name>
		<value>*</value>
	</property>
</configuration>

hadoop-env.sh

export JAVA_HOME=/app/jdk
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_ZKFC_USER="root"
export HDFS_JOURNALNODE_USER="root"

hdfs-site.xml

<configuration>
	<property>
       <name>dfs.replication</name>
       <value>2</value>
   </property>
   <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
   </property>
   <!-- The HDFS nameservice is mycluster; it must match core-site.xml -->
   <property>
       <name>dfs.nameservices</name>
       <value>mycluster</value>
   </property>
   <!-- mycluster has two NameNodes: nn1 and nn2 -->
   <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
   </property>
   <!-- RPC addresses -->
   <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop01:8020</value>
   </property>
   <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop02:8020</value>
   </property>
 <!-- HTTP addresses -->
   <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop01:9870</value>
   </property>
   <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop02:9870</value>
   </property>
   <!-- Where the NameNode edits are stored on the JournalNodes -->
   <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
   </property>
   <!-- Local directory where the JournalNode stores its data -->
   <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/ha-hadoop/journaldata</value>
   </property>
	<!-- Enable automatic NameNode failover -->
   <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
   </property>
   <!-- Failover proxy provider implementation -->
   <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
   </property>
   <!-- Fencing methods; separate multiple methods with newlines, one per line -->
   <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
                sshfence
                shell(/bin/true)
        </value>
   </property>
   <!-- The sshfence mechanism requires passwordless SSH -->
   <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
   </property>
   <!-- sshfence connect timeout -->
   <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
   </property>
</configuration>

mapred-env.sh

export JAVA_HOME=/app/jdk

mapred-site.xml

<configuration>
     <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>

    <!-- JobHistory server web address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>

    <property>
      <name>mapreduce.application.classpath</name>
      <value>
                /app/hadoop/etc/hadoop,
                /app/hadoop/share/hadoop/common/*,
                /app/hadoop/share/hadoop/common/lib/*,
                /app/hadoop/share/hadoop/hdfs/*,
                /app/hadoop/share/hadoop/hdfs/lib/*,
                /app/hadoop/share/hadoop/mapreduce/*,
                /app/hadoop/share/hadoop/mapreduce/lib/*,
                /app/hadoop/share/hadoop/yarn/*,
                /app/hadoop/share/hadoop/yarn/lib/*
      </value>
    </property>
</configuration>

yarn-env.sh

export JAVA_HOME=/app/jdk

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- ResourceManager cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster1</value>
    </property>

    <!-- ResourceManager ids -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Hostname of each ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop01</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop02</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop01:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop02:8088</value>
    </property>

    <!-- ZooKeeper quorum -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>

    <!-- Enable ResourceManager recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Store the ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

    <!-- Whether virtual memory limits will be enforced for containers.  -->
    <property>
                <name>yarn.nodemanager.vmem-check-enabled</name>
                <value>false</value>
    </property>

    <property>
                <name>yarn.nodemanager.vmem-pmem-ratio</name>
                <value>5</value>
    </property>

</configuration>

workers

hadoop01
hadoop02
hadoop03

Hadoop 3 enforces user checks on the start scripts; to avoid startup failures caused by them, add the designated users to the following files

vim /app/hadoop/sbin/start-dfs.sh
vim /app/hadoop/sbin/stop-dfs.sh

Add:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
vim /app/hadoop/sbin/start-yarn.sh
vim /app/hadoop/sbin/stop-yarn.sh

Add:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Startup

ZooKeeper -> JournalNode -> format the NameNode -> format ZKFC (create the ZK namespace) -> NameNode -> DataNode -> ResourceManager -> NodeManager

Start the JournalNodes

Start the JournalNode on all three machines
cd /app/hadoop/sbin/
./hadoop-daemon.sh start journalnode

Format the NameNode

Run on hadoop01
hadoop namenode -format

Copy the contents of /data/hadoop/dfs/name to the standby NameNode host

If the standby NameNode host does not have this directory, create it first
scp -r /data/hadoop/dfs/name hadoop02:/data/hadoop/dfs/

Format ZKFC

Format ZKFC on both NameNode hosts
hdfs zkfc -formatZK

Stop the JournalNodes

Stop the JournalNode on all three machines
cd /app/hadoop/sbin/
./hadoop-daemon.sh stop journalnode

Start Hadoop

Run on hadoop01:
start-all.sh
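A quick sanity check of the HA cluster (a sketch; the service ids nn1/nn2 and rm1/rm2 come from the configs above, and the exact process list per node depends on the roles assigned to it):
jps
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
One NameNode and one ResourceManager should report active, the other standby.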

4. Install HBase

tar -zxvf hbase-2.3.4-bin.tar.gz -C /app
mv hbase-2.3.4 hbase

All files to edit are under /app/hbase/conf

hbase-env.sh

export JAVA_HOME=/app/jdk
export HBASE_CLASSPATH=/app/hadoop/etc/hadoop

hbase-site.xml

<configuration>
<!-- mycluster must match the value of dfs.nameservices in hdfs-site.xml -->
<property>
        <name>hbase.rootdir</name>
        <value>hdfs://mycluster/hbase</value>
</property>
<property>
        <name>hbase.master</name>
        <value>8020</value>
</property>
<!-- ZooKeeper quorum -->
<property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop01,hadoop02,hadoop03</value>
</property>
<property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
</property>
<property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/app/zookeeper/conf</value>
</property>
<property>
        <name>hbase.tmp.dir</name>
        <value>/var/hbase/tmp</value>
</property>
<property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
</property>

<!-- If HMaster fails to start and the log shows the following error: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
     then enable this setting:
<property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
</property>
-->
</configuration>

regionservers

hadoop01
hadoop02
hadoop03

For HBase HA, edit the backup-masters file (add the host of the standby HMaster)

vim backup-masters

hadoop03

Start HBase

start-hbase.sh
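To verify that HBase came up, query the cluster status from the HBase shell (a quick check; with the layout above you should see 1 active master, 1 backup master, and 3 region servers):
echo "status" | hbase shell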

5. Install Hive

Install MySQL
(omitted)

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /app
mv apache-hive-3.1.2-bin hive

All files to edit are under /app/hive/conf

hive-env.sh

export HADOOP_HOME=/app/hadoop/
export HIVE_CONF_DIR=/app/hive/conf/

hive-site.xml

<configuration>
<!-- Hive metadata is stored in MySQL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>

<!-- JDBC MySQL driver -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>

<!-- MySQL username and password -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>your MySQL username for Hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>your MySQL password for Hive</value>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>

<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
</property>

<!-- Query log directory -->
<property>
<name>hive.querylog.location</name>
<value>/user/hive/log</value>
</property>

<!-- Metastore node -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop01:9083</value>
</property>
<!-- Port for remote client connections -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.server2.webui.host</name>
<value>0.0.0.0</value>
</property>

<!-- HiveServer2 web UI port -->
<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>

<property>
<name>hive.server2.long.polling.timeout</name>
<value>5000</value>
</property>

<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>

<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>

<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>

<property>
<name>hive.execution.engine</name>
<value>mr</value>
</property>
</configuration>

Upload the MySQL driver JAR into Hive's lib directory

Initialize the Hive metastore schema

schematool -dbType mysql -initSchema

Start the Hive metastore

hive --service metastore &

Open the Hive CLI to verify

hive
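A minimal check inside the Hive CLI (assuming the metastore started cleanly; the database name is arbitrary):
show databases;
create database if not exists smoke_test;
drop database smoke_test;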

Distribute the /app/hive directory (so every machine can use Hive without changing any configuration)

scp -r /app/hive hadoop02:/app/
scp -r /app/hive hadoop03:/app/

6. Install Kafka

tar -zxvf kafka_2.12-2.6.2.tgz -C /app
mv kafka_2.12-2.6.2 kafka

All files to edit are under /app/kafka/config

server.properties

broker.id=0
zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
Distribute /app/kafka to the other machines and change the broker.id value in /app/kafka/config/server.properties on each
scp -r /app/kafka hadoop02:/app/
scp -r /app/kafka hadoop03:/app/
vim /app/kafka/config/server.properties
hadoop02    broker.id=1
hadoop03    broker.id=2

Start Kafka

Start Kafka on each of the three machines
cd /app/kafka/bin

Start in the background:
./kafka-server-start.sh -daemon ../config/server.properties
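To confirm all three brokers registered with ZooKeeper, list the broker ids or the topics (a sketch using the scripts shipped with Kafka; the --zookeeper option is deprecated in 2.6 but still works):
./zookeeper-shell.sh hadoop01:2181 ls /brokers/ids
./kafka-topics.sh --list --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181
The first command should show [0, 1, 2] once all three brokers are up.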

7. Install Solr

tar -zxvf solr-7.4.0.tgz -C /app
mv solr-7.4.0 solr

All files to edit are under /app/solr/bin

solr.in.sh

ZK_HOST="hadoop01:2181,hadoop02:2181,hadoop03:2181"
SOLR_HOST="hadoop01"
Distribute /app/solr to the other machines and change the SOLR_HOST value in /app/solr/bin/solr.in.sh on each
scp -r /app/solr hadoop02:/app/
scp -r /app/solr hadoop03:/app/
vim /app/solr/bin/solr.in.sh
hadoop02    SOLR_HOST="hadoop02"
hadoop03    SOLR_HOST="hadoop03"

Start Solr

Start Solr on each of the three machines
cd /app/solr/bin
./solr start -force
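Verify that Solr is running in cloud mode and connected to ZooKeeper:
./solr status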

8. Install Atlas

Upload the apache-atlas-2.1.0-bin.tar.gz package built in Part I (here it was uploaded to hadoop03)
tar -zxvf apache-atlas-2.1.0-bin.tar.gz -C /app
mv apache-atlas-2.1.0 atlas

All files to edit are under /app/atlas/conf

atlas-env.sh

export MANAGE_LOCAL_HBASE=false

# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false

# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false

# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false
export JAVA_HOME=/app/jdk
export HBASE_CONF_DIR=/app/hbase/conf

atlas-application.properties (the full contents are given here; only Hive is integrated for testing. If other components are needed, install them and configure their Atlas hooks accordingly)

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#########  Graph Database Configs  #########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1


# Graph Search Index
atlas.graph.index.search.backend=solr

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.search.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import

#########  Notification Configs  #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181
atlas.kafka.bootstrap.servers=192.168.190.15:9092,192.168.190.16:9092,192.168.190.17:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas

atlas.kafka.enable.auto.commit=true
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>


######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>

#########  JAAS Configuration ########

#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM

#########  Server Properties  #########
atlas.rest.address=http://192.168.190.17:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=192.168.190.15:2181,192.168.190.16:2181,192.168.190.17:2181

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>



######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0


#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false


########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>


#########  UI Configuration ########

atlas.ui.default.version=v1


######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary

Integrate with HBase

ln -s /app/hbase/conf/ /app/atlas/conf/hbase/
cp /app/hbase/conf/* /app/atlas/conf/hbase/

Integrate with Solr

cp -r /app/atlas/conf/solr /app/solr/
cd /app/solr/
mv solr/ atlas-solr
scp -r ./atlas-solr/ hadoop01:/app/solr/
scp -r ./atlas-solr/ hadoop02:/app/solr/


Restart Solr
cd /app/solr/bin/
./solr stop -force
./solr start -force

Create the indexes in Solr
solr create -c vertex_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
solr create -c edge_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
solr create -c fulltext_index -d /app/solr/atlas-solr/ -shards 3 -replicationFactor 2 -force
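The three collections can be verified through the Collections API (a hedged check; adjust the host and port if Solr is not listening on hadoop01:8983):
curl "http://hadoop01:8983/solr/admin/collections?action=LIST"
The response should list vertex_index, edge_index, and fulltext_index.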

Kafka-related steps

Create the required topics in Kafka
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
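A quick check that the topics exist with the expected replication factor:
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --describe --topic ATLAS_HOOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --list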

Integrate with Hive

cd /app/atlas/conf      (this is critical; the zip command must be run from this directory)
zip -u /app/atlas/hook/hive/hive-bridge-shim-2.1.0.jar atlas-application.properties

cp -r /app/atlas/hook/hive/* /app/hive/lib/
scp -r /app/atlas/hook/hive/* hadoop01:/app/hive/lib/
scp -r /app/atlas/hook/hive/* hadoop02:/app/hive/lib/
cp ./atlas-application.properties /app/hive/conf/
scp ./atlas-application.properties hadoop01:/app/hive/conf/
scp ./atlas-application.properties hadoop02:/app/hive/conf/

Hive-related configuration

All three machines need this configuration
cd /app/hive/conf

Add to hive-env.sh:
export JAVA_HOME=/app/jdk
export HIVE_AUX_JARS_PATH=/app/hive/lib/


Add to hive-site.xml:
<property>
      <name>hive.exec.post.hooks</name>
      <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
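Once Atlas itself is running (next step), the hook can be smoke-tested by issuing a simple DDL statement in Hive and watching the ATLAS_HOOK topic for the notification (a sketch; the table name is arbitrary):
hive -e "create table atlas_hook_smoke_test(id int);"
kafka-console-consumer.sh --bootstrap-server hadoop01:9092 --topic ATLAS_HOOK --from-beginning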

Start Atlas

cd /app/atlas/bin

./atlas_start.py

Note: the first Atlas startup takes a long time; even after it reports that startup is complete, it takes a while before the Atlas web UI becomes reachable.
Logs and error messages can be checked under /app/atlas/logs.
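Once the web UI is reachable, the server can also be checked over REST (a sketch using the default admin/admin credentials; change them if users-credentials.properties was modified):
curl -u admin:admin http://192.168.190.17:21000/api/atlas/admin/version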

After startup, import the Hive metadata

cd /app/atlas/hook-bin
./import-hive.sh

Once the import finishes, lineage can be viewed normally.