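Our project needed a Cloudera cluster. Because of how work on the project is divided, an ops colleague had always handled Cloudera operations and upgrades while I mostly just ran jobs, so this was the first time I set up a Cloudera cluster myself.
The Cloudera installation guide provided by the ops colleague was written for version 5.9, so I chose to install 5.9. During installation I mainly followed the post by “破击手”: https://www.cnblogs.com/pojishou/p/6267616.html. Many thanks!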
Cluster resources and role assignment:
Hostname | Memory | Disk | CPU | Roles
zj-hadoop-01 | 8 GB | 80 GB | 1 core | HDFS DataNode/Hive Gateway/Hue Server/Impala Catalog Server/Impala Daemon/Impala StateStore/Spark Gateway/Spark History Server/YARN (MR2 Included) NodeManager/ZooKeeper Server
zj-hadoop-02 | 8 GB | 80 GB | 1 core | HDFS DataNode/Hive Gateway/Impala Daemon/Cloudera Management Service Alert Publisher/Cloudera Management Service Event Server/Cloudera Management Service Host Monitor/Cloudera Management Service Service Monitor/Spark Gateway/YARN (MR2 Included) NodeManager/ZooKeeper Server
zj-hadoop-03 | 8 GB | 80 GB | 1 core | HDFS Balancer/HDFS DataNode/HDFS NameNode/HDFS SecondaryNameNode/Hive Gateway/Hive Metastore Server/HiveServer2/Impala Daemon/Oozie Server/Spark Gateway/YARN (MR2 Included) JobHistory Server/YARN (MR2 Included) ResourceManager/ZooKeeper Server
1. Prepare the installation files
manifest file: http://archive.cloudera.com/cdh5/parcels/5.9.0.23/manifest.json
CDH 5.9 parcel: http://archive.cloudera.com/cdh5/parcels/5.9.0.23/CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel
CDH 5.9 SHA-1 file: http://archive.cloudera.com/cdh5/parcels/5.9.0.23/CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel.sha1
Cloudera Manager 5.9:http://archive.cloudera.com/cm5/cm/5/cloudera-manager-centos7-cm5.9.0_x86_64.tar.gz
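Before moving the files into place, it is worth confirming that the parcel downloaded intact. The sketch below is a minimal bash example. It assumes wget, sha1sum and awk are installed and that the .sha1 file carries the hex digest as its first field. download_cdh_parcel and verify_sha1 are hypothetical helper names, not part of Cloudera's tooling.

```shell
# Hypothetical helpers; the URL and file name are the ones listed in step 1.
CDH_BASE_URL="http://archive.cloudera.com/cdh5/parcels/5.9.0.23"
CDH_PARCEL="CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel"

# Fetch manifest.json, the parcel and its .sha1 into the current directory;
# -c resumes a partially downloaded file instead of starting over.
download_cdh_parcel() {
  wget -c "${CDH_BASE_URL}/manifest.json" \
          "${CDH_BASE_URL}/${CDH_PARCEL}" \
          "${CDH_BASE_URL}/${CDH_PARCEL}.sha1"
}

# verify_sha1 FILE SHA_FILE: succeed only if the SHA-1 of FILE equals the
# digest stored (as the first field) in SHA_FILE.
verify_sha1() {
  local file="$1" sha_file="$2" expected actual
  expected=$(awk '{print $1; exit}' "$sha_file")
  actual=$(sha1sum "$file" | awk '{print $1}')
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected ${expected:-<empty>}, got $actual)" >&2
    return 1
  fi
}
```

Usage: download_cdh_parcel && verify_sha1 CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel.sha1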
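2. Prepare the virtual machines
Component | Version
CentOS | 7.4
JDK | 1.8
Python | 2.7
sshd | OpenSSH_7.4p1
NTP | 4.2.6p5
Scala | 2.11.4
Installing these is not covered here; install them however you prefer.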
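3. Change the hostname
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=zj-hadoop-01   # use zj-hadoop-01, zj-hadoop-02 and zj-hadoop-03 on the three nodes respectively
On CentOS 7 the HOSTNAME entry in this file is no longer used; set the name with hostnamectl instead:
hostnamectl set-hostname zj-hadoop-01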
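4. Disable SELinux
vi /etc/selinux/config
SELINUX=disabled
This setting takes effect after a reboot; run setenforce 0 to switch the running system to permissive mode in the meantime.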
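If you would rather not edit the file by hand on every node, the same change can be scripted. This is a minimal bash sketch assuming GNU sed (for -i). The function names are hypothetical, and the config path is a parameter so you can try the edit on a copy first.

```shell
# disable_selinux_config [CONFIG]: rewrite the SELINUX= line to SELINUX=disabled.
# The anchored pattern leaves the separate SELINUXTYPE= line untouched.
disable_selinux_config() {
  local config="${1:-/etc/selinux/config}"
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}

# selinux_config_mode [CONFIG]: print the mode configured for the next boot.
selinux_config_mode() {
  local config="${1:-/etc/selinux/config}"
  awk -F= '$1 == "SELINUX" { print $2 }' "$config"
}
```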
5. Extract the installation package and create the required directories
tar -zxvf cloudera-manager-centos7-cm5.9.0_x86_64.tar.gz -C /opt/
mkdir -p /opt/program /opt/cloudera/parcel-repo
mv /opt/cm-5.9.0/ /opt/program/
ln -s /opt/program/cm-5.9.0/ /opt/cm
mv CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel.sha1 /opt/cloudera/parcel-repo/
mv manifest.json /opt/cloudera/parcel-repo/
cd /opt/cloudera/parcel-repo/
# Cloudera Manager expects the checksum file to use the .sha extension
mv CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel.sha1 CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel.sha
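As the rename above suggests, Cloudera Manager expects the checksum next to the parcel under the .sha name. A quick check of the repository directory can therefore save a failed parcel distribution later. This is a minimal bash sketch. check_parcel_repo is a hypothetical helper, and it only checks that the files exist, not their contents.

```shell
# check_parcel_repo [DIR]: report missing pieces of a local parcel repository.
# Expects manifest.json plus, for every *.parcel, a matching *.parcel.sha.
check_parcel_repo() {
  local dir="${1:-/opt/cloudera/parcel-repo}" parcel found=0 status=0
  if [ ! -f "$dir/manifest.json" ]; then
    echo "missing: manifest.json" >&2
    status=1
  fi
  for parcel in "$dir"/*.parcel; do
    [ -e "$parcel" ] || continue   # the glob matched nothing
    found=1
    if [ ! -f "$parcel.sha" ]; then
      echo "missing: $(basename "$parcel").sha" >&2
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "missing: *.parcel" >&2
    status=1
  fi
  return "$status"
}
```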
6. Edit the agent configuration file and upload the JDBC jar
vi /opt/cm/etc/cloudera-scm-agent/config.ini
server_host=master   # hostname of the node running the CM server
mv mysql-connector-java-5.1.40-bin.jar /opt/cm/share/cmf/lib/
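The same config.ini edit can be made non-interactively. Doing it before CM is copied to the other nodes means every copy already points at the server host. This is a minimal bash sketch assuming GNU sed; set_server_host is a hypothetical name.

```shell
# set_server_host HOST [CONFIG]: replace the server_host= line in the agent's
# config.ini so the agent reports to HOST (e.g. master).
set_server_host() {
  local host="$1" config="${2:-/opt/cm/etc/cloudera-scm-agent/config.ini}"
  sed -i "s/^server_host=.*/server_host=${host}/" "$config"
}
```

Usage: set_server_host master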
7. Create the database for CM
/opt/cm/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -phdp --scm-host localhost scm scm scm
8. Configure the Java environment variable
# Locate the deploy-cc.sh script
find / -type f -name "*.sh" | xargs grep "as ALT_NAME"
vi /opt/cm/lib64/cmf/service/client/deploy-cc.sh
# Add:
JAVA_HOME=/usr/local/java
export JAVA_HOME=/usr/local/java
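Instead of editing deploy-cc.sh in vi, you can insert the two JAVA_HOME lines with a script, which also makes the edit safe to repeat. This is a minimal bash sketch. add_java_home is hypothetical, and it assumes the script's first line is its shebang, so the new lines go directly after it.

```shell
# add_java_home SCRIPT [JAVA_DIR]: insert the JAVA_HOME lines after line 1 of
# SCRIPT, doing nothing if the export line is already present.
add_java_home() {
  local script="$1" java_dir="${2:-/usr/local/java}" tmp
  # Already patched: skip, so running this twice does not duplicate lines.
  if grep -qxF "export JAVA_HOME=${java_dir}" "$script"; then
    return 0
  fi
  tmp=$(mktemp) || return 1
  {
    head -n 1 "$script"                  # keep the shebang as the first line
    printf 'JAVA_HOME=%s\n' "$java_dir"
    printf 'export JAVA_HOME=%s\n' "$java_dir"
    tail -n +2 "$script"
  } > "$tmp"
  cat "$tmp" > "$script"                 # overwrite in place, keeping permissions
  rm -f "$tmp"
}
```

Usage: add_java_home /opt/cm/lib64/cmf/service/client/deploy-cc.sh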
9. Create the cloudera-scm user on every node
useradd --system --home=/opt/cm/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
10. Copy CM to every node, including the /opt/cm symlink
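Step 10 gives no command; one way to do it is rsync over SSH. This is a minimal bash sketch. It assumes rsync is installed on every node and that root can SSH from this node to the others without a password. distribute_cm and run are hypothetical helpers, and DRY_RUN=1 only prints the commands.

```shell
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# distribute_cm HOST...: copy /opt/program/cm-5.9.0 and the /opt/cm symlink
# to each HOST, preserving the layout created in step 5.
distribute_cm() {
  local host
  for host in "$@"; do
    run ssh "$host" mkdir -p /opt/program
    run rsync -a /opt/program/cm-5.9.0 "$host":/opt/program/
    # No trailing slash: /opt/cm is the symlink itself, and -a (which
    # implies -l) recreates it as a symlink on the target.
    run rsync -a /opt/cm "$host":/opt/
  done
}
```

Usage: review the commands with DRY_RUN=1 distribute_cm zj-hadoop-02 zj-hadoop-03, then run it again without DRY_RUN. Pass every node except the one where the files were prepared.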
11. Start the CM server service on the master
/opt/cm/etc/init.d/cloudera-scm-server start
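The server takes a while to start before its web UI on port 7180 answers. This is a minimal bash sketch that polls the port, assuming bash was built with /dev/tcp support (the default on CentOS). wait_for_port is a hypothetical name, and the 5-second interval and 300-second default timeout are arbitrary choices.

```shell
# wait_for_port HOST PORT [TIMEOUT_SECONDS]: retry a TCP connection to
# HOST:PORT every 5 seconds; return 1 once TIMEOUT_SECONDS have elapsed.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-300}" waited=0
  # The subshell opens the connection and closes it again on exit.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "${host}:${port} is up"
}
```

Usage: wait_for_port master 7180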
12. Start the CM agent service on every node you want to use as a worker (in the layout above, all three nodes host roles)
/opt/cm/etc/init.d/cloudera-scm-agent start
13. Create the MySQL databases
-- Hive database
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- Oozie database
create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- Hue database
create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
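The three statements differ only in the database name, so you can also generate them and pipe them into the mysql client. This is a minimal bash sketch. create_db_sql is hypothetical, and the mysql invocation in the usage note assumes the MySQL root account used in step 7.

```shell
# create_db_sql DB...: print one create database statement per name, with the
# same character set and collation as the statements above.
create_db_sql() {
  local db
  for db in "$@"; do
    printf 'create database %s DEFAULT CHARSET utf8 COLLATE utf8_general_ci;\n' "$db"
  done
}
```

Usage: create_db_sql hive oozie hue | mysql -uroot -p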
14. Open the Cloudera Manager web UI at http://master:7180/. The default username and password are both admin.
15. After installation, you can run a Spark test job
su hdfs
spark-submit \
  --master yarn-client \
  --class org.apache.spark.examples.SparkPi \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 2 \
  /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar \
  10
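SparkPi prints its result on a line beginning "Pi is roughly". In yarn-client mode the driver runs locally, so that line appears in the terminal, and a successful run can be checked from the output. This is a minimal bash sketch; pi_estimate is a hypothetical name.

```shell
# pi_estimate: read spark-submit output on stdin and print the number from the
# last "Pi is roughly <value>" line; return 1 if no such line was found.
pi_estimate() {
  local value
  value=$(sed -n 's/^.*Pi is roughly \([0-9.][0-9.]*\).*$/\1/p' | tail -n 1)
  [ -n "$value" ] || return 1
  echo "$value"
}
```

Usage, as the hdfs user: append 2>&1 | pi_estimate to the spark-submit command above.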
16. If something goes wrong during installation and you need to reinstall, the following uninstall steps are provided for reference
# Stop the CM server and agent services
/opt/cm/etc/init.d/cloudera-scm-server stop
/opt/cm/etc/init.d/cloudera-scm-agent stop
# Confirm that the processes have exited
ps -ef | grep supervisord
ps -ef | grep cloudera
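Stopping the services does not end their processes instantly. Rather than re-running ps by hand, you can poll until nothing matches. This is a minimal bash sketch assuming the procps-ng pgrep shipped with CentOS 7. wait_for_exit is a hypothetical name, and the 2-second interval and 60-second default timeout are arbitrary.

```shell
# wait_for_exit PATTERN [TIMEOUT_SECONDS]: poll pgrep -f PATTERN until no
# process matches; return 1 (and list the survivors) after the timeout.
wait_for_exit() {
  local pattern="$1" timeout="${2:-60}" waited=0
  while pgrep -f "$pattern" >/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "still running after ${timeout}s:" >&2
      pgrep -af "$pattern" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}
```

Pass the patterns in bracket form, e.g. wait_for_exit '[s]upervisord' and wait_for_exit '[c]loudera'. pgrep -f matches full command lines, and the brackets keep the pattern from matching the shell that runs it, the same trick as with grep.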
# Delete the Cloudera package directories
rm -rf /opt/cloudera/csd/*
rm -rf /opt/cloudera/parcels
rm -rf /opt/cloudera/parcel-cache
umount /opt/program/cm-5.9.0/run/cloudera-scm-agent/process
# Delete each component's directories (quote the patterns so the shell does not expand them before find sees them)
find / -name 'hue*' | xargs rm -rf
find / -name 'dfs*' | xargs rm -rf
find / -name 'hive*' | xargs rm -rf
find / -name 'hadoop*' | xargs rm -rf
find / -name '*oozie*' | xargs rm -rf
find / -name '*spark*' | xargs rm -rf
find / -name '*impala*' | xargs rm -rf
find / -name '*hdfs*' | xargs rm -rf
find / -name '*hbase*' | xargs rm -rf
find / -name '*yarn*' | xargs rm -rf
find / -name '*zookeeper*' | xargs rm -rf
find / -name 'hsperfdata*' | xargs rm -rf
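These find patterns are broad: '*spark*', for example, matches any path containing "spark". It is worth listing the matches before deleting anything. This is a minimal bash sketch; preview_cleanup is hypothetical and only prints paths, it never deletes them.

```shell
# preview_cleanup ROOT PATTERN...: print every path under ROOT whose name
# matches one of the find -name PATTERNs, de-duplicated and sorted.
preview_cleanup() {
  local root="$1" pattern
  shift
  for pattern in "$@"; do
    find "$root" -name "$pattern" -print 2>/dev/null
  done | sort -u
}
```

Usage: preview_cleanup / 'hue*' 'dfs*' 'hive*' 'hadoop*' '*oozie*' '*spark*' '*impala*' '*hdfs*' '*hbase*' '*yarn*' '*zookeeper*' 'hsperfdata*' | less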
# Drop the MySQL databases
drop database cm;
drop database hue;
drop database hive;
drop database oozie;