Spark Cluster Installation and Configuration: Spark 2.4.5 on CentOS 7

1. Environment

2. Download and Installation

3. Core Configuration Files

4. Starting Spark

----------------------------------------------------------

1. Environment

You can complete the following environment setup first, or install Spark directly:
1.1 Hadoop 2.7 cluster installation and configuration
1.2 Anaconda3 installation and configuration
1.3 OS: CentOS 7, user hadoop (the same user as the Hadoop cluster)
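
Before continuing, a quick sanity check of the prerequisites (a sketch; exact version strings will vary with your installs):

$ hadoop version     # should report Hadoop 2.7.x
$ java -version      # Spark 2.4.5 runs on Java 8
$ python --version   # the Anaconda3 interpreter, later used by PySpark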

2. Download and Installation

2.1 Download: spark-2.4.5-bin-hadoop2.7.tgz
2.2 Change into the directory holding the archive and extract it:

$ sudo tar -zxvf ./spark-2.4.5-bin-hadoop2.7.tgz -C /usr/local/hdfs/
$ cd /usr/local/hdfs/
$ sudo mv ./spark-2.4.5-bin-hadoop2.7 ./spark2.4.5
$ sudo chown -R hadoop ./spark2.4.5
$ sudo ln -s /usr/local/hdfs/spark2.4.5 ~/hdfs/spark
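
To confirm the layout, check that the symlink resolves and the binaries run (paths as created above):

$ ls -l ~/hdfs/spark                        # should point at /usr/local/hdfs/spark2.4.5
$ ~/hdfs/spark/bin/spark-submit --version   # should print version 2.4.5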

2.3 Configure environment variables

$ vi ~/.bash_profile

SPARK_HOME=/home/hadoop/hdfs/spark
export SPARK_HOME
PATH=$SPARK_HOME/bin:$PATH
export PATH

$ source ~/.bash_profile

From any shell, type spark and press Tab twice; if the completions below appear, the setup succeeded:

$ spark
spark  spark-class   sparkR   spark-shell   spark-sql   spark-submit

3. Core Configuration Files

$ cd ~/hdfs/spark/conf
$ sudo cp ./slaves.template  ./slaves
$ sudo cp ./spark-env.sh.template  ./spark-env.sh
$ sudo cp ./spark-defaults.conf.template ./spark-defaults.conf
$ sudo chown -R hadoop /usr/local/hdfs/spark2.4.5

3.1 slaves

$ vi ./slaves

Add every host that should run Spark executors (the worker machines):

Master
Slave2
Slave3
....
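
The start scripts log in to every host in this file over SSH, so passwordless SSH from the Master must already work (hostnames assumed from the list above):

$ for h in Master Slave2 Slave3; do ssh $h hostname; done   # should print each hostname without a password prompt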

3.2 spark-config.sh

$ vi $SPARK_HOME/sbin/spark-config.sh

Add the JAVA_HOME path on a blank line:

export JAVA_HOME=/usr/jvm/jdk1.8
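
A quick check that this JDK path is valid (path taken from the line above):

$ /usr/jvm/jdk1.8/bin/java -version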

3.3 spark-env.sh

$ vi ./spark-env.sh

Append the following line at the end:

export HADOOP_CONF_DIR=/usr/local/hdfs/hadoop/etc/hadoop
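
This lets Spark pick up the HDFS and YARN settings. You can verify that the directory really holds the cluster configuration (path assumed from the Hadoop install above):

$ ls /usr/local/hdfs/hadoop/etc/hadoop/core-site.xml /usr/local/hdfs/hadoop/etc/hadoop/yarn-site.xml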

3.4 spark-defaults.conf

$ start-all.sh                      # Hadoop must be running for the HDFS commands below
$ hdfs dfs -mkdir /spark_lib
$ hdfs dfs -mkdir /spark-logs
$ hdfs dfs -put ~/hdfs/spark/jars/* /spark_lib
$ #stop-all.sh                      # left commented out: keep Hadoop running
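
Before editing the file, you can confirm the jars landed on HDFS:

$ hdfs dfs -ls /spark_lib | head -n 3

Then open the defaults file: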
$ vi ./spark-defaults.conf

Append the following at the end:

# Tell Spark to run on YARN
spark.master    yarn
# HDFS directory holding the Spark jars
#spark.yarn.jars hdfs://Master:9000/spark_lib/*.jar
# Staging directory Spark uses for temporary files at runtime
#spark.yarn.stagingDir   hdfs://Master:9000/tmp

spark.history.provider            org.apache.spark.deploy.history.FsHistoryProvider
#spark.history.fs.logDirectory     hdfs://Master:9000/spark-logs
spark.history.fs.update.interval  10s
spark.history.ui.port             18080
spark.eventLog.enabled true
#spark.eventLog.dir hdfs://Master:9000/spark-logs

Lines starting with "#" mark the places you need to adapt; "Master" is the NameNode hostname. Replace it with your own and uncomment those lines. (Comments sit on their own lines because spark-defaults.conf is parsed as Java properties, where a trailing "#" would become part of the value.)
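
Note that these properties only configure the history server; the daemon itself must be started separately once the HDFS log directory is in place (a sketch, using the port set above):

$ $SPARK_HOME/sbin/start-history-server.sh   # UI at http://Master:18080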

3.5 yarn-site.xml

Configure the ResourceManager addresses and disable YARN's physical and virtual memory checks (otherwise YARN may kill Spark containers that exceed their memory allocation):

$ sudo vi $HADOOP_HOME/etc/hadoop/yarn-site.xml

Add the following to the existing Hadoop configuration:

<property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8032</value>
</property>

<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8031</value>
</property>

<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
</property>

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>                       
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property> 

<property>
    <name>yarn.acl.enable</name>
    <value>false</value>
</property>

3.6 mapred-site.xml

$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following to the existing Hadoop configuration:

<property>
	<name>mapreduce.jobtracker.address</name>
	<value>Master:54311</value>
	<description>MapReduce job tracker runs at this host and port.
	</description>
</property>

Replace each "Master" with your own NameNode hostname.
Then copy the configured Spark directory to every node (hostnames as in the slaves file):

$ for i in Slave2 Slave3; do scp -r /usr/local/hdfs/spark2.4.5 $i:/usr/local/hdfs/; done
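
The yarn-site.xml and mapred-site.xml edits likewise have to reach every node, and YARN has to be restarted to pick them up (a sketch, assuming $HADOOP_HOME is set on all nodes):

$ for i in Slave2 Slave3; do scp $HADOOP_HOME/etc/hadoop/{yarn-site.xml,mapred-site.xml} $i:$HADOOP_HOME/etc/hadoop/; done
$ stop-yarn.sh && start-yarn.sh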

4. Starting Spark

$ #start-all.sh                    # Hadoop is already running from step 3.4; otherwise start it first
$ $SPARK_HOME/sbin/start-all.sh    # use the full path so you get Spark's script, not Hadoop's

Check with jps; if Master and Worker processes are present, the startup succeeded:

$ jps
71601 SecondaryNameNode
71347 DataNode
71827 ResourceManager
72405 Master      
71212 NameNode
71964 NodeManager
72508 Worker
72734 Jps
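
The listing above is from the Master; each slave node should additionally show a Worker, and the standalone master serves a web UI on port 8080 by default:

$ ssh Slave2 jps    # should include a Worker process

Then start an interactive shell: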
$ spark-shell

On success you will see the "scala>" prompt, and "master = yarn" shows the shell is running on YARN:

Spark context available as 'sc' (master = yarn, app id = application_1628143668230_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_301)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
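
Beyond the interactive shell, a quick end-to-end test is to submit the bundled SparkPi example to YARN (a sketch; the examples jar ships with the binary distribution, but verify the exact file name under $SPARK_HOME/examples/jars):

$ spark-submit --master yarn \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 100

On success the driver output ends with a line like "Pi is roughly 3.14...".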
