Wednesday, March 1, 2017
Spark Cluster Setup - Standalone
Master: node1
Worker: node2
Worker: node3
1. Download and Install
Download page: http://spark.apache.org/downloads.html
Although a Standalone-mode Spark cluster does not depend on YARN, the data files live on HDFS, so the HDFS cluster must already be up and running.
The package you download also has to match the version of your Hadoop cluster.
For example, Hadoop 2.5.2 needs the spark-1.4.0-bin-hadoop2.4.tgz build.
Download the package, extract it, and cd into the extracted directory.
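A minimal sketch of this step, assuming the spark-1.4.0-bin-hadoop2.4.tgz package mentioned above has already been downloaded to the current directory (adjust the file name to your version):
tar -zxvf spark-1.4.0-bin-hadoop2.4.tgz
cd spark-1.4.0-bin-hadoop2.4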
2. Configuration and Startup
2.1. Upload the Spark package, extract it, and edit the configuration files (rename the templates, then configure them)
mv slaves.template slaves
vi slaves (list the hostnames or IPs of the worker nodes here)
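For the cluster in this note, the slaves file simply lists the two worker hostnames, one per line:
node2
node3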
mv spark-env.sh.template spark-env.sh
vi spark-env.sh
Configure spark-env.sh:
export SPARK_MASTER_IP=node1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1G
2.2 Configure Environment Variables
Spark's start-all.sh conflicts with Hadoop's start-all.sh, so rename the Spark scripts and then configure the environment variables.
mv start-all.sh spark-start-all.sh
mv stop-all.sh spark-stop-all.sh
vi /etc/profile (the two worker nodes node2 and node3 also need the Spark environment variables configured)
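A minimal sketch of the lines to add to /etc/profile on all three nodes (the install path /opt/spark-1.4.0-bin-hadoop2.4 is an assumption; adjust it to wherever you extracted Spark):
export SPARK_HOME=/opt/spark-1.4.0-bin-hadoop2.4
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin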
source /etc/profile
2.3 Start the Spark Cluster
spark-start-all.sh (run this on node1)
jps (check that the daemons started)
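If startup succeeded, jps should report a Master process on node1 and a Worker process on node2 and node3, roughly like this (PIDs will differ):
node1: 3120 Master
node2: 2876 Worker
node3: 2881 Worker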
Spark cluster test commands
Standalone client mode
./spark-submit --master spark://node1:7077 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000
Standalone cluster mode
./spark-submit --master spark://node1:7077 --deploy-mode cluster --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000
Visit node1:8080 to see the Spark web UI.
Appendix
The spark-submit command differs by run mode.
1. Standalone client mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi--master spark://master:7077 --executor-memory 512m --total-executor-cores 1 ./lib/spark-examples-1.5.2-hadoop2.4.0.jar 100
2. Standalone cluster mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark001:7077 --driver-memory 512m --deploy-mode cluster --supervise --executor-memory 512M --total-executor-cores 1 ./lib/spark-examples-1.5.2-hadoop2.4.0.jar 100
3. YARN client mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 512M --num-executors 1 ./lib/spark-examples-1.5.2-hadoop2.4.0.jar 100
4. YARN cluster mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-memory 512m --num-executors 1 ./lib/spark-examples-1.5.2-hadoop2.4.0.jar 100
Integrating Spark SQL with Hive
1. Just create a hive-site.xml in the conf directory on the master node, with the following content:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop1:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
2. Start Hive's metastore service.
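A minimal sketch of starting the metastore and verifying the integration (the host hadoop1 follows the hive-site.xml above; running spark-shell against spark://node1:7077 and the "show tables" check are assumptions for illustration):
On the Hive machine (hadoop1):
hive --service metastore &
Then on node1:
./bin/spark-shell --master spark://node1:7077
scala> sqlContext.sql("show tables").show()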