Spark notes

A very good blog analyzing Spark, written by someone on our team, haha: http://jerryshao.me/

Spark Programming Guide:

https://github.com/mesos/spark/wiki/Spark-Programming-Guide

-------------------------------------------------------------

Installing Scala:

$ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
$ tar xvfz scala-2.9.3.tgz

Add to ~/.bashrc (SCALA_HOME should point to the directory where the archive was extracted):

export SCALA_HOME=/usr/scala/scala-2.9.3
export PATH=$PATH:$SCALA_HOME/bin
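
Since the extraction directory and SCALA_HOME have to agree, a quick sanity check can help. The following is a minimal sketch; `check_scala_home` is a hypothetical helper name, not part of Scala or Spark, and it only tests that an executable `bin/scala` exists under the given directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Scala or Spark): report whether a directory
# looks like a usable Scala install. Defaults to $SCALA_HOME when no argument is given.
check_scala_home() {
  local home="${1:-${SCALA_HOME:-}}"
  if [ -z "$home" ]; then
    echo "SCALA_HOME is not set"
    return 1
  fi
  if [ ! -x "$home/bin/scala" ]; then
    echo "no executable scala under $home/bin"
    return 1
  fi
  echo "ok: $home/bin/scala"
}
```

After `source ~/.bashrc`, running `check_scala_home` with no argument checks the exported value; `scala -version` should then print the installed version.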

-------------------------------------------------

Build (Hadoop must be installed):

SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly

Installing Spark in Standalone Mode

Master:
192.168.56.103

Slaves:
192.168.56.102
192.168.56.103

conf/spark-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
export SCALA_HOME=/usr/local/src/scala-2.9.3/
export SPARK_MASTER_IP=192.168.56.103
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=512m

conf/slaves:

# A Spark Worker will be started on each of the machines listed below.
192.168.56.102
192.168.56.103

These two files are identical on the master and the slaves. Then, on the master, run:

 bin/start-all.sh
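
The two configuration files have to match on every node before the start script is run. One way to distribute them is sketched below, assuming passwordless SSH from the master and the same Spark install path on every machine (neither is stated in these notes). `list_workers` and `sync_conf` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hosts listed in a slaves file,
# skipping comment lines and blank lines.
list_workers() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Hypothetical helper: copy spark-env.sh and slaves from this machine to each worker.
# Assumes passwordless SSH and that Spark lives at the same path on every node.
sync_conf() {
  local spark_dir="$1" host
  for host in $(list_workers "$spark_dir/conf/slaves"); do
    scp "$spark_dir/conf/spark-env.sh" "$spark_dir/conf/slaves" "$host:$spark_dir/conf/"
  done
}
```

Calling `sync_conf` with the Spark directory as its argument copies both files to every listed host, including the master itself when it also appears in conf/slaves.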

Then check whether startup succeeded:

jps on the master:

8787 Worker
3017 NameNode
9366 Jps
3728 TaskTracker
8454 Master
2830 DataNode
2827 SecondaryNameNode
3484 JobTracker

jps on the slave:

6649 Worker
2592 DataNode
2997 TaskTracker
7105 Jps
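
Reading the jps listings by eye works for two machines; the check can also be scripted. The sketch below reads jps output on standard input and confirms that each expected daemon name appears in the second column. `has_daemons` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and verify that every
# daemon name passed as an argument appears as a process name (second column).
has_daemons() {
  local listing name
  listing="$(cat)"
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "missing: $name"
      return 1
    fi
  done
  echo "all present: $*"
}

# Usage on the master:  jps | has_daemons Master Worker
# Usage on a slave:     jps | has_daemons Worker
```

The Hadoop daemons in the listings above (NameNode, DataNode and so on) are unrelated to Spark's standalone mode, so only Master and Worker are checked here.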

Web UI:

http://localhost:8080/ (on the master; shows the status of each worker)

Running the examples:

On the master:

./run-example org.apache.spark.examples.SparkPi spark://192.168.56.103:7077

./run-example org.apache.spark.examples.SparkLR spark://192.168.56.103:7077



Deploying Spark on Mesos

...

----------------------------------------------

Decentralized scheduler (Sparrow):

http://www.binospace.com/index.php/sparrow-sosp13-an-accelerated-short-job-scheduling-method/
