Setting up a PySpark environment on Linux

Scala for Linux: https://downloads.lightbend.com/scala/2.11.0/scala-2.11.0.tgz
Spark (Linux/Windows): https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
Hadoop (Linux/Windows): https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

Install Spark (extract into /home/service so it matches SPARK_HOME below; the archive name must match the 2.4.5 download above):
tar -zxvf ./spark-2.4.5-bin-hadoop2.7.tgz -C /home/service
export SPARK_HOME=/home/service/spark-2.4.5-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH

Install Hadoop (the archive name must match the 2.7.7 download above):
tar -zxvf ./hadoop-2.7.7.tar.gz -C /home/service
export HADOOP_HOME=/home/service/hadoop-2.7.7
export PATH=$HADOOP_HOME/bin:$PATH

Install Scala (the archive name must match the 2.11.0 download above):
tar -zxvf ./scala-2.11.0.tgz -C /home/service
export SCALA_HOME=/home/service/scala-2.11.0
export PATH=$SCALA_HOME/bin:$PATH
Append the export lines above to ~/.bashrc so they persist, then reload:
source ~/.bashrc
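Collected in one place, the environment variables from the three steps above can go into ~/.bashrc like this (paths assume the /home/service install prefix used in this post):

```shell
# ~/.bashrc additions for Spark / Hadoop / Scala
# (install prefix /home/service assumed, as in the steps above)
export SPARK_HOME=/home/service/spark-2.4.5-bin-hadoop2.7
export HADOOP_HOME=/home/service/hadoop-2.7.7
export SCALA_HOME=/home/service/scala-2.11.0
export PATH=$SPARK_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$PATH
```

After `source ~/.bashrc`, `spark-shell --version`, `hadoop version`, and `scala -version` should all resolve from PATH.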

Install PySpark:
pip install pyspark


References:

https://www.cnblogs.com/traditional/p/11297049.html

https://juejin.im/post/5cd16c00e51d453a51433062
