[Review] Setting Up a Spark Project



Downloads:

IntelliJ IDEA Community Edition 2021.2.3:https://www.jetbrains.com/idea/download/other.html
Maven 3.5.4:https://archive.apache.org/dist/maven/maven-3/
Spark 2.4.2:http://archive.apache.org/dist/spark/
Scala 2.12.2:https://www.scala-lang.org/download/all.html
Hadoop 2.6.0:https://archive.apache.org/dist/hadoop/common/


1. Create a New Project

  • New Project: create a new project
  • Choose a Maven project and click Next
  • Set the project name and the location where the project will be stored, then click Finish to create the project



2. Configure the Maven Environment

  • Under File, open Settings and search for "maven" in the search bar
  • Set the three paths Maven home path, User settings file, and Local repository (check Override where required), then click Apply
  • After Maven is configured, open the Maven tool window on the right. If errors (red squiggles) appear, click the reload button to refresh and wait a few minutes; if the errors persist, the Maven configuration is probably wrong and should be rechecked
  • Below is the pom.xml dependency configuration file; copy it as needed
	<?xml version="1.0" encoding="UTF-8"?>
	<project xmlns="http://maven.apache.org/POM/4.0.0"
	         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	    <modelVersion>4.0.0</modelVersion>
	
	    <groupId>org.example</groupId>
	    <artifactId>Spark</artifactId>
	    <version>1.0-SNAPSHOT</version>
	
	    <properties>
	        <scala.version>2.12.2</scala.version>
	        <spark.version>2.4.2</spark.version>
	        <hadoop.version>2.6.0</hadoop.version>
	        <maven.compiler.source>8</maven.compiler.source>
	        <maven.compiler.target>8</maven.compiler.target>
	    </properties>
	
	    <!-- Aliyun mirror -->
	    <repositories>
	        <repository>
	            <id>nexus-aliyun</id>
	            <name>Nexus aliyun</name>
	            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
	        </repository>
	    </repositories>
	
	    <dependencies>
	        <!-- Scala library dependency -->
	        <dependency>
	            <groupId>org.scala-lang</groupId>
	            <artifactId>scala-library</artifactId>
	            <version>${scala.version}</version>
	        </dependency>
	        <!-- Spark Core dependency -->
	        <dependency>
	            <groupId>org.apache.spark</groupId>
	            <artifactId>spark-core_2.12</artifactId>
	            <version>${spark.version}</version>
	        </dependency>
	        <!-- Spark SQL dependency -->
	        <dependency>
	            <groupId>org.apache.spark</groupId>
	            <artifactId>spark-sql_2.12</artifactId>
	            <version>${spark.version}</version>
	        </dependency>
	        <!-- Hadoop client dependency -->
	        <dependency>
	            <groupId>org.apache.hadoop</groupId>
	            <artifactId>hadoop-client</artifactId>
	            <version>${hadoop.version}</version>
	        </dependency>
	        <!-- hadoop-hdfs dependency -->
	        <dependency>
	            <groupId>org.apache.hadoop</groupId>
	            <artifactId>hadoop-hdfs</artifactId>
	            <version>${hadoop.version}</version>
	        </dependency>
	        <!-- Logging (log4j) dependency -->
	        <dependency>
	            <groupId>log4j</groupId>
	            <artifactId>log4j</artifactId>
	            <version>1.2.12</version>
	        </dependency>
	    </dependencies>
	
	    <build>
	        <plugins>
	
	            <plugin>
	                <groupId>org.apache.maven.plugins</groupId>
	                <artifactId>maven-compiler-plugin</artifactId>
	                <version>3.0</version>
	                <configuration>
	                    <source>1.8</source>
	                    <target>1.8</target>
	                    <encoding>UTF-8</encoding>
	                </configuration>
	            </plugin>
	
	            <plugin>
	                <groupId>net.alchim31.maven</groupId>
	                <artifactId>scala-maven-plugin</artifactId>
	                <version>3.2.0</version>
	                <executions>
	                    <execution>
	                        <goals>
	                            <goal>compile</goal>
	                            <goal>testCompile</goal>
	                        </goals>
	                        <configuration>
	                            <args>
	                                <arg>-dependencyfile</arg>
	                                <arg>${project.build.directory}/.scala_dependencies</arg>
	                            </args>
	                        </configuration>
	                    </execution>
	                </executions>
	            </plugin>
	            <plugin>
	                <groupId>org.apache.maven.plugins</groupId>
	                <artifactId>maven-shade-plugin</artifactId>
	                <version>3.1.1</version>
	                <executions>
	                    <execution>
	                        <phase>package</phase>
	                        <goals>
	                            <goal>shade</goal>
	                        </goals>
	                        <configuration>
	                            <filters>
	                                <filter>
	                                    <artifact>*:*</artifact>
	                                    <excludes>
	                                        <exclude>META-INF/*.SF</exclude>
	                                        <exclude>META-INF/*.DSA</exclude>
	                                        <exclude>META-INF/*.RSA</exclude>
	                                    </excludes>
	                                </filter>
	                            </filters>
	                            <transformers>
	                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
	                                    <mainClass></mainClass>
	                                </transformer>
	                            </transformers>
	                        </configuration>
	                    </execution>
	                </executions>
	            </plugin>
	        </plugins>
	    </build>
	</project>
  • After adding the dependency configuration, click the reload icon in the Maven tool window to import and download the dependencies (all artifacts are downloaded into the Local repository directory set in the Maven configuration). Once the download finishes, refresh again and confirm that no errors remain



3. Configure the Scala Environment

  • Scala language support in IDEA requires the Scala plugin: open Plugins in Settings, search for "scala", and install it; restart IDEA once the installation finishes
  • After the restart, right-click the project and choose Add Framework Support
  • Select Scala and click Create on the right; in the dialog you can either choose Download to fetch Scala online, or Browse… to locate a Scala distribution you downloaded yourself (pick a 2.12.x version to match the _2.12 Spark artifacts in the pom)
  • After selecting it, click OK to finish the Scala environment configuration

4. Test Preparation

  • Create a new scala folder under src/main as shown, and likewise a scala folder under src/test; right-click each one, choose Mark Directory as, and mark them as Sources Root and Test Sources Root respectively
  • Once marked, right-click the scala folder and create a Scala Class file for a quick coding test (a minimal verification sketch follows this list)
  • If Scala Class does not appear in the menu (as in the upper screenshot), the usual causes are: the Scala environment is not configured, or the scala folder has not been marked as a source root
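A minimal sketch for that quick test, assuming the class is created in the new src/main/scala folder (the object name EnvCheck is just a placeholder chosen here, not part of the original walkthrough). Printing the two versions confirms that both the Scala SDK and the Maven dependencies are on the classpath:

object EnvCheck {

  def main(args: Array[String]): Unit = {
    // versionString comes from the Scala standard library, so this line
    // confirms the Scala SDK / scala-library dependency is available
    println("Scala: " + scala.util.Properties.versionString)
    // SPARK_VERSION is a constant in the org.apache.spark package object,
    // so this line confirms the spark-core dependency resolved through Maven
    println("Spark: " + org.apache.spark.SPARK_VERSION)
  }

}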



5. Word Count Test

  • Right-click to create a WordCount.scala file:
package wordCount

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(args: Array[String]): Unit = {
    // Create the SparkContext -- run in local mode with 6 worker threads
    val conf = new SparkConf().setAppName("wordCount").setMaster("local[6]")
    val sc = new SparkContext(conf)

    // Load the text file
    val context: RDD[String] = sc.textFile("G:\\Projects\\IdealProject-C21\\projects\\src\\main\\scala\\wordCount\\test.txt")

    // Process the data: split lines into words, map each word to (word, 1), then sum the counts per key
    val split: RDD[String] = context.flatMap(item => item.split(" "))
    val count: RDD[(String, Int)] = split.map(item => (item, 1))
    val reduce = count.reduceByKey((curr, agg) => curr + agg)
    val result = reduce.collect()
    result.foreach(println(_))
  }

}
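The listing above keeps every intermediate step in its own named RDD. As a small follow-up, the variant below (a sketch; the object name WordCountSorted and the relative input path are placeholders, not part of the original project) chains the same steps, sorts the results by count in descending order, and stops the SparkContext when done:

package wordCount

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSorted {

  def main(args: Array[String]): Unit = {
    // Local mode with 6 worker threads, same as the example above
    val conf = new SparkConf().setAppName("wordCountSorted").setMaster("local[6]")
    val sc = new SparkContext(conf)

    // Placeholder path -- point this at your own test.txt
    val lines: RDD[String] = sc.textFile("src/main/scala/wordCount/test.txt")

    // Same pipeline, chained: split into words -> (word, 1) -> sum per word,
    // then sort by count with the largest counts first
    val result = lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .collect()

    result.foreach(println)

    // Release local resources once the job is finished
    sc.stop()
  }

}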


