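Installing Hadoop and MapReduce: a detailed walkthrough and pitfall guide
Win10 development environment setup (my install paths are given in parentheses; change them as needed)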
-
Download Hadoop (D:\安装包\Download\hadoop)
https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/
Hadoop 3.2.1 has pitfalls and I do not recommend installing it; the pitfalls are listed at the end of this post.
-
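Download the Windows binaries and winutils for Hadoop 3.2.1 (the version does not have to match the one above; D:\安装包\Download\hadoop)
https://github.com/selfgrowth/apache-hadoop-3.1.1-winutils
After unzipping, use it to overwrite the bin directory inside hadoop.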
-
Copy hadoop.dll from bin to C:\Windows\System32
-
Add the environment variables
HADOOP_HOME=D:\hadoop\hadoop-3.2.1
Add %HADOOP_HOME%\bin to Path
-
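Error: Hadoop Error: JAVA_HOME is incorrectly set.
Check whether the JAVA_HOME path contains a space, as in Program Files. If it does, wrap the part containing the space in ASCII double quotes. A commonly used alternative is the 8.3 short name, for example C:\PROGRA~1 in place of C:\Program Files.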
Configure the dependencies in Maven's pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bjfu.jichuang</groupId>
  <artifactId>my-wordcount</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  <description></description>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
    <hadoop.version>3.2.1</hadoop.version>
    <log4j.version>1.2.17</log4j.version>
    <mockito.version>1.8.5</mockito.version>
    <junit.version>4.10</junit.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>${log4j.version}</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.5</version>
    </dependency>
    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-all</artifactId>
      <version>${mockito.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>${junit.version}</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
Starting Hadoop
-
Edit core-site.xml (D:\hadoop\hadoop-3.2.1\etc\hadoop)
Create the tmp and name folders first
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/D:/hadoop/hadoop-3.2.1/workplace/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/D:/hadoop/hadoop-3.2.1/workplace/name</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
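The names fs.default.name and dfs.name.dir used above are deprecated aliases in current Hadoop releases; they still work but log a deprecation warning. As a rough sketch, assuming the same paths and port as above, the same core-site.xml with the current name fs.defaultFS could look like this. The name directory is an HDFS setting, so its current equivalent (dfs.namenode.name.dir) is conventionally placed in hdfs-site.xml rather than here.

```xml
<configuration>
  <!-- Assumption: same working directory as in the original snippet -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/D:/hadoop/hadoop-3.2.1/workplace/tmp</value>
  </property>
  <!-- fs.defaultFS is the current name for the deprecated fs.default.name -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```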
-
Edit hdfs-site.xml
Create the datanode and namenode folders, then change the corresponding values
<configuration>
  <!-- Set to 1 because this is a single-node Hadoop install -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/D:/hadoop/hadoop-3.2.1/workplace/data</value>
  </property>
</configuration>
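dfs.data.dir is likewise a deprecated alias; the current name is dfs.datanode.data.dir, and the namenode counterpart is dfs.namenode.name.dir. A minimal sketch of an hdfs-site.xml that sets both explicitly, assuming the workplace/name and workplace/data folders from the steps above (adjust the paths to your own layout):

```xml
<configuration>
  <!-- Single-node install, so one replica is enough -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Current name for the deprecated dfs.name.dir; path is an assumption taken from core-site.xml above -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/D:/hadoop/hadoop-3.2.1/workplace/name</value>
  </property>
  <!-- Current name for the deprecated dfs.data.dir -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/D:/hadoop/hadoop-3.2.1/workplace/data</value>
  </property>
</configuration>
```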
-
Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9001</value>
  </property>
</configuration>
-
Edit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
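One detail worth checking in the snippet above: the class key is built from the auxiliary service name, so for a service called mapreduce_shuffle the key that YARN looks up is yarn.nodemanager.aux-services.mapreduce_shuffle.class (with an underscore), not mapreduce.shuffle.class. The ShuffleHandler class is also the shipped default for that key, so the second property can probably be omitted. A sketch of the variant with the matching key:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Key name follows the service name declared above (mapreduce_shuffle) -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```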
-
Edit the hadoop-env.cmd file in the "hadoop" directory (etc\hadoop)
@rem set JAVA_HOME=%JAVA_HOME%
@rem Path to your JDK installation
set JAVA_HOME=D:\java\jdk
-
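Format the namenode (hdfs namenode -format is the current form; hadoop namenode -format still works but is reported as deprecated)
D:\安装包\Download\hadoop\hadoop-3.2.1\hadoop-3.2.1\bin>hadoop namenode -format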
-
Start Hadoop
D:\安装包\Download\hadoop\hadoop-3.2.1\hadoop-3.2.1\sbin>start-all.cmd
Once YARN is running, open http://localhost:8088/cluster/apps
Pitfall 1: formatting the namenode fails (a common problem with 3.2.1)
https://www.cnblogs.com/yifengjianbai/p/8258898.html
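Pitfall 2: http://localhost:50070/ cannot be reached
In Hadoop 3.x the default port of the NameNode web UI changed from 50070 to 9870, so try http://localhost:9870/ instead.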
Pitfall 3: the nodemanager fails to start when starting YARN
Failed to setup local dir D:/hadoop/tmp/nm-local-dir, which was marked as good.
This is a permissions problem; run start-yarn.cmd as administrator.
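Pitfall 4: the web UI on port 8088 does not show the jobs run by YARN
Add the following to %HADOOP_HOME%\etc\hadoop\mapred-site.xml (Hadoop 3.x keeps its configuration in etc\hadoop, not conf):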
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
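The hostname master in the snippet above looks like it was carried over from a multi-node setup; on the single-machine install described here there may be no host with that name. A sketch assuming the JobHistory server runs on the same machine, so localhost replaces master (10020 and 19888 are the Hadoop default ports, as above):

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- Assumption: JobHistory server runs locally on a single-node install -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>localhost:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>localhost:19888</value>
</property>
```

The JobHistory server also has to be running for these addresses to respond.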
Pitfall 5: the Hadoop project fails with a No such file or directory error
Run the IDE as administrator.