Setting up a basic Hadoop 3.1.0 environment on Windows 7

2023-07-21 17:56:04

https://blog.csdn.net/wsh596823919/article/details/80774805


Preface: this guide assumes a Java environment is already installed on the Windows machine and that JAVA_HOME is set.

1. Download Hadoop 3.1.0

https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common

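2. Extract the downloaded archive to a directory of your choice

3. Configure HADOOP_HOME

Create a new environment variable HADOOP_HOME that points to the Hadoop extraction directory, for example D:/hadoop. Then append %HADOOP_HOME%\bin; to the Path environment variable.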
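As a quick sanity check for step 3, the following is a minimal sketch that inspects an environment mapping for the two settings described above. The function name check_hadoop_env is hypothetical, and the script assumes Windows-style paths separated by semicolons.

```python
import ntpath


def check_hadoop_env(env):
    """Return a list of problems found in an environment mapping (e.g. os.environ)."""
    home = env.get("HADOOP_HOME", "").strip()
    if not home:
        return ["HADOOP_HOME is not set"]

    # Normalise slashes and case so D:/hadoop/bin and d:\hadoop\bin compare equal.
    expected = ntpath.normcase(ntpath.normpath(ntpath.join(home, "bin")))
    entries = {
        ntpath.normcase(ntpath.normpath(entry))
        for entry in env.get("PATH", "").split(";")
        if entry.strip()
    }

    problems = []
    if expected not in entries:
        problems.append("PATH does not contain %HADOOP_HOME%\\bin")
    return problems
```

Calling check_hadoop_env(os.environ) from a freshly opened cmd window (after import os) should return an empty list once both variables are in place.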

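4. Edit the Hadoop configuration files

The basic configuration files are located under hadoop/etc/hadoop:

hadoop-env.cmd / core-site.xml / hdfs-site.xml / mapred-site.xml / yarn-site.xml

hadoop-env.cmd: the main change is the lines appended at the end of the file (from set HADOOP_PREFIX through set PATH)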

@echo off

@rem Licensed to the Apache Software Foundation (ASF) under one or more

@rem contributor license agreements. See the NOTICE file distributed with

@rem this work for additional information regarding copyright ownership.

@rem The ASF licenses this file to You under the Apache License, Version 2.0

@rem (the "License"); you may not use this file except in compliance with

@rem the License. You may obtain a copy of the License at

@rem

@rem http://www.apache.org/licenses/LICENSE-2.0

@rem

@rem Unless required by applicable law or agreed to in writing, software

@rem distributed under the License is distributed on an "AS IS" BASIS,

@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

@rem See the License for the specific language governing permissions and

@rem limitations under the License.

@rem Set Hadoop-specific environment variables here.

@rem The only required environment variable is JAVA_HOME. All others are

@rem optional. When running a distributed configuration it is best to

@rem set JAVA_HOME in this file, so that it is correctly defined on

@rem remote nodes.

@rem The java implementation to use. Required.

set JAVA_HOME=%JAVA_HOME%

@rem The jsvc implementation to use. Jsvc is required to run secure datanodes.

@rem set JSVC_HOME=%JSVC_HOME%

@rem set HADOOP_CONF_DIR=

@rem Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.

if exist %HADOOP_HOME%\contrib\capacity-scheduler (

  if not defined HADOOP_CLASSPATH (

    set HADOOP_CLASSPATH=%HADOOP_HOME%\contrib\capacity-scheduler\*.jar

  ) else (

    set HADOOP_CLASSPATH=%HADOOP_CLASSPATH%;%HADOOP_HOME%\contrib\capacity-scheduler\*.jar

  )

)

@rem The maximum amount of heap to use, in MB. Default is 1000.

@rem set HADOOP_HEAPSIZE=

@rem set HADOOP_NAMENODE_INIT_HEAPSIZE=""

@rem Extra Java runtime options. Empty by default.

@rem set HADOOP_OPTS=%HADOOP_OPTS% -Djava.net.preferIPv4Stack=true

@rem Command specific options appended to HADOOP_OPTS when specified

if not defined HADOOP_SECURITY_LOGGER (

set HADOOP_SECURITY_LOGGER=INFO,RFAS

)

if not defined HDFS_AUDIT_LOGGER (

set HDFS_AUDIT_LOGGER=INFO,NullAppender

)

set HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=%HADOOP_SECURITY_LOGGER% -Dhdfs.audit.logger=%HDFS_AUDIT_LOGGER% %HADOOP_NAMENODE_OPTS%

set HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS %HADOOP_DATANODE_OPTS%

set HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=%HADOOP_SECURITY_LOGGER% -Dhdfs.audit.logger=%HDFS_AUDIT_LOGGER% %HADOOP_SECONDARYNAMENODE_OPTS%

@rem The following applies to multiple commands (fs, dfs, fsck, distcp etc)

set HADOOP_CLIENT_OPTS=-Xmx512m %HADOOP_CLIENT_OPTS%

@rem set HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData %HADOOP_JAVA_PLATFORM_OPTS%"

@rem On secure datanodes, user to run the datanode as after dropping privileges

set HADOOP_SECURE_DN_USER=%HADOOP_SECURE_DN_USER%

@rem Where log files are stored. %HADOOP_HOME%/logs by default.

@rem set HADOOP_LOG_DIR=%HADOOP_LOG_DIR%\%USERNAME%

@rem Where log files are stored in the secure data environment.

set HADOOP_SECURE_DN_LOG_DIR=%HADOOP_LOG_DIR%\%HADOOP_HDFS_USER%

@rem

@rem Router-based HDFS Federation specific parameters

@rem Specify the JVM options to be used when starting the RBF Routers.

@rem These options will be appended to the options specified as HADOOP_OPTS

@rem and therefore may override any similar flags set in HADOOP_OPTS

@rem

@rem set HADOOP_DFSROUTER_OPTS=""

@rem

@rem The directory where pid files are stored. /tmp by default.

@rem NOTE: this should be set to a directory that can only be written to by

@rem the user that will run the hadoop daemons. Otherwise there is the

@rem potential for a symlink attack.

set HADOOP_PID_DIR=%HADOOP_PID_DIR%

set HADOOP_SECURE_DN_PID_DIR=%HADOOP_PID_DIR%

@rem A string representing this instance of hadoop. %USERNAME% by default.

set HADOOP_IDENT_STRING=%USERNAME%

set HADOOP_PREFIX=D:\study\bigdata\hadoop\hadoop-3.1.0

set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop

set YARN_CONF_DIR=%HADOOP_CONF_DIR%

set PATH=%PATH%;%HADOOP_PREFIX%\bin

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
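To confirm that the file is well-formed and that fs.defaultFS holds the intended address, a small script can parse it. This is only a sketch using Python's standard library; the helper names read_properties and default_fs_endpoint are made up for illustration and are not part of Hadoop.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_properties(xml_text):
    """Parse Hadoop-style <configuration> XML text into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def default_fs_endpoint(xml_text):
    """Return (scheme, host, port) taken from the fs.defaultFS property."""
    uri = urlparse(read_properties(xml_text).get("fs.defaultFS", ""))
    return uri.scheme, uri.hostname, uri.port
```

For the configuration above, the expected result is the scheme hdfs, the host localhost, and the port 9000. A parse error raised by ET.fromstring usually points to an unclosed tag.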

hdfs-site.xml

<configuration>
    <!-- Single-node setup, so the replication factor is 1 -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <!-- Note the path syntax: there is a slash in front of the drive letter D -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/D:/study/bigdata/hadoop/hadoop-3.1.0/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/D:/study/bigdata/hadoop/hadoop-3.1.0/data/datanode</value>
    </property>
</configuration>
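The path syntax with a slash before the drive letter is easy to get wrong when copying a path from Explorer. As an illustration, the sketch below converts an ordinary Windows path into the form used in the values above. The function name to_hadoop_local_path is hypothetical.

```python
def to_hadoop_local_path(windows_path):
    """Convert e.g. D:\\data\\namenode into /D:/data/namenode."""
    # Use forward slashes throughout and drop any trailing slash.
    path = windows_path.strip().replace("\\", "/").rstrip("/")
    # Prefix a slash only when the path starts with a drive letter such as "D:".
    if len(path) >= 2 and path[0].isalpha() and path[1] == ":":
        return "/" + path
    return path
```

For example, to_hadoop_local_path(r"D:\study\bigdata\hadoop\hadoop-3.1.0\data\namenode") yields the dfs.namenode.name.dir value shown above. A path that is already converted is returned unchanged.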

mapred-site.xml

The content inside the <description></description> element can be deleted.

<configuration>
    <property>
        <description>CLASSPATH for MR applications. A comma-separated list
        of CLASSPATH entries. If mapreduce.application.framework is set then this
        must specify the appropriate classpath for that archive, and the name of
        the archive must be present in the classpath.
        If mapreduce.app-submission.cross-platform is false, platform-specific
        environment variable expansion syntax would be used to construct the default
        CLASSPATH entries.
        For Linux:
        $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
        $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*.
        For Windows:
        %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,
        %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*.
        If mapreduce.app-submission.cross-platform is true, platform-agnostic default
        CLASSPATH for MR applications would be used:
        {{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/*,
        {{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/lib/*
        Parameter expansion marker will be replaced by NodeManager on container
        launch based on the underlying OS accordingly.
        </description>
        <name>mapreduce.application.classpath</name>
        <value>/D:\study\bigdata\hadoop\hadoop-3.1.0/share/hadoop/mapreduce/*, /D:\study\bigdata\hadoop\hadoop-3.1.0/share/hadoop/mapreduce/lib/*</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

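5. winutils: running Hadoop on Windows requires winutils.exe, hadoop.dll, and related files

Notes:

a. Make sure hadoop.dll and the related files do not conflict with your Hadoop installation. To avoid dependency errors, you can also place a copy of hadoop.dll in C:\Windows\System32.

b. Use the build that matches your Hadoop version exactly; otherwise all kinds of hard-to-diagnose problems appear.

Download link for the 3.1.0 build:

https://download.csdn.net/download/cntpro/10497884#comment

After downloading, copy the files into D:\study\bigdata\hadoop\hadoop-3.1.0\bin, overwriting the existing contents.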
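Before moving on, it can help to verify that the native files actually landed in the bin directory. The sketch below only checks for the two files named in step 5; the function name missing_native_files and the REQUIRED_FILES tuple are illustrative assumptions, and the downloaded package may contain further files.

```python
from pathlib import Path

# The two files named in step 5; the download may ship more than these.
REQUIRED_FILES = ("winutils.exe", "hadoop.dll")


def missing_native_files(bin_dir):
    """Return the names from REQUIRED_FILES that are absent from bin_dir."""
    bin_path = Path(bin_dir)
    return [name for name in REQUIRED_FILES if not (bin_path / name).is_file()]
```

An empty list from missing_native_files(r"D:\study\bigdata\hadoop\hadoop-3.1.0\bin") means both files are present. It does not prove they match the Hadoop version.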

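6. Go to D:\study\bigdata\hadoop\hadoop-3.1.0\etc\hadoop and run hadoop-env.cmd as administrator to initialize the environment (substitute your own extraction path).

You can also run it from a cmd window. Variables defined with set only last for the session that runs the script, so continue with the following steps in that same window.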

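7. Go to D:\study\bigdata\hadoop\hadoop-3.1.0\bin and run

hdfs namenode -format

Run this only once. My configuration was wrong at first, so I ran the command several times, and afterwards the DataNode kept failing to start with a variety of errors. A likely cause is that each format generates a new clusterID that no longer matches the one stored in the DataNode directory. I only got it working after deleting the files, extracting the archive again, and redoing the configuration.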
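One way to tell whether the NameNode has already been formatted, and whether the DataNode still belongs to the same cluster, is to compare the clusterID lines in the current/VERSION file of each storage directory. The following is a rough sketch under that assumption; read_cluster_id and format_state are hypothetical helper names, and the directories are the ones configured in hdfs-site.xml.

```python
from pathlib import Path


def read_cluster_id(storage_dir):
    """Return the clusterID from <storage_dir>/current/VERSION, or None if absent."""
    version_file = Path(storage_dir) / "current" / "VERSION"
    if not version_file.is_file():
        return None
    for line in version_file.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "clusterID":
            return value.strip()
    return None


def format_state(name_dir, data_dir):
    """Classify the storage directories as 'unformatted', 'ok', or 'mismatch'."""
    name_id = read_cluster_id(name_dir)
    data_id = read_cluster_id(data_dir)
    if name_id is None:
        return "unformatted"  # safe to run hdfs namenode -format
    if data_id is None or data_id == name_id:
        return "ok"  # DataNode is new or belongs to this NameNode
    return "mismatch"  # DataNode data comes from an earlier format
```

If the result is "unformatted", the format command has not been run yet. A "mismatch" result suggests the DataNode directory holds data from an earlier format and needs to be cleared before the DataNode can start.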

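8. Go to D:\study\bigdata\hadoop\hadoop-3.1.0\sbin and run

start-dfs.cmd to start HDFS only

start-all.cmd to start all daemons (recommended)

9. If startup finishes without errors, it succeeded

Use jps to verify. Output like the following means everything is running: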

D:\study\bigdata\hadoop\hadoop-3.1.0\sbin>jps
16032 NameNode
15956 ResourceManager
16996 NodeManager
17268 DataNode
19160 Jps
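The jps check can also be scripted. Below is a minimal sketch that parses text in the format shown above and reports which of the four daemons are missing. The names EXPECTED_DAEMONS, running_daemons, and missing_daemons are illustrative, and the expected set assumes start-all.cmd was used.

```python
# Daemons that appear in the jps listing above after start-all.cmd.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}


def running_daemons(jps_output):
    """Collect process names from lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output):
    """Return the expected daemons that do not appear in the jps output, sorted."""
    return sorted(EXPECTED_DAEMONS - running_daemons(jps_output))
```

Feeding it the output of jps (for example captured with subprocess.run) should give an empty list when all four daemons are up.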

10. Open http://127.0.0.1:8088/ to view the status of all nodes in the cluster

Open http://localhost:9870/ to view the HDFS file management page:
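If a page does not load, it helps to first check whether anything is listening on the port at all. The sketch below tests a plain TCP connection using the standard library; port_is_open is a hypothetical helper name, and the ports 8088 and 9870 are the ones mentioned above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For instance, port_is_open("127.0.0.1", 8088) and port_is_open("localhost", 9870) should both return True once the daemons are running. A False result points to a daemon that did not start rather than to a browser problem.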

11. For further operations, search online as needed
