Use case
While studying Hadoop you will of course need to deploy a Hadoop cluster. If you just want to try Hadoop out locally and don't have a stack of servers at your disposal, a pseudo-distributed Hadoop environment is by far your best choice.
Steps
1. Install the JDK
1.1 Check whether OpenJDK is already installed
# java -version
openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
1.2 List the installed OpenJDK packages
# rpm -qa | grep java
java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64
tzdata-java-2015g-1.el7.noarch
python-javapackages-3.4.1-11.el7.noarch
javapackages-tools-3.4.1-11.el7.noarch
java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.x86_64
java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64
java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64
1.3 Remove the OpenJDK packages one by one
# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64
# rpm -e --nodeps tzdata-java-2015g-1.el7.noarch
# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.x86_64
# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64
# rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64
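The five rpm -e invocations above can also be collapsed into one pipeline. A sketch: the grep filter is shown here against the package list from step 1.2, so its match set is visible before anything is actually removed:

```shell
# Filter used to select the OpenJDK-related packages from step 1.2;
# demonstrated on that package list so the match set is visible.
grep -E 'openjdk|tzdata-java|javapackages' <<'EOF'
java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64
tzdata-java-2015g-1.el7.noarch
python-javapackages-3.4.1-11.el7.noarch
javapackages-tools-3.4.1-11.el7.noarch
java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.x86_64
java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64
java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64
EOF
# On the real host, feed rpm -qa through the same filter:
#   rpm -qa | grep -E 'openjdk|tzdata-java|javapackages' | xargs -r rpm -e --nodeps
```

Check the filter's output first; `xargs -r` simply skips the removal when nothing matches.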
1.4 Download a fresh JDK
Download the JDK as a .tar.gz archive, upload it to the Linux host, and extract it under /opt.
JDK download link
1.5 Configure the JDK environment variables
# vim /etc/profile
export JAVA_HOME=/opt/jdk1.7.0_79
export JRE_HOME=/opt/jdk1.7.0_79/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
1.6 Apply the changes
# source /etc/profile
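A quick way to confirm the variables took effect. The snippet re-creates the PATH manipulation locally so the expected result is visible; on the real host, `echo $JAVA_HOME` and `java -version` after sourcing the profile are the actual checks:

```shell
# Re-create the variable setup from step 1.5 and confirm the JDK bin
# directory ends up first on PATH (path assumes /opt/jdk1.7.0_79 as above).
JAVA_HOME=/opt/jdk1.7.0_79
PATH=$JAVA_HOME/bin:$PATH
echo "$PATH" | cut -d: -f1    # prints /opt/jdk1.7.0_79/bin
```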
2. Passwordless SSH login
2.1 By default, every ssh hop between nodes prompts for a username and password. Typing them every time is tedious, so set up passwordless (key-based) login.
# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 7b:10:e3:b5:ea:7d:29:be:77:83:1c:c0:1d:85:de:ba.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
Last login: Sat Apr 2 22:32:44 2016
2.2 Configure passwordless login
# ssh localhost                # if ~/.ssh does not exist yet, run this once first to create it
# cd ~/.ssh/
# ssh-keygen -t rsa            # accept every prompt by pressing Enter
# cat id_rsa.pub >> authorized_keys
# chmod 600 ./authorized_keys  # set the required permissions
2.3 Log in again; no password is required
# ssh localhost
Last login: Sat Apr 2 22:51:41 2016 from localhost
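The prompts in step 2.2 can be skipped entirely with standard OpenSSH flags (shown as comments, since they modify ~/.ssh). The chmod matters because sshd, with its default StrictModes setting, ignores an authorized_keys file whose permissions are too open; the permission bits are demonstrated on a throwaway file:

```shell
# Non-interactive variant of step 2.2 (run these on the real host):
#   ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
#   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#   chmod 600 ~/.ssh/authorized_keys
# Why chmod 600: with StrictModes (the sshd default) a group- or
# world-writable authorized_keys is silently rejected. Demonstrated here
# on a temporary file instead of the real one:
f=$(mktemp)
chmod 600 "$f"
stat -c %a "$f"   # prints 600
rm -f "$f"
```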
3. Install Hadoop
3.1 Extract Hadoop under /opt
3.2 Configure the Hadoop environment variables
# vim /etc/profile
export JAVA_HOME=/opt/jdk1.7.0_79
export HADOOP_HOME=/opt/hadoop-2.6.0
export HADOOP_PREFIX=/opt/hadoop-2.6.0
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
3.3 Apply the changes
# source /etc/profile
3.4 Edit hadoop-env.sh
# cd /opt/hadoop-2.6.0   # enter the Hadoop directory; add the JAVA_HOME path to hadoop-env.sh
# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_79
# bin/hadoop             # run the hadoop command to verify the installation
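Step 3.4 can also be done non-interactively. A sketch, assuming the stock hadoop-env.sh still contains its default `export JAVA_HOME=${JAVA_HOME}` line; the demonstration runs against a temporary file rather than the real config:

```shell
# Rewrite the JAVA_HOME line the way step 3.4 does by hand. On the real
# host, point the sed at /opt/hadoop-2.6.0/etc/hadoop/hadoop-env.sh instead.
f=$(mktemp)
echo 'export JAVA_HOME=${JAVA_HOME}' > "$f"   # stand-in for the stock file
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.7.0_79|' "$f"
cat "$f"          # prints export JAVA_HOME=/opt/jdk1.7.0_79
rm -f "$f"
```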
3.5 Configure HDFS
3.5.1 Edit core-site.xml
# vim /opt/hadoop-2.6.0/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop-2.6.0/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.208.110:9000</value>
</property>
</configuration>
3.5.2 Edit hdfs-site.xml
# vim /opt/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop-2.6.0/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop-2.6.0/tmp/dfs/data</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
3.5.3 Format the NameNode
[root@hadoop hadoop-2.6.0]# hdfs namenode -format
... (many lines omitted)
16/04/02 22:54:15 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bogon/221.192.153.42
************************************************************/
3.5.4 Start HDFS
# start-dfs.sh
Visit http://localhost:50070 (the NameNode web UI).
3.5.5 A simple HDFS usage example
# hdfs dfs -mkdir /user
# hdfs dfs -mkdir /user/lei
# hdfs dfs -put etc/hadoop input   # if the following "no input" error appears:
put: `input': No such file or directory
# bin/hadoop fs -mkdir -p input    # create the directory by hand
# hdfs dfs -put etc/hadoop input   # then retry the upload
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
# hadoop dfs -ls                   # list files (hadoop dfs triggers the warning below; prefer hdfs dfs)
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
drwxr-xr-x - root supergroup 0 2016-04-02 23:39 input
drwxr-xr-x - root supergroup 0 2016-04-02 23:43 output
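The example job above runs a distributed grep with the pattern `dfs[a-z.]+` over the uploaded config files (view the result with `hdfs dfs -cat output/*`). To see what that regular expression matches, it can be tried locally:

```shell
# Same regular expression the hadoop-mapreduce-examples grep job uses,
# applied to a few property names from this article's config files.
# fs.defaultFS does not match: it has no lowercase "dfs" run.
printf 'dfs.replication\ndfs.namenode.name.dir\nfs.defaultFS\n' \
  | grep -oE 'dfs[a-z.]+'
# prints:
#   dfs.replication
#   dfs.namenode.name.dir
```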
3.6 Configure YARN
3.6.1 Edit mapred-site.xml
# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
# vim /opt/hadoop-2.6.0/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.208.110:10020</value>
</property>
</configuration>
3.6.2 Edit yarn-site.xml
# vim /opt/hadoop-2.6.0/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
3.6.3 Start YARN
# start-yarn.sh
Visit http://localhost:8088 (the ResourceManager web UI).
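Once both start scripts have run, `jps` (shipped with the JDK) should show one JVM per daemon. A sketch that checks a jps-style listing for the five daemons expected on a pseudo-distributed node; the sample output below is illustrative, not captured from a real host:

```shell
# Daemons expected after start-dfs.sh + start-yarn.sh on a single node.
expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"
# Illustrative jps output; on the real host use: sample=$(jps)
sample='1234 NameNode
2345 DataNode
3456 SecondaryNameNode
4567 ResourceManager
5678 NodeManager
6789 Jps'
for d in $expected; do
  # the leading space keeps "NameNode" from also matching "SecondaryNameNode"
  if echo "$sample" | grep -q " $d\$"; then
    echo "$d: running"
  else
    echo "$d: MISSING"
  fi
done
```

If any daemon is missing, its log under $HADOOP_HOME/logs is the place to look first.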