1. Install the JDK
1) Download jdk-8u65-linux-x64.tar.gz
2) Create the /soft directory
$>sudo mkdir /soft
$>sudo chown grj:grj /soft
3) Unpack the tarball
$>tar -xzvf jdk-8u65-linux-x64.tar.gz -C /soft
4) Create a symbolic link (the tarball extracts to jdk1.8.0_65)
$>ln -s /soft/jdk1.8.0_65 /soft/jdk
5) Verify that the JDK was installed successfully
$>cd /soft/jdk/bin
$>./java -version
6) Configure environment variables on CentOS
a) Edit /etc/profile
$>sudo nano /etc/profile
...
export JAVA_HOME=/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin
b) Make the environment variables take effect immediately
$>source /etc/profile
c) From any directory, test that it works
$>cd ~
$>java -version
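The two profile lines above amount to appending the JDK's bin directory to PATH; this small sketch (paths taken from this guide) shows the effect:

```shell
# What the /etc/profile edit does: append the JDK bin directory to PATH
JAVA_HOME=/soft/jdk
PATH=$PATH:$JAVA_HOME/bin
# The final PATH entry is now the JDK bin directory:
echo "$PATH" | tr ':' '\n' | tail -n 1   # prints /soft/jdk/bin
```

After `source /etc/profile`, every shell that reads the profile picks this up, which is why the `java -version` check above succeeds from the home directory.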
2. Install Hadoop (required on every host in the cluster)
1) Download hadoop-2.7.3.tar.gz
2) Unpack the tarball
$>tar -xzvf hadoop-2.7.3.tar.gz -C /soft
3) Create a symbolic link
$>ln -s /soft/hadoop-2.7.3 /soft/hadoop
4) Verify that Hadoop was installed successfully
$>cd /soft/hadoop/bin
$>./hadoop version
5) Configure the Hadoop environment variables
$>sudo nano /etc/profile
...
export JAVA_HOME=/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
6) Make the changes take effect
$>source /etc/profile
3. Cluster host configuration (hostnames)
1) On every host, edit this file to contain the hostname assigned to that host in the hosts file; the change takes effect after a reboot
/etc/hostname
s201
2) /etc/hosts (identical on every host)
127.0.0.1 localhost
192.168.24.201 s201
192.168.24.202 s202
192.168.24.203 s203
192.168.24.204 s204
3) On every host, edit this file to set the IP address assigned to that host in the hosts file
/etc/sysconfig/network-scripts/ifcfg-exxxxx
...
IPADDR=..
Restart the network service
$>sudo service network restart
4) Edit /etc/resolv.conf; set the same nameserver on all hosts
nameserver 192.168.24.2
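After editing the three files, name resolution can be spot-checked on each machine. A sketch that prints one check command per host (hostnames taken from the hosts file above):

```shell
# Print a resolution check for each cluster host; run the printed commands on every machine
for h in s201 s202 s203 s204; do
  echo "getent hosts $h"   # each should report the address listed in /etc/hosts
done
```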
4. Prepare SSH for the fully distributed cluster (lets the user on s201 log in to the other cluster hosts without a password)
1) Generate a key pair on host s201
$>ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
2) Copy s201's public key file id_rsa.pub to hosts s201 ~ s204 (including s201 itself, so local logins also work),
placing it at /home/grj/.ssh/authorized_keys on each host
(the original notes mixed the users "centos" and "grj"; the commands below use grj, the owner of /soft from step 1)
$>scp id_rsa.pub grj@s201:/home/grj/.ssh/authorized_keys
$>scp id_rsa.pub grj@s202:/home/grj/.ssh/authorized_keys
$>scp id_rsa.pub grj@s203:/home/grj/.ssh/authorized_keys
$>scp id_rsa.pub grj@s204:/home/grj/.ssh/authorized_keys
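Once the keys are in place, passwordless login can be verified from s201. `-o BatchMode=yes` makes ssh fail instead of prompting, so a host that still wants a password shows up as an error; this sketch prints one check per host:

```shell
# Print a passwordless-login check per host; run the printed commands on s201
for h in s201 s202 s203 s204; do
  echo "ssh -o BatchMode=yes $h hostname"   # should print the remote hostname without a password prompt
done
```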
5. Configure the fully distributed cluster
1) Create a configuration directory (first delete or rename the original /soft/hadoop/etc/hadoop directory so the symbolic link created next does not clash with that name)
$>cp -r /soft/hadoop/etc/hadoop /soft/hadoop/etc/full
2) Create a symbolic link (use an absolute path so the command works from any directory)
$>ln -s /soft/hadoop/etc/full /soft/hadoop/etc/hadoop
3) Edit the configuration files (under ${hadoop_home}/etc/full/)
[core-site.xml]
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://s201/</value>
</property>
</configuration>
[hdfs-site.xml]
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
[mapred-site.xml]
unchanged
[yarn-site.xml]
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>s201</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
4) Edit the slaves file (it lists the hostnames of all data nodes)
[/soft/hadoop/etc/full/slaves]
s202
s203
s204
5) Edit the Hadoop environment file [/soft/hadoop/etc/full/hadoop-env.sh]
...
export JAVA_HOME=/soft/jdk
...
6) Distribute the configuration
$>cd /soft/hadoop/
$>scp -r etc grj@s202:/soft/hadoop
$>scp -r etc grj@s203:/soft/hadoop
$>scp -r etc grj@s204:/soft/hadoop
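The three scp commands share one pattern, so a loop is less error-prone if hosts are added later; this sketch (user and paths as in this guide) prints the distribution commands:

```shell
# Print one config-distribution command per worker host; adjust the host list to your cluster
for h in s202 s203 s204; do
  echo "scp -r /soft/hadoop/etc grj@$h:/soft/hadoop"
done
```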
7) Format the filesystem (run this on the name node, s201)
$>hdfs namenode -format
(the older "hadoop namenode -format" form still works in 2.7.3 but is deprecated)
8) Start the Hadoop daemons
$>start-all.sh
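After start-all.sh returns, the daemon layout can be checked with jps on every host. Given the configuration above, s201 should run NameNode, SecondaryNameNode, and ResourceManager, while s202 ~ s204 each run DataNode and NodeManager; this sketch prints the checks:

```shell
# Print a jps check per host; compare the output against the expected daemon layout
for h in s201 s202 s203 s204; do
  echo "ssh $h jps"
done
```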