Chapter 1: Hadoop Cluster Setup
Role distribution across the cluster machines
Cluster role assignment (as configured in the sections that follow):
| Machine | NameNode | DFSZKFailoverController | Zookeeper | DataNode | JournalNode | ResourceManager | Hbase | Spark | Hive | Mysql |
|---------|----------|-------------------------|-----------|----------|-------------|-----------------|-------|-------|------|------|
| sp-01   |          |                         |           | √        |             |                 | HMaster | √   | √    |      |
| sp-02   |          |                         |           | √        |             |                 | HRegionServer | √ |    | √    |
| sp-03   |          |                         |           | √        | √           | √               | HRegionServer | √ |    |      |
| sp-04   |          |                         | √         | √        | √           | √               |       | √     |      |      |
| sp-05   | √        | √                       | √         | √        | √           |                 |       |       |      |      |
| sp-06   | √        | √                       | √         |          |             |                 |       |       |      |      |
| Machine | Original IP | New IP |
|---------|-------------|--------|
| sp-01 | 192.168.101.121 | 192.168.10.111 |
| sp-02 | 192.168.101.122 | 192.168.10.112 |
| sp-03 | 192.168.101.123 | 192.168.10.113 |
| sp-04 | 192.168.101.124 | 192.168.10.114 |
| sp-05 | 192.168.101.125 | 192.168.10.115 |
| sp-06 | 192.168.101.126 | 192.168.10.116 |
I. Passwordless SSH login
Main steps (on the master node, sp-01):
1. Generate a key pair: ssh-keygen -t rsa -P ""
2. Enter the key directory: cd .ssh (run ls -a there to list the files)
3. Append the public key id_rsa.pub to authorized_keys: cat id_rsa.pub >> authorized_keys
Slave-node configuration:
1. Generate a key pair in the same way (ssh-keygen -t rsa -P ""), then have sp-02, sp-03, sp-04, sp-05 and sp-06 each copy their id_rsa.pub to sp-01 so it can be appended to sp-01's authorized_keys:
scp id_rsa.pub sp-01:/home/hadoop/.ssh/id_rsa.pub.s1
(Note: the suffix .s1 can be chosen per node, e.g. .s2, .s3, ...; the steps below use .s1 as the example.)
2. On sp-01, append each copied key: cat id_rsa.pub.s1 >> authorized_keys
3. Finally, copy the authorized_keys file, which now contains the keys of all nodes, into the .ssh directory of sp-02, sp-03, sp-04, sp-05 and sp-06:
scp authorized_keys sp-02:/home/hadoop/.ssh/
Test: ssh <hostname>, e.g. ssh sp-02
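The key-collection step can also be written as a short script (a sketch, assuming the hadoop user and the .sN naming used above):
# Run on sp-01 after every other node has copied its public key over as id_rsa.pub.sN
cat ~/.ssh/id_rsa.pub      >> ~/.ssh/authorized_keys   # sp-01's own key
cat ~/.ssh/id_rsa.pub.s*   >> ~/.ssh/authorized_keys   # the collected keys from sp-02 ... sp-06
chmod 600 ~/.ssh/authorized_keys
for host in sp-02 sp-03 sp-04 sp-05 sp-06; do
    scp ~/.ssh/authorized_keys "$host":/home/hadoop/.ssh/   # push the combined file to every node
done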
II. Problems with passwordless login
After the steps above, the nodes may still be unable to log in to each other without a password.
Solution:
1. chmod 600 /home/hadoop/.ssh/authorized_keys
2. chmod 700 /home/hadoop/.ssh/
3. service sshd restart (note: this requires sufficient privileges, i.e. the root user)
III. Building the Hadoop cluster
1. Download Hadoop: http://mirror.bit.edu.cn/apache/hadoop/common/
Version used: hadoop-2.6.0.tar.gz (the binary distribution)
2. Upload hadoop-2.6.0.tar.gz to /home/hadoop/hadoopInstallFile on the cluster
and extract it into /home/hadoop/hadoopInstallPath/:
tar -zxvf /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz -C /home/hadoop/hadoopInstallPath/
3. Go to the Hadoop configuration directory:
cd hadoop-2.6.0/etc/hadoop/
4. Edit hadoop-env.sh: vim hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_73
5. Edit core-site.xml: vim core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://sp-06:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop</value>
</property>
6. Edit hdfs-site.xml: vim hdfs-site.xml
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>sp-05:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>sp-05:50091</value>
</property>
7. Create the masters file (it lists the SecondaryNameNode host, sp-05 in this setup): vim masters
8. Edit the slaves file, listing the DataNode hosts one per line: vim slaves
sp-01
sp-02
sp-03
sp-04
sp-05
9. Configure the Hadoop environment variables: vim .bash_profile
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
10. Reload the profile so the changes take effect immediately:
source .bash_profile
11. Run the hdfs command to check that the installation is on the PATH.
12. Copy hadoop-2.6.0.tar.gz to every machine in the cluster:
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-02:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-03:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-04:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-05:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz sp-06:/home/hadoop/hadoopInstallFile/
13. Extract the archive on sp-02, sp-03, sp-04, sp-05 and sp-06: tar -zxvf /home/hadoop/hadoopInstallFile/hadoop-2.6.0.tar.gz -C /home/hadoop/hadoopInstallPath/
14. Make the configuration files under /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/ on sp-02, sp-03, sp-04, sp-05 and sp-06 identical to those on sp-01:
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-02:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-03:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-04:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-05:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-06:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
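The same synchronization can be written as a loop (host names as above):
for host in sp-02 sp-03 sp-04 sp-05 sp-06; do
    scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* "$host":/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
done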
15. Apply the same Hadoop environment variables on every machine:
vim .bash_profile
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source .bash_profile
16. Format the NameNode:
hdfs namenode -format
17. Start the cluster from the primary node sp-06:
start-dfs.sh
18. Access the cluster web UIs:
NameNode (host + port): http://sp-06:50070/
SecondaryNameNode (host + port): http://sp-05:50090/
19. Stop HDFS:
stop-dfs.sh
IV. Configuring high availability (NameNode HA and ResourceManager HA on YARN)
4.1 Edit core-site.xml: vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://tztd</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>sp-06:2181,sp-05:2181,sp-04:2181</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopTmp</value>
</property>
</configuration>
4.2 Edit hdfs-site.xml: vim hdfs-site.xml
<configuration>
<!-- Set the HDFS nameservice to tztd; this must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>tztd</value>
</property>
<!-- The tztd nameservice has two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.tztd</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.tztd.nn1</name>
<value>sp-06:8020</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.tztd.nn2</name>
<value>sp-05:8020</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.tztd.nn1</name>
<value>sp-06:50070</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.tztd.nn2</name>
<value>sp-05:50070</value>
</property>
<!-- Where the NameNode metadata (edit log) is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://sp-05:8485;sp-03:8485;sp-04:8485/tztd</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.tztd</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- The sshfence fencing method requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/hadoopTmp/data</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
4.3 Rename mapred-site.xml.template to mapred-site.xml and edit it (vim mapred-site.xml):
mv mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory Server address; default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address; default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
4.4 Edit yarn-site.xml: vim yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable ResourceManager HA (default: false) -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<!-- Logical IDs of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>sp-03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>sp-04</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>sp-06:2181,sp-05:2181,sp-04:2181</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Where the NodeManager writes its logs -->
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/hadoop/hadoopTmp/logs</value>
</property>
</configuration>
Note: without these two log settings, YARN writes its logs under /tmp. In this installation the cluster does not run as the root user and cannot write into directories directly under the filesystem root, so an explicit location has to be specified.
To view a job's logs, run yarn logs -applicationId <applicationId>, for example:
yarn logs -applicationId application_1477992062510_0015
4.5 Synchronize the updated configuration files to /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/ on every machine in the cluster:
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-05:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-04:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-03:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-02:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
scp /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/* sp-01:/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/
V. Installing ZooKeeper
Extract the archive: tar zxvf /home/hadoop/hadoopInstallFile/zookeeper-3.4.6.tar.gz -C /home/hadoop/hadoopInstallPath/
5.1 Go to the configuration directory: cd /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/
Copy the sample configuration: cp zoo_sample.cfg zoo.cfg
Edit it: vim zoo.cfg
dataDir=/home/hadoop/hadoopTmp/zookeeper
server.1=sp-06:2888:3888
server.2=sp-05:2888:3888
server.3=sp-04:2888:3888
5.2 On sp-06, sp-05 and sp-04 respectively, create a myid file under dataDir containing 1, 2 and 3 (matching server.1/2/3 above), for example as shown below.
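A minimal sketch of that step (dataDir as configured above; run the matching line on each host):
mkdir -p /home/hadoop/hadoopTmp/zookeeper && echo 1 > /home/hadoop/hadoopTmp/zookeeper/myid    # on sp-06
mkdir -p /home/hadoop/hadoopTmp/zookeeper && echo 2 > /home/hadoop/hadoopTmp/zookeeper/myid    # on sp-05
mkdir -p /home/hadoop/hadoopTmp/zookeeper && echo 3 > /home/hadoop/hadoopTmp/zookeeper/myid    # on sp-04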
5.3 Send zookeeper-3.4.6.tar.gz from /home/hadoop/hadoopInstallFile/ on sp-06 to the same directory on sp-05 and sp-04, then extract it on each:
scp /home/hadoop/hadoopInstallFile/zookeeper-3.4.6.tar.gz sp-05:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/zookeeper-3.4.6.tar.gz sp-04:/home/hadoop/hadoopInstallFile/
tar zxvf /home/hadoop/hadoopInstallFile/zookeeper-3.4.6.tar.gz -C /home/hadoop/hadoopInstallPath/
5.4 Send the configuration files from /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/ on sp-06 to the same directory on sp-05 and sp-04:
scp /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/* sp-05:/home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/
scp /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/* sp-04:/home/hadoop/hadoopInstallPath/zookeeper-3.4.6/conf/
5.5 Before starting ZooKeeper, clean up the data left over from the earlier non-HA run:
5.5.1 On sp-06 and sp-05, delete the data, name and namesecondary folders under /home/hadoop/hadoopTmp/dfs.
5.5.2 On sp-06 and sp-05, delete everything under /home/hadoop/hadoopInstallPath/hadoop-2.6.0/logs/.
5.5.3 On sp-01, sp-02, sp-03 and sp-04, delete everything under /home/hadoop/hadoopTmp/dfs/data/.
5.5.4 On sp-01, sp-02, sp-03 and sp-04, delete everything under /home/hadoop/hadoopInstallPath/hadoop-2.6.0/logs/.
5.6 Start ZooKeeper on the three nodes sp-06, sp-05 and sp-04:
cd /home/hadoop/hadoopInstallPath/zookeeper-3.4.6/bin
./zkServer.sh start
Note: the stop command is ./zkServer.sh stop.
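To verify the ensemble, zkServer.sh also provides a status subcommand; run it on each of the three nodes and one should report leader while the other two report follower:
./zkServer.sh status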
VI. Starting Hadoop
6.1 Start the JournalNodes on sp-03, sp-04 and sp-05:
cd /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin/
./hadoop-daemon.sh start journalnode
6.2 Format HDFS on one of the two NameNodes (sp-06 or sp-05): hdfs namenode -format
6.3 Copy the freshly formatted metadata to the other NameNode:
6.3.1 Start the NameNode that was just formatted: ./hadoop-daemon.sh start namenode
6.3.2 On the NameNode that was not formatted, run: hdfs namenode -bootstrapStandby
6.3.3 Copy /home/hadoop/hadoopTmp/dfs/name/current from the formatted NameNode to /home/hadoop/hadoopTmp/dfs/ on the other machine:
scp -r /home/hadoop/hadoopTmp/dfs/name/current/* sp-05:/home/hadoop/hadoopTmp/dfs/
6.3.4 Start the second NameNode: ./hadoop-daemon.sh start namenode
6.4 Initialize the ZKFC state in ZooKeeper on one of the NameNodes: hdfs zkfc -formatZK
6.5 Stop the daemons started above: stop-dfs.sh
6.6 Start everything: start-dfs.sh
6.7 If some DataNodes do not come up with the rest of the cluster:
Cause: the NameNode was formatted two or more times.
Check whether the clusterID in /home/hadoop/hadoopTmp/dfs/data/current/VERSION on the DataNode
matches the one on the NameNode, whose file is
/home/hadoop/hadoopTmp/dfs/name/current/VERSION.
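A quick way to compare the two values (paths as above; run the first command on the NameNode and the second on the affected DataNode, then make the DataNode's value match):
grep clusterID /home/hadoop/hadoopTmp/dfs/name/current/VERSION   # on the NameNode
grep clusterID /home/hadoop/hadoopTmp/dfs/data/current/VERSION   # on the DataNode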
6.8 Start the ResourceManager on sp-04 and sp-03:
cd /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin
./yarn-daemon.sh start resourcemanager
Web UIs: http://sp-04:8088/ and http://sp-03:8088/
6.9 Start a NodeManager on every node:
cd /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin
./yarn-daemon.sh start nodemanager
Setup complete.
VII. Testing the Hadoop cluster with MapReduce
The code:
import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount
{
public static class WordCountMapper
extends Mapper<Object,Text,Text,IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key,Text value,Context context)
throws IOException, InterruptedException {
String[] words = value.toString().split(" "); // split each input line on spaces into words
for (String str: words)
{
word.set(str);
context.write(word,one);
}
}
}
public static class WordCountReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
public void reduce(Text key,Iterable<IntWritable> values,Context context)
throws IOException, InterruptedException {
int total=0;
for (IntWritable val : values){
total++;
}
context.write(key, new IntWritable(total));
}
}
public static void main (String[] args) throws Exception{
Configuration conf = new Configuration();
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Run the job:
hadoop jar WC.jar /a.txt /output
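The input file has to be on HDFS before the job runs, and the counts can be read back afterwards; a minimal sketch (paths as used in the command above, reducer output under the standard part-r-00000 name):
hdfs dfs -put a.txt /                  # upload the input file to the HDFS root
hdfs dfs -cat /output/part-r-00000     # read the word counts after the job completes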
Execution:
Result:
Test complete.
VIII. Testing Hadoop from Eclipse
8.1 Download the hadoop2x-eclipse-plugin
Download: https://github.com/winghc/hadoop2x-eclipse-plugin
8.2 Pick a directory and extract hadoop-2.6.0.tar.gz there; D:\hadoop\hadoop-2.6.0 is used here (referred to as $HADOOP_HOME below).
8.3 Add the following environment variables:
HADOOP_HOME=D:\hadoop\hadoop-2.6.0
HADOOP_PREFIX=D:\hadoop\hadoop-2.6.0
HADOOP_BIN_PATH=%HADOOP_HOME%\bin
and add %HADOOP_HOME%\bin to PATH.
8.4 Download the 64-bit Windows helper package for Hadoop 2.6 (hadoop.dll, winutils.exe):
http://files.cnblogs.com/files/yjmyzz/hadoop2.6%28x64%29V0.2.zip
Copy winutils.exe into $HADOOP_HOME\bin and hadoop.dll into %windir%\system32 (this prevents the plugin from throwing obscure errors such as null-pointer exceptions).
8.5 Configure the hadoop-eclipse-plugin
8.5.1 Put the downloaded hadoop-eclipse-plugin-2.6.0 jar into Eclipse's plugins folder (other plugin versions can be used as well).
Start Eclipse and open Window -> Show View -> Other.
8.5.2 Under Window -> Preferences -> Hadoop Map/Reduce, point the plugin at the local Hadoop root directory (i.e. $HADOOP_HOME).
8.5.3 In the Map/Reduce Locations panel, click the elephant icon and add a Location.
The HDFS file tree should then be visible.
Right-click a file and try deleting it: the first attempt usually fails with a long message that boils down to insufficient permissions, because the Windows login user is not the user Hadoop runs as on the cluster. There are several fixes, such as creating a hadoop administrator account on Windows and developing while logged in as that user, but that is cumbersome; the simplest workaround is to add the following to hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Then, on the cluster, run hadoop dfsadmin -safemode leave.
(This disables permission checking and is only acceptable in a test environment.)
8.6 Create the WordCount sample project
8.6.1 WordCount sample code
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WCJob {
public static void main(String[] args) {
// Set the user that submits the job
System.setProperty("HADOOP_USER_NAME", "root");
// Load the configuration files found on the classpath
Configuration conf = new Configuration();
// 1. Local run mode: submit to the remote cluster from this machine
conf.set("fs.defaultFS", "hdfs://sp-06:8020");
conf.set("yarn.resourcemanager.hostname", "sp-04");
try {
// Do not create the Job with new; that constructor is deprecated. Use Job.getInstance instead.
Job job = Job.getInstance(conf);
// The jar containing this class is the program entry point
job.setJarByClass(WCJob.class);
// Job name; the job appears under this name in the ResourceManager UI
job.setJobName("wc job");
// Mapper class
job.setMapperClass(WCMapper.class);
// Reducer class
job.setReducerClass(WCReducer.class);
// Key/value types emitted by the mapper
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// Input path: if it were a directory, every file under it would be read; here it points to a specific HDFS file, so only that file is read
FileInputFormat.addInputPath(job, new Path("/user/wc/input/a.txt"));
Path output = new Path("/user/wc/output");
FileSystem fs = FileSystem.get(conf);
if (fs.exists(output)) {
fs.delete(output, true);
}
// Output path
FileOutputFormat.setOutputPath(job, output);
// Submit the job and wait for completion
Boolean flag = job.waitForCompletion(true);
if (flag) {
System.out.println("job success~~");
}
} catch (Exception e) {
e.printStackTrace();
};
}
}
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
// map() is invoked automatically by the framework, once for every line of input
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String[] strs = StringUtils.split(line, ' ');
for(String s : strs) {
context.write(new Text(s), new IntWritable(1));
}
}
}
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
protected void reduce(Text text, Iterable<IntWritable> iterable,
Context context) throws IOException, InterruptedException {
int sum = 0;
for(IntWritable i : iterable) {
sum += i.get();
}
context.write(text, new IntWritable(sum));
}
}
8.6.2 Upload the input file, e.g. a.txt with the following content:
hello hadoop
hello word
hello nihao
hello scala
hello spark
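One way to put the file onto HDFS (the path matches the one hard-coded in WCJob above):
hdfs dfs -mkdir -p /user/wc/input
hdfs dfs -put a.txt /user/wc/input/a.txt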
Required setting:
Run the main program with Run on Hadoop.
Note: the output path must not already exist (the code above deletes it if present).
Result:
8.7 Testing Hadoop from IntelliJ IDEA 15.0.5
The code above runs unchanged from IntelliJ IDEA, but since there is no plugin comparable to hadoop-eclipse-plugin, the output directory has to be deleted on the cluster after each test run.
This completes the Hadoop setup, testing and client connectivity.
Chapter 2: HBase Installation and Testing
I. Downloading HBase
Download: http://apache.fayea.com/hbase/hbase-1.0.3/
hbase-1.0.3-bin.tar.gz
II. Upload the package to sp-01 as /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz
and extract it into /home/hadoop/hadoopInstallPath/:
tar -zxvf /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz -C /home/hadoop/hadoopInstallPath/
III. Configuring HBase
3.1 Go to the configuration directory:
cd /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/
Edit hbase-env.sh: vim hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_73
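Because this cluster reuses the ZooKeeper ensemble installed earlier (see hbase.zookeeper.quorum below) rather than the ZooKeeper bundled with HBase, hbase-env.sh would normally also need the line below; it is not in the original notes and is added here as a suggestion:
export HBASE_MANAGES_ZK=false    # do not let start-hbase.sh start its own ZooKeeper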
Edit hbase-site.xml: vim hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://tztd/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>sp-06,sp-05,sp-04</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hadoopTmp/zookeeper</value>
</property>
</configuration>
Edit the regionservers file (one HRegionServer host per line, sp-02 and sp-03 here): vim regionservers
Create the backup-masters file to list any standby HMaster hosts: vim backup-masters
Copy hdfs-site.xml from the Hadoop configuration into the HBase conf directory:
cp -a /home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop/hdfs-site.xml /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/
3.2 Copy the archive to sp-02 and sp-03:
scp /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz sp-02:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz sp-03:/home/hadoop/hadoopInstallFile/
Extract it on each machine:
tar zxvf /home/hadoop/hadoopInstallFile/hbase-1.0.3-bin.tar.gz -C /home/hadoop/hadoopInstallPath/
3.3 Send the configured files under /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/ on sp-01 to the same directory on sp-02 and sp-03:
scp /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/* sp-03:/home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/
scp /home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/* sp-02:/home/hadoop/hadoopInstallPath/hbase-1.0.3/conf/
3.4 Set the same environment variables on sp-01, sp-02 and sp-03:
vim ~/.bash_profile
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export HBASE_HOME=/home/hadoop/hadoopInstallPath/hbase-1.0.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
IV. Starting HBase
cd /home/hadoop/hadoopInstallPath/hbase-1.0.3
./bin/start-hbase.sh
./bin/hbase shell
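A quick smoke test can be run by piping standard HBase shell commands into the shell (the table name here is just an example):
./bin/hbase shell <<'EOF'
status
create 'test_t', 'cf'
put 'test_t', 'r1', 'cf:a', 'value1'
scan 'test_t'
list
EOF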
Chapter 3: Spark Cluster Installation and Configuration
I. Downloading Spark
Download page: http://spark.apache.org/downloads.html
File: spark-1.6.0-bin-hadoop2.6.tgz
II. Installing Spark
2.1 Extract the archive: tar zxvf /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz -C /home/hadoop/hadoopInstallPath/
2.2 Go to the configuration directory: cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/
2.3 Copy the template and rename it:
cp spark-env.sh.template spark-env.sh
Edit the file: vim spark-env.sh
The main settings are:
the Java environment variable (JAVA_HOME)
the Spark master host address (Spark's standalone cluster needs one master for resource scheduling)
the Spark master port (7077 by default)
the number of CPU cores per worker
the number of worker instances (a worker is a process; this sets how many run on each machine)
the amount of memory each worker may use (how much memory a machine contributes)
The remaining five settings hook Spark up to YARN, so that afterwards jobs can be scheduled without starting Spark's own cluster at all (in YARN mode there is no need to run start-all.sh); a sketch of such a file follows the note below.
Note: the settings themselves are fine, but the values must be adapted to the memory actually available on each machine; otherwise jobs finish without errors yet never produce a result, especially in yarn-client mode.
Check the available memory with: cat /proc/meminfo | grep MemTotal
Be careful not to set Spark's memory too high.
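A minimal spark-env.sh sketch covering the settings listed above (the variable names are standard for Spark 1.6; the values are illustrative assumptions, not the ones used in the original installation):
export JAVA_HOME=/usr/java/jdk1.8.0_73
export SPARK_MASTER_IP=sp-04                 # standalone master host (sp-04 is the master in this cluster)
export SPARK_MASTER_PORT=7077                # default master port
export SPARK_WORKER_CORES=2                  # CPU cores per worker (illustrative)
export SPARK_WORKER_INSTANCES=1              # worker processes per machine (illustrative)
export SPARK_WORKER_MEMORY=1g                # memory per worker; size it to the machine, see the note above
export HADOOP_CONF_DIR=/home/hadoop/hadoopInstallPath/hadoop-2.6.0/etc/hadoop   # lets Spark submit jobs to YARN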
2.4 Copy the worker list template and edit it:
cp slaves.template slaves
vim slaves
2.5 Copy the archive to the other Spark nodes:
scp /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz sp-01:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz sp-02:/home/hadoop/hadoopInstallFile/
scp /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz sp-04:/home/hadoop/hadoopInstallFile/
Extract the archive on each machine:
tar zxvf /home/hadoop/hadoopInstallFile/spark-1.6.0-bin-hadoop2.6.tgz -C /home/hadoop/hadoopInstallPath/
2.6 Make the conf directory (/home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf) identical on every Spark node:
scp -r /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/* sp-01:/home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/
scp -r /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/* sp-02:/home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/
scp -r /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/* sp-04:/home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf/
III. Starting the Spark cluster
This must be run on the master node:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/sbin
./start-all.sh
Notes: (1) The start command has to be issued on the master node, otherwise the master service may fail to come up (tested on this cluster: only the worker services started). (2) The script has the same name as Hadoop's start-all.sh, so do not put Spark's sbin on the PATH; to avoid the command clash, just cd into the sbin directory and run it from there.
Client access:
http://sp-04:8080
The Spark cluster is now set up and running.
To stop it:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/sbin
./stop-all.sh
IV. Testing Spark
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/bin/
./spark-shell
YARN test, cluster mode (run from the Spark home directory):
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-memory 1G --num-executors 1 ./lib/spark-examples-1.6.0-hadoop2.6.0.jar 10
Nothing is printed to the console in cluster mode; check the result in the ResourceManager UI at http://sp-04:8088/cluster.
YARN test, client mode:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 1G --num-executors 1 ./lib/spark-examples-1.6.0-hadoop2.6.0.jar 10
Local testing complete.
Spark high-availability configuration
HA mainly matters for standalone mode, while in practice production jobs here run on YARN, so this configuration is optional; it is set up anyway in case it is needed later (it can be left unused).
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/conf
vim spark-env.sh
Change/add:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=sp-04:2181,sp-05:2181,sp-06:2181"
Synchronize the file to every Spark machine.
sp-04 is already the master node; to add another master, for example sp-03, only that one machine needs the extra change.
Start the services:
(1) Start the ZooKeeper ensemble:
./zkServer.sh start
(2) Start the Spark cluster (this command does not start the standby master; start it separately as in step (3)):
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/sbin/
./start-all.sh
(3) Start the standby master on sp-03:
cd /home/hadoop/hadoopInstallPath/spark-1.6.0-bin-hadoop2.6/sbin/
./start-master.sh
Check the web UI.
Test Spark HA (stop the active master and confirm the standby takes over).
HA setup complete.
To stop the cluster (shut down standalone mode):
[hadoop@sp-03 sbin]$ ./stop-all.sh
Chapter 4: Hive Installation and Configuration
(Hive version used: apache-hive-1.1.1-bin.tar.gz)
Hive is installed on the first node (IP: 192.168.101.121) and MySQL on the second node (IP: 192.168.101.199).
I. Downloading Hive
Put the downloaded package into /home/hadoop/hadoopInstallFile.
II. Installing Hive
2.1 Extract the package into /home/hadoop/hadoopInstallPath:
[hadoop@sp-01 hadoopInstallFile]$ tar -zxvf apache-hive-1.1.1-bin.tar.gz -C /home/hadoop/hadoopInstallPath
2.2 Go to the conf directory under the installation directory:
[hadoop@sp-01 hadoopInstallFile]$ cd apache-hive-1.1.1-bin/conf
// Create hive-site.xml from the template
[hadoop@sp-01 conf]$ cp hive-default.xml.template hive-site.xml
// Create hive-conf.sh (this file does not exist by default; simply create it)
[hadoop@sp-01 conf]$ vim hive-conf.sh
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export HIVE_CONF_DIR=/home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin/conf
// Configure hive-env.sh
[hadoop@sp-01 conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@sp-01 conf]$ vim hive-env.sh
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
export HADOOP_USER_CLASSPATH_FIRST=true
export HIVE_CONF_DIR=/home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin/conf
2.3 Go to the bin directory under the installation directory and edit hive-config.sh:
[hadoop@sp-01 bin]$ vim hive-config.sh
export JAVA_HOME=/usr/java/jdk1.8.0_73
export HIVE_HOME=/home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin
export HADOOP_HOME=/home/hadoop/hadoopInstallPath/hadoop-2.6.0
*******************************************************************************
Starting Hive failed with an error.
After replacing every ${system:java.io.tmpdir} in hive-site.xml with /home/hadoop/tmp/, it failed again:
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to create log directory /home/hive/tmp/${system:user.name}
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
*******************************************************************************
Replace every value in hive-site.xml that contains ${system:java.io.tmpdir} with the local tmp path. (Checking hive-site.xml shows two different value patterns containing ${system:java.io.tmpdir}, so two substitutions are needed to replace them all.)
[hadoop@sp-01 conf]$ vim hive-site.xml
Substitution commands:
:%s/${system:java.io.tmpdir}\/${hive.session.id}_resources/\/home\/hadoop\/hadoopInstallPath\/tmp/g    # the result of this substitution is visible around line 57 of hive-site.xml
:%s/${system:java.io.tmpdir}\/${system:user.name}/\/home\/hadoop\/hadoopInstallPath\/tmp/g    # the result is visible around line 2721
After both substitutions, restart Hive:
[hadoop@sp-01 bin]$ ./hive
Logging initialized using configuration in jar:file:/home/hadoop/apache-hive-1.1.1-bin/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-1.1.1-bin/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
Hive started successfully.
Note: Hive keeps its data on HDFS under /user/hive/warehouse; this directory must not be deleted.
2.4 Add the directory containing the hive command to the system PATH.
Edit the profile file:
[hadoop@sp-01 ~]$ vim .bash_profile
export HIVE_HOME=/home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin
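Reload the profile so the new PATH takes effect, as in the Hadoop setup earlier:
source .bash_profile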
After that, Hive can be started by simply typing "hive".
III. Error when multiple users start Hive
Cause: Hive was using the embedded Derby database as its metastore, and Derby does not allow concurrent access by multiple users. The fix is to switch the metastore to MySQL.
Fix:
Install MySQL on the second node using yum; this has to be done as the root user:
[root@sp-02 hadoopInstallPath]# yum -y install mysql*
After that command, starting the MySQL service failed:
[root@sp-02 hadoopInstallPath]# service mysqld start    (the error is shown below)
Cause: the MySQL system databases had not been initialized.
Check the startup log: cat /var/log/mysqld.log
Fix:
[root@sp-02 hadoopInstallPath]# mysql_install_db    (after this command, start MySQL again)
IV. Persisting the Hive metastore in MySQL
Log in to MySQL:
[root@sp-02 ~]# mysql
mysql> create user 'hive' identified by 'mysql';
mysql> grant all on *.* to hive@'sp-02' identified by 'unioncast.cn';
mysql> flush privileges;
mysql> quit
Bye
Edit hive-site.xml:
[hadoop@sp-01 ~]$ cd /home/hadoop/apache-hive-1.1.1-bin/conf
[hadoop@sp-01 conf]$ vim hive-site.xml
Change the following entries in the file:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://sp-02/hive?characterEncoding=UTF-8</value> <!-- host name (or IP) of the node where MySQL is installed; /hive is the hive database created in MySQL below -->
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value> <!-- connect as the hive user -->
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mysql</value> <!-- the hive user's password is mysql -->
<description>password to use against metastore database</description>
</property>
In MySQL, create the hive database:
mysql> create database hive;
mysql> quit;
Bye
Log in to Hive, create a table, and check whether it is reflected in the MySQL metastore:
hive> create table aa(id int);
mysql> show databases;
mysql> use hive;
mysql> show tables;
mysql> select * from TBLS;
Empty set (0.00 sec)
mysql> select * from TBLS;
Connection successful.
Note: during this setup, mysql-connector-java-5.1.5-bin.jar was copied into /home/hadoop/hadoopInstallPath/apache-hive-1.1.1-bin/lib (the jar can be downloaded from the internet).
V. Testing
Chapter 5: Common Errors and Fixes
I. NodeManager and ResourceManager cannot be stopped
When stopping or restarting the cluster, the NodeManager and ResourceManager fail to shut down.
The reason:
the pid files written by yarn-daemon.sh live under /tmp; the current user may not be allowed to create them there, and even when they are created they are likely to be removed by the periodic /tmp cleanup, so the pid directory has to be moved elsewhere.
The files to modify are in /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin:
edit yarn-daemon.sh and hadoop-daemon.sh (stop the daemons before editing; since the scripts can no longer stop them, use kill <pid> or kill -9 <pid>, finding the pids with jps).
vim yarn-daemon.sh
YARN_PID_DIR=/home/hadoop/hadoopTmp/Tmp
Create the Tmp folder under /home/hadoop/hadoopTmp/:
mkdir Tmp
vim hadoop-daemon.sh
HADOOP_PID_DIR=/home/hadoop/hadoopTmp/Tmp    (hadoop-daemon.sh reads HADOOP_PID_DIR; YARN_PID_DIR belongs to yarn-daemon.sh)
Synchronize the change to every machine in the cluster.
Restart the Hadoop cluster:
start-all.sh
Start YARN:
cd /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin
./start-yarn.sh
The pid files are now created under the directory specified above.
II. Both NameNodes stay in standby
If both NameNodes are standby and no node becomes active, check whether the DFSZKFailoverController is running. If it is not, start it manually from /home/hadoop/hadoopInstallPath/hadoop-2.6.0/sbin:
./hadoop-daemon.sh start zkfc
For the DFSZKFailoverController to come up automatically with an active NameNode, re-initialize the ZooKeeper failover state on one of the NameNodes: hdfs zkfc -formatZK
Fixing util.NativeCodeLoader:
util.NativeCodeLoader: Unable to load native-hadoop library for your platform
Reference article:
http://www.secdoctor.com/html/yyjs/31101.html
Replacement native-library package for 2.6.0: http://akamai.bintray.com/73/731a49d122fd009679c50222a9a5a4926d1a26b6?__gda__=exp=1477999546~hmac=c3edc3b5d46ee1b544147165797e33a37df0c6034060f646c374e29ec78cda8d&response-content-disposition=attachment%3Bfilename%3D%22hadoop-native-64-2.6.0.tar%22&response-content-type=application%2Foctet-stream&requestInfo=U2FsdGVkX18Z4F9qc-WicETGP2g0HHM8YwPr_ZaSw7nIT1_inzlCCD3rV4WS71l5CcwKOa9r6oe1-mVB08RN0TeQxPBlIKMP7jsZd0DbLsj3S4dNFIsREUEmR6lcDKaNai_TEy8ToFbAR3GSenbD1A
The root cause is a version/platform incompatibility of the bundled native library; it needs to be replaced.
This article surely has shortcomings; corrections are welcome. Feel free to repost, but please credit the source. Typing all this up was no small effort, thanks for your understanding! http://www.cnblogs.com/baierfa/p/6689022.html