Sqoop Installation and Configuration
1. Download and Installation
(1) Download from http://archive.apache.org/dist/sqoop/1.4.7/
wget http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
(2) Upload sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz to the /tools directory
(3) Extract and rename
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/pkg/
cd /opt/pkg/
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
(4) Configure environment variables
#sqoop
export SQOOP_HOME=/opt/pkg/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin
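To make these take effect, one option (an assumption; the guide does not say which profile file it uses) is to append them to ~/.bashrc and reload it:
$ echo 'export SQOOP_HOME=/opt/pkg/sqoop-1.4.7' >> ~/.bashrc
$ echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> ~/.bashrc
$ source ~/.bashrc
$ echo $SQOOP_HOME    # should print /opt/pkg/sqoop-1.4.7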
2. Configuration
(1) Configure sqoop-env.sh
Set every environment variable that Sqoop may use.
Edit sqoop-env.sh (the template lives in the conf directory of the Sqoop installation):
$ cd /opt/pkg/sqoop-1.4.7/conf
$ mv sqoop-env-template.sh sqoop-env.sh
$ vi sqoop-env.sh
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/pkg/hadoop-2.7.3
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/pkg/hadoop-2.7.3
#set the path to where bin/hbase is available
export HBASE_HOME=/opt/pkg/hbase-1.2.4
#Set the path to where bin/hive is available
export HIVE_HOME=/opt/pkg/hive
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/opt/pkg/zookeeper-3.4.14/conf
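As a quick sanity check (paths taken from this guide; adjust to your own layout), you can verify that each directory referenced above actually exists:
$ for d in /opt/pkg/hadoop-2.7.3 /opt/pkg/hbase-1.2.4 /opt/pkg/hive /opt/pkg/zookeeper-3.4.14/conf; do [ -d "$d" ] || echo "missing: $d"; done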
(2) Edit sqoop-site.xml
The configuration is as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>true</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:file:/tmp/sqoop-meta/meta.db;shutdown=true</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>SA</value>
    <description>The username to bind to the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value></value>
    <description>The password to bind to the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.record.password</name>
    <value>true</value>
    <description>If true, allow saved passwords in the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
      If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.server.port</name>
    <value>16000</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>
</configuration>
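These metastore settings only come into play for saved jobs. As a rough illustration (hypothetical host, database, table, and job name), the shared metastore configured above can be started and then referenced when creating and listing jobs:
$ sqoop metastore &    # listens on port 16000, as configured above
$ sqoop job --create demo_job \
    --meta-connect jdbc:hsqldb:hsql://hadoop100:16000/sqoop \
    -- import --connect jdbc:mysql://hadoop100:3306/testdb --table t1 -m 1
$ sqoop job --list --meta-connect jdbc:hsqldb:hsql://hadoop100:16000/sqoop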
(3) Edit configure-sqoop
Comment out the blocks that check for HCAT_HOME and ACCUMULO_HOME:
[hadoop@hadoop100 sqoop-1.4.7]$ vi bin/configure-sqoop
# around lines 82-100 of bin/configure-sqoop
#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#      HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi
#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi

# around lines 134-143 of bin/configure-sqoop
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
The purpose of this is to suppress the following startup warnings; strictly speaking, the blocks can also be left as they are:
HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/module/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
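If you prefer not to edit the file by hand, the same change can be made with sed. This is only a sketch: the line ranges below assume your copy of configure-sqoop matches the excerpt above, so double-check them first.
$ sed -i '82,100 s/^/#/' bin/configure-sqoop
$ sed -i '134,143 s/^/#/' bin/configure-sqoop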
(4) Add the required JAR files
Upload the MySQL JDBC driver (use a 5.x version, not a newer one) into the lib directory under the Sqoop installation directory:
$ cp mysql-connector-java-5.1.44-bin.jar /opt/pkg/sqoop-1.4.7/lib/
Copy or symlink $HIVE_HOME/lib/hive-common-3.1.2.jar into $SQOOP_HOME/lib:
$ ln -s /opt/pkg/hive/lib/hive-common-3.1.2.jar /opt/pkg/sqoop-1.4.7/lib
If JSON parsing is needed, download java-json.jar and place it in the lib directory under the Sqoop installation directory:
$ cp java-json.jar /opt/pkg/sqoop-1.4.7/lib/
3. Verification
sqoop version
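If the version prints correctly, a quick end-to-end check against MySQL can be done with list-databases (hypothetical host and credentials; substitute your own):
$ sqoop list-databases \
    --connect jdbc:mysql://hadoop100:3306/ \
    --username root --password 123456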
4. Troubleshooting
(1) Error when importing into a Hive table
Could not load org.apache.hadoop.hive.conf.HiveConf
Solution:
Copy lib/hive-common-x.x.x.jar from the Hive installation directory into the lib directory of the Sqoop installation.
Note: x.x.x above is a placeholder for the version number, not the literal file name; for example, with Hive 2.3.3 the JAR is hive-common-2.3.3.jar.
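A generic way to do this, assuming HIVE_HOME and SQOOP_HOME are set as in section 2 (the glob is only a convenience; make sure it matches exactly one JAR):
$ cp $HIVE_HOME/lib/hive-common-*.jar $SQOOP_HOME/lib/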
(2) Error during a query import
INFO mapreduce.ImportJobBase: Beginning query import.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
The error indicates that the java-json JAR is missing.
Download the JAR from:
http://www.java2s.com/Code/Jar/j/Downloadjavajsonjar.htm
Copy the JAR into $SQOOP_HOME/lib and rerun the Sqoop job.
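Putting it together, a minimal Hive import that exercises both fixes above might look like this (hypothetical host, credentials, database, and table):
$ sqoop import \
    --connect jdbc:mysql://hadoop100:3306/testdb \
    --username root --password 123456 \
    --table orders \
    --hive-import --hive-table default.orders \
    -m 1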