Sqoop Environment Setup

Sqoop Installation and Configuration

1. Download and Install

(1) Download from http://archive.apache.org/dist/sqoop/1.4.7/

wget http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

(2) Upload sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz to the /tools directory

(3) Extract the archive and rename the directory

tar -zxvf /tools/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/pkg/
cd /opt/pkg/
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7

(4) Configure the environment variables

#sqoop
export SQOOP_HOME=/opt/pkg/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin
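
To make the variables take effect, reload the profile; a minimal sketch, assuming the lines above were appended to /etc/profile:

# Reload the profile so the new variables are visible in the current shell
source /etc/profile

# Confirm that SQOOP_HOME resolves and the sqoop launcher is on PATH
echo $SQOOP_HOME
which sqoop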

2. Configuration

(1) Configure Sqoop's environment (sqoop-env.sh)

Set all the environment variables that Sqoop may use.

Rename the template under $SQOOP_HOME/conf and edit sqoop-env.sh:

$ cd /opt/pkg/sqoop-1.4.7/conf
$ mv sqoop-env-template.sh sqoop-env.sh
$ vi sqoop-env.sh


# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/pkg/hadoop-2.7.3

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/pkg/hadoop-2.7.3

#set the path to where bin/hbase is available
export HBASE_HOME=/opt/pkg/hbase-1.2.4

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/pkg/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/opt/pkg/zookeeper-3.4.14/conf
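
Before moving on, it can help to confirm that every path configured above actually exists on this node; a small sketch using the paths from this sqoop-env.sh:

# Report any configured home directory that is missing on this machine
for d in /opt/pkg/hadoop-2.7.3 /opt/pkg/hbase-1.2.4 /opt/pkg/hive /opt/pkg/zookeeper-3.4.14/conf; do
  [ -d "$d" ] || echo "Missing: $d"
done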

(2) Edit sqoop-site.xml

The configuration is shown below:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>true</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>
    
  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:file:/tmp/sqoop-meta/meta.db;shutdown=true</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>
    
  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>SA</value>
    <description>The username to bind to the metastore.
    </description>
  </property>
    
  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value></value>
    <description>The password to bind to the metastore.
    </description>
  </property>
    
  <property>
    <name>sqoop.metastore.client.record.password</name>
    <value>true</value>
    <description>If true, allow saved passwords in the metastore.
    </description>
  </property>
    
  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
    If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.port</name>
    <value>16000</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>

</configuration>
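
The metastore properties above only come into play when you share saved jobs. As a hedged illustration (hadoop100, testdb and test_table are placeholders), the shared metastore is started with sqoop metastore and clients reach it through --meta-connect on the port configured above:

# Start the shared metastore; it listens on sqoop.metastore.server.port (16000 here)
$ sqoop metastore &

# Create a saved import job against the shared metastore
$ sqoop job \
    --meta-connect jdbc:hsqldb:hsql://hadoop100:16000/sqoop \
    --create test_job \
    -- import \
    --connect jdbc:mysql://hadoop100:3306/testdb \
    --username root -P \
    --table test_table -m 1

# List the jobs stored in the shared metastore
$ sqoop job --meta-connect jdbc:hsqldb:hsql://hadoop100:16000/sqoop --list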

(3) Edit configure-sqoop

vi bin/configure-sqoop

Comment out the checks for HCAT_HOME and ACCUMULO_HOME (around lines 82-100 and 134-143, as shown below).

 82 #if [ -z "${HCAT_HOME}" ]; then
 83 #  if [ -d "/usr/lib/hive-hcatalog" ]; then
 84 #    HCAT_HOME=/usr/lib/hive-hcatalog
 85 #  elif [ -d "/usr/lib/hcatalog" ]; then
 86 #    HCAT_HOME=/usr/lib/hcatalog
 87 #  else
 88 #    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
 89 #    if [ ! -d ${HCAT_HOME} ]; then
 90 #       HCAT_HOME=${SQOOP_HOME}/../hcatalog
 91 #    fi
 92 #  fi
 93 #fi
 94 #if [ -z "${ACCUMULO_HOME}" ]; then
 95 #  if [ -d "/usr/lib/accumulo" ]; then
 96 #    ACCUMULO_HOME=/usr/lib/accumulo
 97 #  else
 98 #    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
 99 #  fi
100 #fi

134 ## Moved to be a runtime check in sqoop.
135 #if [ ! -d "${HCAT_HOME}" ]; then
136 #  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
137 #  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
138 #fi
139 
140 #if [ ! -d "${ACCUMULO_HOME}" ]; then
141 #  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
142 #  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
143 #fi

The purpose is to avoid the warning messages below at runtime; in fact, the checks can also be left uncommented:

Warning: /opt/module/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/module/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
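
If you prefer not to comment the lines out by hand, sed can do it in one step; a sketch that assumes the line numbers shown above match your copy of bin/configure-sqoop:

# Back up the script, then prefix the two line ranges with '#'
$ cp bin/configure-sqoop bin/configure-sqoop.bak
$ sed -i '82,100 s/^/#/; 134,143 s/^/#/' bin/configure-sqoop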

(4) Add the required JAR files

Upload the MySQL JDBC driver (use a 5.x version, not a newer one) to the lib directory under the Sqoop installation:

$ cp mysql-connector-java-5.1.44-bin.jar /opt/pkg/sqoop-1.4.7/lib/

Copy or symlink $HIVE_HOME/lib/hive-common-3.1.2.jar into $SQOOP_HOME/lib:

$ ln -s /opt/pkg/hive/lib/hive-common-3.1.2.jar /opt/pkg/sqoop-1.4.7/lib/

If JSON parsing is needed, download java-json.jar and place it in the lib directory under the Sqoop installation:

$ cp java-json.jar /opt/pkg/sqoop-1.4.7/lib/

3. Verification

sqoop version
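
Beyond printing the version, a quick end-to-end check is to list the databases on the MySQL server through Sqoop; a sketch in which the host and user are placeholders for your own environment:

# Uses the MySQL driver added above; -P prompts for the password
$ sqoop list-databases \
    --connect jdbc:mysql://hadoop100:3306/ \
    --username root -P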

4. Troubleshooting

(1) Error when importing into a Hive table with Sqoop

Could not load org.apache.hadoop.hive.conf.HiveConf

Solution:

Copy hive-common-x.x.x.jar from the Hive installation's lib directory into the Sqoop installation's lib directory (see the sketch after the note below).

Note: x.x.x above does not mean the JAR is literally named hive-common-x.x.x.jar; it stands for the version number. For example, with Hive 2.3.3 the JAR is hive-common-2.3.3.jar.
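
A sketch of the copy, using the 2.3.3 version from the example above (substitute your own Hive version):

# Put the Hive common JAR on Sqoop's classpath so HiveConf can be loaded
$ cp $HIVE_HOME/lib/hive-common-2.3.3.jar $SQOOP_HOME/lib/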

(2) Query import error

INFO mapreduce.ImportJobBase: Beginning query import.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject

The error indicates that the java-json JAR is missing.

Download the JAR from:

http://www.java2s.com/Code/Jar/j/Downloadjavajsonjar.htm

Copy the JAR into $SQOOP_HOME/lib and rerun the Sqoop job (a query import of this kind is sketched below).
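
For reference, a minimal free-form query import of the kind that triggers this code path; the connection string, query and target directory are placeholders:

# Free-form query import; the WHERE clause must contain $CONDITIONS,
# and with -m 1 no --split-by column is needed
$ sqoop import \
    --connect jdbc:mysql://hadoop100:3306/testdb \
    --username root -P \
    --query 'SELECT id, name FROM test_table WHERE $CONDITIONS' \
    --target-dir /user/hadoop/test_table \
    -m 1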
