Chapter 8: Single-node deployment of sqoop-1.99.7 on CentOS

Sqoop download address (Apache archive): http://archive.apache.org/dist/sqoop/

Sqoop runs on top of an existing Hadoop and Hive installation, so both environments must be set up first; Hive in turn depends on both HDFS and YARN. After installing Hadoop, start HDFS and YARN before continuing.
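
For reference, a minimal sketch of bringing HDFS and YARN up and checking that they are running (it assumes Hadoop's sbin scripts are already on PATH; adjust to your own Hadoop layout):

[root@stg-airflow001 ~]$ start-dfs.sh      # starts the HDFS daemons (NameNode/DataNode)
[root@stg-airflow001 ~]$ start-yarn.sh     # starts the YARN daemons (ResourceManager/NodeManager)
[root@stg-airflow001 ~]$ jps               # the HDFS and YARN daemons should appear in the list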

I. Download the package

[root@stg-airflow001 ~]$ cd /data/software/
[root@stg-airflow001 /data/software]$ wget http://archive.apache.org/dist/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz
[root@stg-airflow001 /data/software]$ ll
total 1398724
-rw-r--r-- 1 root root 102436055 Jul  6  2020 sqoop-1.99.7-bin-hadoop200.tar.gz

II. Extract the package

[root@stg-airflow001 /data/software]$ tar xf sqoop-1.99.7-bin-hadoop200.tar.gz -C /opt/
[root@stg-airflow001 /data/software]$ cd /opt/
[root@stg-airflow001 /opt]$ ll
total 0
drwxr-xr-x  8 root   root   157 Aug  6 12:58 sqoop-1.99.7-bin-hadoop200

III. Create a symlink

[root@stg-airflow001 /opt]$ ln -s sqoop-1.99.7-bin-hadoop200 sqoop

[root@stg-airflow001 /opt]$ ll
total 0
lrwxrwxrwx  1 root   root    26 Aug  6 14:03 sqoop -> sqoop-1.99.7-bin-hadoop200
drwxr-xr-x  8 root   root   157 Aug  6 12:58 sqoop-1.99.7-bin-hadoop200

IV. Configure environment variables

[root@stg-airflow001 /opt]$ vim /etc/profile.d/sqoop.sh
export PATH=/opt/sqoop/bin:$PATH

V. Reload the environment variables

[root@stg-airflow001 /opt]$ source  /etc/profile.d/sqoop.sh
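
A quick check that the new PATH entry is effective (both commands should resolve to scripts under /opt/sqoop/bin):

[root@stg-airflow001 /opt]$ which sqoop2-server sqoop2-shell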

VI. Back up the configuration file

[root@stg-airflow001 /opt]$ cd /opt/sqoop/conf/

[root@stg-airflow001 /opt/sqoop/conf]$ ll
total 28
-rw-r--r-- 1 root root 1584 Jul 20  2016 sqoop_bootstrap.properties
-rw-r--r-- 1 root root 9797 Aug  6 14:42 sqoop.properties

[root@stg-airflow001 /opt/sqoop/conf]$ cp sqoop.properties sqoop.properties.bak 

[root@stg-airflow001 /opt/sqoop/conf]$ ll
total 28
-rw-r--r-- 1 root root 1584 Jul 20  2016 sqoop_bootstrap.properties
-rw-r--r-- 1 root root 9797 Aug  6 14:42 sqoop.properties
-rw-r--r-- 1 root root 9714 Jul 20  2016 sqoop.properties.bak

VII. Modify the configuration file

# Option 1: keep the default repository connection (embedded Derby)
[root@stg-airflow001 /opt/sqoop/conf]$ vim sqoop.properties
... ...
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/etc/hadoop/

# Perform global substitutions inside vim
# Set the log file directory
:%s/@LOGDIR@/\/data\/hadoop\/sqoop\/logs/g
# Set the data directory
:%s/@BASEDIR@/\/data\/hadoop\/sqoop\/bin\/BASEDIR/g
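
The same substitutions can also be applied non-interactively; this is just a sketch using sed, and it assumes the log and data directories still need to be created:

[root@stg-airflow001 /opt/sqoop/conf]$ mkdir -p /data/hadoop/sqoop/logs /data/hadoop/sqoop/bin/BASEDIR
[root@stg-airflow001 /opt/sqoop/conf]$ sed -i 's|@LOGDIR@|/data/hadoop/sqoop/logs|g' sqoop.properties
[root@stg-airflow001 /opt/sqoop/conf]$ sed -i 's|@BASEDIR@|/data/hadoop/sqoop/bin/BASEDIR|g' sqoop.properties
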
# Option 2: use MySQL as the repository
[root@stg-airflow001 /opt/sqoop/conf]$ vim sqoop.properties
...  ...
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/etc/hadoop/

# JDBC repository provider configuration
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:mysql://161.89.107.66:6301/sqoop?createDatabaseIfNotExist=true
org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver
org.apache.sqoop.repository.jdbc.user=hive
org.apache.sqoop.repository.jdbc.password=hive

# System properties for embedded Derby configuration
org.apache.sqoop.repository.sysprop.derby.stream.error.file=@LOGDIR@/mysqlrepo.log

org.apache.sqoop.jetty.port=12000

# Location of third-party connector jars (not configured in this setup)
org.apache.sqoop.connector.external.loadpath=/usr/local/sqoop-1.99.7/lib
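
Before starting with the MySQL repository, the account referenced above must be allowed to create and use the SQOOP metadata database. A hedged sketch of the grants (user, password and scope simply mirror the properties above; adapt them to your environment):

mysql> CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON `SQOOP`.* TO 'hive'@'%';
mysql> FLUSH PRIVILEGES;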

VIII. Upload the JDBC driver

[root@stg-airflow001 ~]$ cd /opt/sqoop/server/lib/
[root@stg-airflow001 /opt/sqoop/server/lib]$ rz

[root@stg-airflow001 /opt/sqoop/server/lib]$ ll
total 68656
-rw-r--r-- 1 root root  2415211 Aug  6 14:44 mysql-connector-java-8.0.23.jar
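
If rz (lrzsz) is not available, the driver jar can also be downloaded directly; the URL below points at the same 8.0.23 connector on Maven Central and is only an example, so pick the version that matches your MySQL server:

[root@stg-airflow001 /opt/sqoop/server/lib]$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.23/mysql-connector-java-8.0.23.jar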

IX. Start Sqoop

1. Initialize Sqoop

[root@stg-airflow001 ~]$ sqoop2-tool upgrade
Setting conf dir: /opt/sqoop/bin/../conf
Sqoop home directory: /opt/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
0    [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

2. Verify Sqoop

[root@stg-airflow001 ~]$ sqoop2-tool verify
Setting conf dir: /opt/sqoop/bin/../conf
Sqoop home directory: /opt/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
15   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

Note: after this step a database named SQOOP is created in MySQL to store Sqoop's metadata.
If sqoop2-tool verify fails with an error such as:
2019-05-24T12:15:45,449 ERROR [main] org.apache.sqoop.repository.common.CommonRepositoryHandler - Can't execute query: CREATE DATABASE IF NOT EXISTS"SQOOP"

run the following in MySQL:
set global sql_mode ='ANSI_QUOTES';

or add the following to the MySQL configuration file (my-default.cnf):
sql_mode=ANSI_QUOTES

3. Start the Sqoop2 server

[root@stg-airflow001 ~]$ sqoop2-server start
Setting conf dir: /opt/sqoop/bin/../conf
Sqoop home directory: /opt/sqoop
Starting the Sqoop2 server...
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
28   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Sqoop2 server started.
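
To double-check that the server is listening on the Jetty port configured earlier (12000), the REST version endpoint can be queried; a small JSON document with the server build version should come back:

[root@stg-airflow001 ~]$ curl http://localhost:12000/sqoop/version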

X. Run the Sqoop client

1. Start the Sqoop shell

[root@stg-airflow001 ~]$ sqoop2-shell 
Setting conf dir: /opt/sqoop/bin/../conf
Sqoop home directory: /opt/sqoop
Aug 13, 2021 9:03:52 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>

2. Enable verbose error output

sqoop:000> set option --name verbose --value true
Verbose option was changed to true

3. Connect to the Sqoop server

sqoop:000> set server --host localhost
Server is set successfully
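
If the server runs on a non-default port or webapp context (for example after changing org.apache.sqoop.jetty.port), they can be set explicitly; the values below are the defaults:

sqoop:000> set server --host localhost --port 12000 --webapp sqoop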

4. Verify the connection

sqoop:000> show version -all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]

5. Create link objects

# 1. List the connectors registered with the Sqoop server
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+

sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql-link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://node1:3306/ecs
Username: root
Password: ****
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:   (note: identifier delimiter; a single space was entered here)
New link was successfully created with validation status OK and name mysql-link

sqoop:000> show link
+------------+------------------------+---------+
|    Name    |     Connector Name     | Enabled |
+------------+------------------------+---------+
| mysql-link | generic-jdbc-connector | true    |
+------------+------------------------+---------+

sqoop:000> show link -a
1 link(s) to show: 
link with name mysql-link (Enabled: true, Created by root at 5/24/19 1:22 PM, Updated by root at 5/24/19 1:22 PM)
Using Connector generic-jdbc-connector with name {1}
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://node1:3306/ecs
Username: root
Password: 
Fetch Size: 
Connection Properties: 
SQL Dialect
Identifier enclose: 

sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs-link

HDFS cluster

URI: hdfs://node1:9820
Conf directory: /home/hadoop-3.2.0/etc/hadoop
Additional configs:: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and name hdfs-link
sqoop:000>

Export from HDFS to MySQL
sqoop:000> create job -f "hdfs-link" -t "mysql-link"
Creating job for links with from name hdfs-link and to name mysql-link
Please fill following values to create new job object
Name: hdfs_mysql_job

Input configuration

Input directory: /user/data
Override null value: 
Null value:

Incremental import

Incremental type: 
0 : NONE
1 : NEW_FILES
Choose: 0
Last imported date:

Database target

Schema name: ecs
Table name: users
Column names: 
There are currently 0 values in the list:
element# 
Staging table: 
Clear stage table:

Throttling resources

Extractors: 
Loaders:

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK and name hdfs_mysql_job

sqoop:000> show job
+----+----------------+----------------------------+-------------------------------------+---------+
| Id |      Name      |       From Connector       |            To Connector             | Enabled |
+----+----------------+----------------------------+-------------------------------------+---------+
| 1  | hdfs_mysql_job | hdfs-link (hdfs-connector) | mysql-link (generic-jdbc-connector) | true    |
+----+----------------+----------------------------+-------------------------------------+---------+

Modify a job
sqoop:000> update job -n hdfs_mysql_job

Start a job
sqoop:000> start job -n hdfs_mysql_job
Submission details
Job Name: hdfs_mysql_job
Server URL: http://node1:12000/sqoop/
Created by: root
Creation date: 2019-05-24 14:27:45 HKT
Lastly updated by: root
External ID: job_1558650816925_0003
http://node1:8088/proxy/application_1558650816925_0003/
2019-05-24 14:27:45 HKT: BOOTING - Progress is not available

Check the job status
sqoop:000> status job -n hdfs_mysql_job
Submission details
Job Name: hdfs_mysql_job
Server URL: http://node1:12000/sqoop/
Created by: root
Creation date: 2019-05-24 14:27:45 HKT
Lastly updated by: root
External ID: job_1558650816925_0003
http://node1:8088/proxy/application_1558650816925_0003/
2019-05-24 14:30:10 HKT: BOOTING - 0.00 %

Start a job and show its execution status (synchronous mode)
sqoop:000> start job -n hdfs_mysql_job -s


Stop a job
sqoop:000> stop job -n hdfs_mysql_job
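
Jobs and links that are no longer needed can be removed by name; a sketch using the objects created above:

sqoop:000> delete job -n hdfs_mysql_job
sqoop:000> delete link -n mysql-link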


Import from MySQL to HDFS
sqoop:000> create job -f "mysql-link" -t "hdfs-link"
Name: mysql_hdfs_job

Database source

Schema name: ecs
Table name: role
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: 
Partition column nullable: 
Boundary query:

Incremental read

Check column: 
Last value:

Target configuration

Override null value: 
Null value: 
File format: 
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression code 
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom code 
Error message: Can't be null nor empty 
Output directory: /user/data
Append mode:

Throttling resources

Extractors: 
Loaders:

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK and name mysql_hdfs_job
 

When an error occurs, enable detailed error output:
set option --name verbose --value true


An error when starting the job:
Exception: org.apache.sqoop.common.SqoopException Message: GENERIC_JDBC_CONNECTOR_0003:Unable to access meta data -
Looking at the detailed error, with the ecs account it is:
Caused by: Exception: java.sql.SQLSyntaxErrorException Message: Unknown table 'role' in information_schema
With the root account it is:
Caused by: Exception: java.sql.SQLSyntaxErrorException Message: Table 'SQOOP.role' doesn't exist
It looks like the configured schema ecs is not being picked up and the first (default) database is used instead.

The MySQL-to-HDFS import test with Sqoop2 therefore failed.


sqoop2-shell startup error
[root@node1 ~]# sqoop2-shell
Setting conf dir: /usr/local/sqoop-1.99.7/bin/../conf
Sqoop home directory: /usr/local/sqoop-1.99.7
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.codehaus.groovy.runtime.InvokerHelper.<clinit>(InvokerHelper.java:66)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallConstructorSite(CallSiteArray.java:87)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247)
at org.codehaus.groovy.tools.shell.Groovysh.<clinit>(Groovysh.groovy:54)
at org.apache.sqoop.shell.SqoopShell.main(SqoopShell.java:75)
Caused by: groovy.lang.GroovyRuntimeException: Conflicting module versions. Module [groovy-all is loaded in version 2.4.11 and you are trying to load version 2.4.0
Fix: locate the conflicting groovy jar and delete it
[root@node1 sqoop-1.99.7]# find . -name 'groo*'

[root@node1 sqoop-1.99.7]# rm -r ./shell/lib/groovy-all-2.4.0.jar
rm: remove regular file ‘./shell/lib/groovy-all-2.4.0.jar’? y