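Hive Schema Tool: Metastore Schema Operations
The Problem with the Hive Schema
Older Hive versions do not record a schema version number in the MetaStore. As a result, after upgrading to a newer version, Hive fails with: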
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
The log contains the following message:
Caused by: MetaException(message:Version information not found in metastore. )
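One workaround is to set hive.metastore.schema.verification=false, which turns off strict version verification and lets the metastore write the missing version record implicitly.
When the error shows up as part of a version upgrade, however, the schema itself has to be repaired or upgraded, and that is the job of Hive Schema Tool.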
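To confirm that a failure really is the missing version record described above, it can help to search the Hive log for the MetaException message. The following is a minimal sketch for a POSIX shell; the function name has_missing_schema_version and the log path in the usage comment are hypothetical and not part of Hive.

```shell
#!/bin/sh
# Return 0 if the given log file contains the "missing schema version" error.
# has_missing_schema_version is a hypothetical helper name, not part of Hive.
has_missing_schema_version() {
    log_file="$1"
    # -q: no output, only an exit status; -F: treat the pattern as a fixed string
    grep -qF "Version information not found in metastore" "$log_file"
}

# Example usage (the log path is an assumption; adjust it to your installation):
# if has_missing_schema_version /tmp/hive/hive.log; then
#     echo "metastore has no schema version record; consider running schematool"
# fi
```

The fixed-string match avoids having to escape the parentheses and dots that surround the message in the real log line.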
What Is Hive Schema Tool
Hive ships Hive Schema Tool (schematool) for maintaining, repairing, and upgrading the MetaStore schema.
$ schematool -help
usage: schemaTool
-dbType <databaseType>             Metastore database type
-driver <driver>                   Driver name for connection
-dryRun                            List SQL scripts (no execute)
-help                              Print this message
-info                              Show config and schema details
-initSchema                        Schema initialization
-initSchemaTo <initTo>             Schema initialization to a version
-metaDbType <metaDatabaseType>     Used only if upgrading the system catalog for hive
-passWord <password>               Override config file password
-upgradeSchema                     Schema upgrade
-upgradeSchemaFrom <upgradeFrom>   Schema upgrade from a version
-url <url>                         Connection url to the database
-userName <user>                   Override config file user name
-verbose                           Only print SQL statements
(Additional catalog related options added in the Hive 3.0.0 (HIVE-19135) release are below.)
-createCatalog <catalog>           Create catalog with given name
-catalogLocation <location>        Location of new catalog, required when adding a catalog
-catalogDescription <description>  Description of new catalog
-ifNotExists                       If passed then it is not an error to create an existing catalog
-moveDatabase <database>           Move a database between catalogs. All tables under it would still be under it as part of new catalog. Argument is the database name. Requires --fromCatalog and --toCatalog parameters as well
-moveTable <table>                 Move a table to a different database. Argument is the table name. Requires --fromCatalog, --toCatalog, --fromDatabase, and --toDatabase
-toCatalog <catalog>               Catalog a moving database or table is going to. This is required if you are moving a database or table.
-fromCatalog <catalog>             Catalog a moving database or table is coming from. This is required if you are moving a database or table.
-toDatabase <database>             Database a moving table is going to. This is required if you are moving a table.
-fromDatabase <database>           Database a moving table is coming from. This is required if you are moving a table.
The -dbType option accepts the following values: derby, mysql, postgres, oracle, and mssql.
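Because -dbType only accepts the values listed above, a script that wraps schematool can reject a bad value before touching the metastore. The sketch below is one way to do that; is_supported_db_type, schema_info, and the SCHEMATOOL variable are hypothetical names introduced for illustration, while -dbType and -info come from the help output.

```shell
#!/bin/sh
# Command to invoke; overridable so the sketch can be tried without Hive installed.
SCHEMATOOL="${SCHEMATOOL:-schematool}"

# Hypothetical helper: succeed only for the dbType values listed in the article.
is_supported_db_type() {
    case "$1" in
        derby|mysql|postgres|oracle|mssql) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: validate the dbType, then show config and schema details.
schema_info() {
    if ! is_supported_db_type "$1"; then
        echo "unsupported dbType: $1" >&2
        return 2
    fi
    "$SCHEMATOOL" -dbType "$1" -info
}
```

Calling schema_info mysql runs schematool -dbType mysql -info, while schema_info sqlite prints an error and returns status 2 without invoking the tool.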
Using Hive Schema Tool
The following examples come from the official Hive Schema Tool documentation.
- Initialize the metastore, generating the schema in a Derby database:
schematool -dbType derby -initSchema
- Show the metastore schema information:
schematool -dbType derby -info
- Upgrade the metastore schema to the current version; the -upgradeSchemaFrom option names the old Hive version to upgrade from:
schematool -dbType derby -upgradeSchemaFrom 0.10.0
- Preview an upgrade to the current version: with -dryRun, the tool lists the scripts the upgrade needs without executing them:
schematool -dbType derby -upgradeSchemaFrom 0.7.0 -dryRun
- Move the metadata of a Hive database from the hive catalog to the spark catalog:
schematool -moveDatabase db1 -fromCatalog hive -toCatalog spark
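- Move a Hive database and a table to the spark catalog:
# Create the target database newdb, which will receive the migrated table
beeline ... -e "create database if not exists newdb";
# Move the database to the spark catalog
schematool -moveDatabase newdb -fromCatalog hive -toCatalog spark
# Move the table into the target database under the spark catalog
schematool -moveTable table1 -fromCatalog hive -toCatalog spark -fromDatabase db1 -toDatabase newdb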
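The -dryRun and -upgradeSchemaFrom examples in the list above combine naturally into a preview-then-apply step. The following is a minimal sketch of that idea, not an official workflow; upgrade_with_preview and the SCHEMATOOL variable are hypothetical names, and only the flags themselves come from the help output.

```shell
#!/bin/sh
# Command to invoke; overridable so the sketch can be tried without Hive installed.
SCHEMATOOL="${SCHEMATOOL:-schematool}"

# Hypothetical wrapper, not part of Hive.
# $1: metastore database type (for example derby)
# $2: Hive version the existing schema corresponds to (for example 0.10.0)
upgrade_with_preview() {
    db_type="$1"
    from_version="$2"
    # First pass: -dryRun lists the upgrade scripts without executing them.
    "$SCHEMATOOL" -dbType "$db_type" -upgradeSchemaFrom "$from_version" -dryRun || return 1
    # Second pass: run the actual upgrade only if the preview succeeded.
    "$SCHEMATOOL" -dbType "$db_type" -upgradeSchemaFrom "$from_version"
}

# Example usage:
# upgrade_with_preview derby 0.10.0
```

In practice you would likely back up the metastore database and review the dry-run output by hand before the second pass; the sketch runs both passes back to back only to keep the example short.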
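Hive Schema Tool makes it straightforward to repair and upgrade the metastore schema, and with the catalog options added in Hive 3.0.0 it can also move databases and tables from the hive catalog to the spark catalog, which makes it a genuinely useful operations tool.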
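Closing Remarks
If this was helpful, please like and follow. The WeChat official account 《数舟》 offers free access to the video course that accompanies the "Data Warehouse" (《数据仓库》) column and to automated installation scripts for big data clusters, along with a way to join the discussion group.
All of my big data content is also published to the official account first. If you are interested in a particular big data technology but short on time, bring it up in the group and I will arrange a sharing session.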