Basic environment and resources
Hadoop: 2.7.x
Hive: 2.1.x (bin.tar.gz, binary release)
Hive: 1.x (src.tar.gz, source release)
Step 1: Install Hadoop 2.7.x on Windows; see:
Step 2: Download the Hive tarball from the official archive: http://archive.apache.org/dist/hive
Then extract the tarball into a target directory (C:\hive) and configure the Hive global environment variables.
Hive global environment variables:
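A minimal sketch of one way to persist the variables from an administrator cmd window; the HIVE_HOME value assumes the extraction path above, and editing Path through the System Properties dialog is safer because setx can truncate long Path values:
- :: Persist HIVE_HOME (assumes the extraction path used above)
- setx HIVE_HOME "C:\hive\apache-hive-2.1.1-bin"
- :: Append Hive's bin directory to the user Path (setx may truncate values over 1024 characters)
- setx PATH "%PATH%;C:\hive\apache-hive-2.1.1-bin\bin"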
Step 3: Hive configuration files (C:\hive\apache-hive-2.1.1-bin\conf)
The configuration directory C:\hive\apache-hive-2.1.1-bin\conf ships four default configuration template files; copy each to a new file name:
hive-default.xml.template -----> hive-site.xml
hive-env.sh.template -----> hive-env.sh
hive-exec-log4j2.properties.template -----> hive-exec-log4j2.properties
hive-log4j2.properties.template -----> hive-log4j2.properties
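From a cmd window, the four copies can be made as below; a minimal sketch assuming the conf path from this step:
- cd /d C:\hive\apache-hive-2.1.1-bin\conf
- copy hive-default.xml.template hive-site.xml
- copy hive-env.sh.template hive-env.sh
- copy hive-exec-log4j2.properties.template hive-exec-log4j2.properties
- copy hive-log4j2.properties.template hive-log4j2.properties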
Step 4: Create the local directories that the configuration files below refer to
C:\hive\apache-hive-2.1.1-bin\my_hive
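The four subdirectory names below are the ones referenced by hive-site.xml in the next step; a minimal cmd sketch:
- mkdir C:\hive\apache-hive-2.1.1-bin\my_hive\scratch_dir
- mkdir C:\hive\apache-hive-2.1.1-bin\my_hive\resources_dir
- mkdir C:\hive\apache-hive-2.1.1-bin\my_hive\querylog_dir
- mkdir C:\hive\apache-hive-2.1.1-bin\my_hive\operation_logs_dir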
Step 5: Configuration files Hive needs adjusted (hive-site.xml and hive-env.sh)
Edit the C:\hive\apache-hive-2.1.1-bin\conf\hive-site.xml file:
- <!-- Hive warehouse directory; this path lives on HDFS -->
- <property>
-   <name>hive.metastore.warehouse.dir</name>
-   <value>/user/hive/warehouse</value>
-   <description>location of default database for the warehouse</description>
- </property>
- <!-- Hive temporary (scratch) data directory; this path lives on HDFS -->
- <property>
-   <name>hive.exec.scratchdir</name>
-   <value>/tmp/hive</value>
-   <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
- </property>
- <!-- Local scratch directory -->
- <property>
-   <name>hive.exec.local.scratchdir</name>
-   <value>C:/hive/apache-hive-2.1.1-bin/my_hive/scratch_dir</value>
-   <description>Local scratch space for Hive jobs</description>
- </property>
- <!-- Local resources directory -->
- <property>
-   <name>hive.downloaded.resources.dir</name>
-   <value>C:/hive/apache-hive-2.1.1-bin/my_hive/resources_dir/${hive.session.id}_resources</value>
-   <description>Temporary local directory for added resources in the remote file system.</description>
- </property>
- <!-- Local query-log directory -->
- <property>
-   <name>hive.querylog.location</name>
-   <value>C:/hive/apache-hive-2.1.1-bin/my_hive/querylog_dir</value>
-   <description>Location of Hive run time structured log file</description>
- </property>
- <!-- Local operation-logs directory -->
- <property>
-   <name>hive.server2.logging.operation.log.location</name>
-   <value>C:/hive/apache-hive-2.1.1-bin/my_hive/operation_logs_dir</value>
-   <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
- </property>
- <!-- Metastore database connection URL (note: & must be escaped as &amp; in XML) -->
- <property>
-   <name>javax.jdo.option.ConnectionURL</name>
-   <value>jdbc:mysql://192.168.60.178:3306/hive?serverTimezone=UTC&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
-   <description>JDBC connect string for a JDBC metastore.</description>
- </property>
- <!-- Metastore database JDBC driver -->
- <property>
-   <name>javax.jdo.option.ConnectionDriverName</name>
-   <value>com.mysql.cj.jdbc.Driver</value>
-   <description>Driver class name for a JDBC metastore</description>
- </property>
- <!-- Metastore database user name -->
- <property>
-   <name>javax.jdo.option.ConnectionUserName</name>
-   <value>admini</value>
-   <description>Username to use against metastore database</description>
- </property>
- <!-- Metastore database password -->
- <property>
-   <name>javax.jdo.option.ConnectionPassword</name>
-   <value>123456</value>
-   <description>password to use against metastore database</description>
- </property>
- <!-- Works around: Caused by: MetaException(message:Version information not found in metastore.) -->
- <property>
-   <name>hive.metastore.schema.verification</name>
-   <value>false</value>
-   <description>
-     Enforce metastore schema version consistency.
-     True: Verify that version information stored in the metastore is compatible with the one from Hive jars. Also disable automatic schema migration attempts. Users are required to manually migrate the schema after a Hive upgrade, which ensures proper metastore schema migration. (Default)
-     False: Warn if the version information stored in the metastore doesn't match the one from the Hive jars.
-   </description>
- </property>
- <!-- Auto-create the full metastore schema -->
- <!-- Works around: Required table missing : "DBS" in Catalog "" Schema "" -->
- <property>
-   <name>datanucleus.schema.autoCreateAll</name>
-   <value>true</value>
-   <description>Auto creates the necessary schema on startup if one doesn't exist. Set this to false after creating it once. To enable auto-create, also set hive.metastore.schema.verification=false. Auto creation is not recommended for production use cases; run the schematool command instead.</description>
- </property>
Edit the C:\hive\apache-hive-2.1.1-bin\conf\hive-env.sh file:
- # Set HADOOP_HOME to point to a specific hadoop install directory
- export HADOOP_HOME=C:\hadoop\hadoop-2.7.6
- # Hive Configuration Directory can be controlled by:
- export HIVE_CONF_DIR=C:\hive\apache-hive-2.1.1-bin\conf
- # Folder containing extra libraries required for hive compilation/execution can be controlled by:
- export HIVE_AUX_JARS_PATH=C:\hive\apache-hive-2.1.1-bin\lib
Step 6: Create the HDFS directories on Hadoop
- hadoop fs -mkdir /tmp
- hadoop fs -mkdir /user/
- hadoop fs -mkdir /user/hive/
- hadoop fs -mkdir /user/hive/warehouse
- hadoop fs -chmod g+w /tmp
- hadoop fs -chmod g+w /user/hive/warehouse
Step 7: Create the database hive that Hive initialization depends on; note the encoding format: latin1
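A minimal sketch of creating that database from the mysql command line; the host matches the JDBC URL above, while the root account is an assumption, so substitute your own metastore credentials:
- :: Create the metastore database with latin1 encoding (root account is a placeholder)
- mysql -h 192.168.60.178 -u root -p -e "CREATE DATABASE hive DEFAULT CHARACTER SET latin1;"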
Step 8: Start the Hive services
(1) First start Hadoop by running: start-all.cmd
(2) Initialize the Hive metastore by running: hive --service metastore
If everything is normal, the cmd window output looks like the screenshot below.
If Hive initialized correctly, the tables created in the MySQL hive database look like the screenshot below:
(3) Start the Hive CLI by running: hive
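Putting the three commands together, a minimal start-up sequence (each service should be launched from its own cmd window):
- :: 1. Start HDFS and YARN
- start-all.cmd
- :: 2. Initialize/start the metastore service; leave this window running
- hive --service metastore
- :: 3. In a new cmd window, start the Hive CLI
- hive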
At this point, the Hive setup on Windows 10 is complete.
Problem (1): Hive metastore initialization (hive --service metastore) kept failing with errors.
Approach: initialize the Hive metastore database directly with the SQL scripts that ship with Hive.
The bundled scripts live under C:\hive\apache-hive-2.1.1-bin\scripts\metastore\upgrade; pick the SQL for your version, as shown in the screenshot below:
Choose the SQL script (hive-schema-x.x.x.mysql.sql) that corresponds to your Hive version (Hive_x.x.x).
Note: my Hive version is 2.1.1, so I chose the corresponding hive-schema-2.1.0.mysql.sql script.
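A minimal sketch of executing that script against the hive database created in step 7; the schema file may source companion scripts by relative path, so run it from its own directory (credentials are the same assumptions as above):
- cd /d C:\hive\apache-hive-2.1.1-bin\scripts\metastore\upgrade\mysql
- :: Pipe the bundled schema script into the hive database (root account is a placeholder)
- mysql -h 192.168.60.178 -u root -p hive < hive-schema-2.1.0.mysql.sql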
Problem (2): the Hive binary release (apache-hive-x.x.x-bin.tar.gz) lacks the executables and launcher scripts Hive needs to run on Windows.
Solution: download an older Hive source release (apache-hive-1.0.0-src) and replace the original bin directory under the install (C:\hive\apache-hive-2.1.1-bin) with its bin directory.
Screenshot below: the apache-hive-1.0.0-src\bin directory structure.
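A minimal cmd sketch of that swap; the location of the extracted apache-hive-1.0.0-src is a placeholder:
- :: Back up the original bin directory, then copy in the Windows scripts from the 1.0.0 source release
- ren C:\hive\apache-hive-2.1.1-bin\bin bin_bak
- xcopy /E /I C:\Downloads\apache-hive-1.0.0-src\bin C:\hive\apache-hive-2.1.1-bin\bin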