Hive Basics: Component Overview

2023-12-31 20:31:10

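Official introduction
Hive execution flow diagram:

Study notes on "Practical Hive" (PDF), organized along the book's chapters and supplemented with material from the official website.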

Component architecture

Client components
Hive CLI
JDBC/ODBC
Toad or SQuirreL
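
JDBC clients such as SQuirreL reach Hive through HiveServer2, using a connection URL of the form jdbc:hive2://host:port/database. The sketch below only assembles that URL; the helper name `hive_jdbc_url` and the host `hive.example.com` are hypothetical, and port 10000 is assumed because it is the usual HiveServer2 default (a given cluster may use a different one).

```python
def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL (illustrative helper, not part of Hive)."""
    if not host:
        raise ValueError("host must not be empty")
    # HiveServer2 uses the "hive2" JDBC sub-protocol.
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # "hive.example.com" is a placeholder host name.
    print(hive_jdbc_url("hive.example.com"))
```

The resulting string is what would typically be pasted into the connection dialog of a JDBC tool, along with the Hive JDBC driver.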
HCatalog
Metadata management component. Its main functions are:
Official introduction
• Provides a common schema environment for multiple tools
• Allows for connectors to tools to read data from and write data to Hive’s warehouse
• Lets users share data across tools
• Creates a relational structure to Hadoop data
• Abstracts away the how and where of data storage
• Hides schema and storage changes from users
HiveServer2
Interface service component
Execution-Engine
MR (MapReduce)
Execution engine component
Tez
Execution engine component that avoids expensive shuffle steps
Tez avoids disk IO by avoiding expensive shuffles and sorts while leveraging more efficient map-side joins. Tez also utilizes a cost-based optimizer, which helps produce faster execution plans. Combine this with the ORC file format geared toward SQL performance and you have a query engine performing up to 100x faster than native MapReduce.
Hive-on-Spark
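
The engine used for a session is chosen with the `hive.execution.engine` property, which accepts `mr`, `tez`, or `spark`. As a minimal sketch, the hypothetical helper below (`set_engine_statement` is not a Hive API) validates an engine name and produces the corresponding `SET` statement that a client would send before its queries.

```python
# Values accepted by the hive.execution.engine property.
VALID_ENGINES = ("mr", "tez", "spark")


def set_engine_statement(engine: str) -> str:
    """Return the HiveQL SET statement that selects an execution engine."""
    name = engine.strip().lower()
    if name not in VALID_ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {VALID_ENGINES}")
    return f"SET hive.execution.engine={name};"


if __name__ == "__main__":
    for engine in VALID_ENGINES:
        print(set_engine_statement(engine))
```

Whether a given engine actually works depends on how the cluster is installed; the statement only requests it.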
Storage: Hadoop
File storage based on HDFS
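
To make the HDFS layout concrete, the sketch below computes where a managed table's files would normally live, assuming the default warehouse directory `/user/hive/warehouse`, a `<database>.db` subdirectory for non-default databases, and `key=value` partition directories. The function name `table_location` is hypothetical, and external tables or tables created with an explicit LOCATION will not follow this layout.

```python
import posixpath
from typing import Dict, Optional

# Assumed default value of hive.metastore.warehouse.dir.
DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"


def table_location(
    database: str,
    table: str,
    partition: Optional[Dict[str, str]] = None,
    warehouse_dir: str = DEFAULT_WAREHOUSE_DIR,
) -> str:
    """Return the conventional HDFS directory for a managed table or partition."""
    parts = [warehouse_dir]
    # Tables in the "default" database sit directly under the warehouse directory.
    if database != "default":
        parts.append(f"{database}.db")
    parts.append(table)
    # Each partition column adds one key=value directory level, in order.
    for key, value in (partition or {}).items():
        parts.append(f"{key}={value}")
    return posixpath.join(*parts)


if __name__ == "__main__":
    print(table_location("sales", "orders", {"dt": "2023-12-31"}))
```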
