Performance optimization notes:
Hadoop MapReduce level
local-mode: hive.exec.mode.local.auto=true
parallel-execution: hive.exec.parallel=true
strict-mode: hive.mapred.mode=strict
jvm-reuse: mapred.job.reuse.jvm.num.tasks=-1
hive-execute-engine
tez
mr
hive-on-spark
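
The MapReduce-level settings and the execution engine choice listed above can be applied per session with SET statements. The following is a minimal sketch, not a recommended production configuration: the property values are the ones given in these notes, hive.execution.engine is the standard Hive property for selecting the engine, and picking tez here is only an assumption for illustration.

```sql
-- Session-level settings; values are the ones listed in these notes.
SET hive.exec.mode.local.auto=true;     -- let Hive run small jobs in local mode
SET hive.exec.parallel=true;            -- run independent stages of a query in parallel
SET hive.mapred.mode=strict;            -- reject risky queries, e.g. no partition filter
SET mapred.job.reuse.jvm.num.tasks=-1;  -- -1 = no limit on tasks per reused JVM (MR engine only)

-- Engine choice: mr, tez or spark. The chosen engine must be installed on the cluster.
SET hive.execution.engine=tez;
```

These statements only affect the current session. Whether each one helps depends on the Hive and Hadoop versions in use; JVM reuse in particular only applies where the MapReduce runtime supports it.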
Storage Format
- ORC
Optimized Row Columnar format, column-based storage
- Parquet
Choose according to the business scenario
- ORC
Data bucketing
Data partitioning
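
The storage format, bucketing and partitioning items above can be combined in a single table definition. The sketch below is illustrative only: the table name demo_order_stat, its columns and the bucket count of 16 are hypothetical and not taken from these notes.

```sql
-- Hypothetical table: partitioned by day, bucketed by order_id, stored as ORC.
CREATE TABLE IF NOT EXISTS demo_order_stat (
  order_id   BIGINT,
  order_mode STRING,
  total      BIGINT
)
PARTITIONED BY (day STRING)                -- one directory per day value
CLUSTERED BY (order_id) INTO 16 BUCKETS    -- rows are hashed on order_id into 16 files
STORED AS ORC;

-- Filtering on the partition column lets Hive read only the matching partition.
SELECT order_mode, SUM(total) AS total
FROM demo_order_stat
WHERE day = '2018-01-01'
GROUP BY order_mode;
```

With hive.mapred.mode=strict, a query on a partitioned table like this one is rejected unless it filters on the partition column, which is why the example query includes the day predicate.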
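Hive optimization
Hive basics
Hive optimization summary
Common Hive SQL optimization methods
The Hive SQL compilation and optimization process (Meituan technical team)
Big Data Technology and Architecture
Reference
hive2mysql-udf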
org.apache.hadoop.hive.contrib.genericudf.example.GenericUDFDBOutput
hive -e "
add jar /usr/share/cmf/common_jars/mysql-connector-java-5.1.15.jar;
add jar /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hive/lib/hive-contrib-1.1.0-cdh5.12.1.jar;
CREATE TEMPORARY FUNCTION dboutput AS 'org.apache.hadoop.hive.contrib.genericudf.example.GenericUDFDBOutput';
select dboutput(
  'jdbc:mysql://ip:3306/dm?characterEncoding=UTF-8',
  'user', 'paswd',
  'insert into test_ids_order_stat(day,mode,total,change_order_cnt,cancel_order_cnt) values (?,?,?,?,?)',
  day, mode, total, change_order_cnt, cancel_order_cnt)
from dm.ids_order_stat;
"