Offline Data Warehouse (Part 3)

Chapter 3: Building the Data Warehouse Environment

3.1 Hive Environment Setup

3.1.1 Hive Engine Overview

  Hive supports three execution engines: MR (the default), Tez, and Spark.

  Hive on Spark: Hive both stores the metadata and handles SQL parsing and optimization; the syntax is HQL, but the execution engine is Spark, which runs the job as RDD operations.

  Spark on Hive: Hive only stores the metadata; Spark handles SQL parsing and optimization, the syntax is Spark SQL, and Spark runs the job as RDD operations.
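
  The execution engine can also be switched per session, which makes it easy to compare engines on the same query. A minimal sketch (assuming the Hive on Spark setup from 3.1.2 below is already complete; test_table is a placeholder name):

-- run a query on Spark
set hive.execution.engine=spark;
select count(*) from test_table;
-- switch back to MapReduce for comparison
set hive.execution.engine=mr;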

3.1.2 Hive on Spark Configuration

  1) Compatibility notes

    Note: the Hive 3.1.2 and Spark 3.0.0 distributions downloaded from the official sites are not compatible by default, because Hive 3.1.2 is built against Spark 2.4.5. Hive 3.1.2 therefore has to be recompiled.

    Build steps: download the Hive 3.1.2 source from the official site and change the Spark version referenced in the pom file to 3.0.0. If it compiles, package it and take the jars; if it fails, fix the affected methods as the error messages indicate until the build succeeds, then package it.

  2) Deploy Spark on the Hive node

    If Spark is already deployed, this step can be skipped, but verify that the SPARK_HOME environment variable is configured correctly:

echo $SPARK_HOME


    (1) Download the Spark jar package from the official site: http://spark.apache.org/downloads.html

    (2) Upload and extract spark-3.0.0-bin-hadoop3.2.tgz

tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module/
mv /opt/module/spark-3.0.0-bin-hadoop3.2 /opt/module/spark-yarn

    (3) Configure the SPARK_HOME environment variable

sudo vim /etc/profile.d/my_env.sh
#SPARK_HOME
export SPARK_HOME=/opt/module/spark-yarn
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile.d/my_env.sh

  3) Create a Spark configuration file in Hive

vim /opt/module/hive/conf/spark-defaults.conf
spark.master    yarn
spark.eventLog.enabled    true
spark.eventLog.dir    hdfs://hadoop102:8020/spark/hiveOnSparkLog
spark.executor.memory    1g
spark.driver.memory    1g

    Create the following HDFS path for storing the history logs (-p creates the parent directory as well):

hadoop fs -mkdir -p /spark/hiveOnSparkLog

  4) Upload the "pure" (without-hadoop) Spark jars to HDFS

    Note 1: the full Spark 3.0.0 distribution bundles support for Hive 2.3.7, which conflicts with the installed Hive 3.1.2. The "pure" without-hadoop build is used instead: it contains no hadoop or hive dependencies, so the conflict is avoided.

    Note 2: Hive jobs are ultimately executed by Spark, and Spark job resources are scheduled by Yarn, so a job may be assigned to any node in the cluster. The Spark dependencies therefore have to be uploaded to an HDFS path that every node in the cluster can reach.

    (1) Upload and extract spark-3.0.0-bin-without-hadoop.tgz

# extract into the current directory
tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz -C .

    (2) Upload the pure Spark jars to HDFS

# create the /spark-jars directory on HDFS
hadoop fs -mkdir /spark-jars
# upload the jars to HDFS
hadoop fs -put /opt/software/spark-3.0.0-bin-without-hadoop/jars/* /spark-jars
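
    A quick sanity check of the upload (an optional verification, not part of the original steps):

# list a few of the uploaded jars, then count files and total size
hadoop fs -ls /spark-jars | head -n 5
hadoop fs -count /spark-jars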

  5) Modify hive-site.xml

vim /opt/module/hive/conf/hive-site.xml
<!-- Location of the Spark dependencies (note: port 8020 must match the namenode port) -->
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://hadoop102:8020/spark-jars/*</value>
</property>

<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>

<!-- Hive-to-Spark connection timeout -->
<property>
    <name>hive.spark.client.connect.timeout</name>
    <value>10000ms</value>
</property>

<property>
  <name>mapreduce.job.queuename</name>
  <value>hive</value>
</property>

    Note: the default value of hive.spark.client.connect.timeout is 1000ms. If a Hive insert statement throws the exception below, raise the parameter to 10000ms:

FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session d9e0224c-3d14-4bf4-95bc-ee3ec56df48e

3.1.3 Hive on Spark Test

  (1) Start the hive client

bin/hive

  (2) Create a test table

create table student2(id int, name string);

  (3) Test with an insert

insert into table student2 values(1,'abc');

  (4) If the insert runs as a Spark job and succeeds, the configuration works.


3.2 Yarn Configuration

3.2.1 Increase the ApplicationMaster Resource Ratio

  The capacity scheduler limits the resources that the Application Masters running concurrently in each queue may occupy. The limit is set by the parameter yarn.scheduler.capacity.maximum-am-resource-percent, whose default value is 0.1: the Application Masters in a queue may together use at most 10% of that queue's total resources. The purpose is to prevent Application Masters from occupying most of the resources and leaving the Map/Reduce tasks unable to run.

  In production this parameter can keep its default. In a learning environment, however, total cluster resources are small; with only 10% reserved for Application Masters, a single Application Master may already hit the cap, so only one job can run at a time. The value should therefore be raised appropriately.
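
  A worked example with assumed numbers: suppose Yarn manages 12 GB across the cluster and the hive queue configured below receives 70% of it, i.e. about 8.4 GB. At the default 0.1, all Application Masters in that queue together may hold at most 8.4 GB x 0.1 = 0.84 GB, less than one typical 1 GB AM container, so not even a single job could start. Raising the ratio to 0.5 lifts the cap to 4.2 GB, enough for several concurrent Application Masters.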

  (1) On hadoop102, modify the following parameter values in /opt/module/hadoop-3.1.3/etc/hadoop/capacity-scheduler.xml

vim capacity-scheduler.xml
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

   <!-- 容量调度器最多可以容纳多少个job-->
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <!-- 当前队列中启动的MRAppMaster进程,所占用的资源可以达到队列总资源的多少
        通过这个参数可以限制队列中提交的Job数量
  -->
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.5</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run 
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

  <!-- 为Job分配资源时,使用什么策略进行计算
  -->
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare 
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare 
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

   <!-- root队列中有哪些子队列-->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
    <description>
      The queues at this level (root is the root queue).
    </description>
  </property>

  <!-- root队列中default队列占用的容量百分比
        所有子队列的容量相加必须等于100
  -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>30</value>
    <description>Default queue target capacity.</description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>70</value>
    <description>Default queue target capacity.</description>
  </property>

    <!-- 队列中用户能使用此队列资源的极限百分比
  -->
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>
  
   <property>
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
    <description>
      Hive queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>

  <!-- root队列中default队列占用的容量百分比的最大值
  -->
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue. 
    </description>
  </property>
  
   <property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the hive queue. 
    </description>
  </property>

    <!-- root队列中每个队列的状态 -->
  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>
  
    <property>
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
    <description>
      The state of the hive queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>
  
  <!-- 限制向default队列提交的用户-->
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the hive queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>
  
  <property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the hive queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler 
      attempts to schedule rack-local containers. 
      Typically this should be set to number of nodes in the cluster, By default is setting 
      approximately number of nodes in one rack which is 40.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>
</configuration>

  (2) Distribute the capacity-scheduler.xml configuration file

xsync capacity-scheduler.xml

  (3) Stop any running jobs and restart the yarn cluster (run the commands on whichever machine hosts Yarn's ResourceManager)

sbin/stop-yarn.sh
sbin/start-yarn.sh
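
    Alternatively, Yarn can usually re-read scheduler configuration without a full restart (a standard ResourceManager admin command, mentioned here only as an option):

# re-read capacity-scheduler.xml on the ResourceManager
yarn rmadmin -refreshQueues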

3.3 Data Warehouse Development Environment

  DBeaver or DataGrip can be used as the warehouse development tool. Both connect to Hive over the JDBC protocol, so HiveServer2 must be running.

  1. Start HiveServer2

hiveserver2
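
  Running hiveserver2 in the foreground ties up the terminal, so a common variation (optional, not part of the original steps) is to run it in the background and follow the log:

nohup hiveserver2 > /tmp/hiveserver2.log 2>&1 &
tail -f /tmp/hiveserver2.log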

  2. Configure the DataGrip connection (install DataGrip first)

    1) Create a connection

    2) Configure the connection properties

      All properties should match the Hive beeline client configuration. On first use, DataGrip reports a missing JDBC driver; download it as prompted.
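
      For reference, the connection string is the standard HiveServer2 JDBC URL, the same one beeline uses (assuming HiveServer2 runs on hadoop102 with its default port 10000):

jdbc:hive2://hadoop102:10000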


  3. Test the connection

    1) Create the gmall database and check that it was created successfully
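
      The statement for this step, runnable from the DataGrip console (show databases is just one way to verify):

create database gmall;
show databases;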

    2) Edit the connection and specify gmall as the database to connect to

    3) Set the current database to gmall

3.4 Data Preparation (see Offline Data Warehouse (Part 1) for details)

  When a company builds a data warehouse, the business systems usually already hold some historical data. To simulate a realistic scenario, some historical data must be prepared. Assume the warehouse goes live on 2021-06-08; the details are as follows:

  1. User behavior logs (skip this step if the log data already exists in HDFS)

    User behavior logs generally have no history, so only one day of logs, 2021-06-08, needs to be prepared:

    1) Start the log collection pipeline, including Flume, Kafka, etc.

    2) On the two log servers (hadoop102, hadoop103), edit the /opt/module/applog/application.yml configuration file and set the mock.date parameter to 2021-06-08

    3) Run the log generation script lg.sh

    4) Check whether the corresponding files appear in HDFS

  2. Business data

    Business data usually does have history; here the data for 2021-06-08 through 2021-06-10 must be prepared. For the detailed steps, see Offline Data Warehouse (Part 1): https://www.cnblogs.com/LzMingYueShanPao/p/14862980.html

Chapter 4: Data Warehouse Construction - ODS Layer

  1) Keep the data exactly as it arrives, without any modification, so the layer doubles as a backup of the raw data.

  2) Compress the data with LZO to reduce disk usage; 100 GB of data can compress to under 10 GB.

  3) Create partitioned tables to avoid full-table scans later; partitioned tables are used heavily in real projects.

  4) Create external tables. In real projects, aside from one's own temporary tables (created as internal tables), the vast majority of tables are external.

4.1 ODS Layer (User Behavior Data)

4.1.1 Create the Log Table ods_log

  1) Create a partitioned table that supports LZO compression

hive (gmall)>
drop table if exists ods_log;
CREATE EXTERNAL TABLE ods_log (`line` string)
PARTITIONED BY (`dt` string) -- partition by date
STORED AS -- specify the storage format; reads use LzoTextInputFormat
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_log'  -- data location on HDFS
;

  On Hive's LZO compression, see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

  2) Load data

hive (gmall)>
load data inpath '/origin_data/gmall/log/topic_log/2020-06-14' into table ods_log partition(dt='2020-06-14');

  Note: use the yyyy-MM-dd date format throughout; it is the date format Hive supports by default.

  3) Create an index for the LZO-compressed files

[atguigu@hadoop102 bin]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_log/dt=2020-06-14
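
  The indexer writes an xxx.lzo.index file alongside each xxx.lzo file, which is what allows MapReduce to split the compressed file. An optional check:

# expect each xxx.lzo to be accompanied by xxx.lzo.index
hadoop fs -ls /warehouse/gmall/ods/ods_log/dt=2020-06-14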

4.1.2 Single vs. Double Quotes in Shell

  1) Create a test.sh file in /home/atguigu/bin

[atguigu@hadoop102 bin]$ vim test.sh

  Add the following content to the file:

#!/bin/bash
do_date=$1

echo '$do_date'
echo "$do_date"
echo "'$do_date'"
echo '"$do_date"'
echo `date`

  2) Check the output

[atguigu@hadoop102 bin]$ test.sh 2020-06-14
$do_date
2020-06-14
'2020-06-14'
"$do_date"
Thu Jun 18 21:02:08 CST 2020

  3) Summary:

    (1) Single quotes do not expand variables.

    (2) Double quotes expand variables.

    (3) Backquotes (`) execute the enclosed command.

    (4) Single quotes nested inside double quotes: the variable is expanded.

    (5) Double quotes nested inside single quotes: the variable is not expanded.

4.1.3 ODS Log Table Data Loading Script

  1) Write the script

    (1) Create the script in /home/atguigu/bin on hadoop102

[atguigu@hadoop102 bin]$ vim hdfs_to_ods_log.sh

  Fill in the script as follows:

#!/bin/bash

# variable defined here for easy modification
APP=gmall

# if a date is passed in, use it; otherwise default to the day before today
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

echo ================== log date: $do_date ==================
sql="
load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log partition(dt='$do_date');
"

hive -e "$sql"

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/$APP/ods/ods_log/dt=$do_date

    Note 1:

      [ -n str ] tests whether a string is non-empty:
      -- non-empty: returns true
      -- empty: returns false
      Caution: wrap the variable in double quotes (" ") when testing, i.e. [ -n "$var" ]; an unquoted variable is not passed to the test reliably, as the sketch below shows.
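
      A two-line illustration (assuming bash; x is just a demo variable): when x is empty, the unquoted form degenerates to [ -n ], which is always true.

x=""
[ -n $x ] && echo "unquoted test: wrongly reports non-empty"
[ -n "$x" ] || echo "quoted test: correctly reports empty"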

    Note 2:

      For usage of the date command, see date --help.

    (2) Grant execute permission

[atguigu@hadoop102 bin]$ chmod 777 hdfs_to_ods_log.sh

  2) Use the script

    (1) Run the script

[atguigu@hadoop102 module]$ hdfs_to_ods_log.sh 2020-06-14

    (2) Check the imported data
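
    One way to verify from the hive client (illustrative; any equivalent query works):

hive (gmall)> show partitions ods_log;
hive (gmall)> select * from ods_log limit 2;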

4.2 ODS Layer (Business Data)

4.2.1 Activity Info Table

DROP TABLE IF EXISTS ods_activity_info;
CREATE EXTERNAL TABLE ods_activity_info(
    `id` STRING COMMENT 'ID',
    `activity_name` STRING COMMENT 'activity name',
    `activity_type` STRING COMMENT 'activity type',
    `start_time` STRING COMMENT 'start time',
    `end_time` STRING COMMENT 'end time',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'activity info table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_info/';

4.2.2 Activity Rule Table

DROP TABLE IF EXISTS ods_activity_rule;
CREATE EXTERNAL TABLE ods_activity_rule(
    `id` STRING COMMENT 'ID',
    `activity_id` STRING COMMENT 'activity ID',
    `activity_type` STRING COMMENT 'activity type',
    `condition_amount` DECIMAL(16,2) COMMENT 'required spend amount',
    `condition_num` BIGINT COMMENT 'required item count',
    `benefit_amount` DECIMAL(16,2) COMMENT 'discount amount',
    `benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
    `benefit_level` STRING COMMENT 'benefit level'
) COMMENT 'activity rule table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_rule/';

4.2.3 Level-1 Category Table

DROP TABLE IF EXISTS ods_base_category1;
CREATE EXTERNAL TABLE ods_base_category1(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'name'
) COMMENT 'level-1 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category1/';

4.2.4 Level-2 Category Table

DROP TABLE IF EXISTS ods_base_category2;
CREATE EXTERNAL TABLE ods_base_category2(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'name',
    `category1_id` STRING COMMENT 'level-1 category id'
) COMMENT 'level-2 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category2/';

4.2.5 Level-3 Category Table

DROP TABLE IF EXISTS ods_base_category3;
CREATE EXTERNAL TABLE ods_base_category3(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'name',
    `category2_id` STRING COMMENT 'level-2 category id'
) COMMENT 'level-3 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category3/';

4.2.6 Code Dictionary Table

DROP TABLE IF EXISTS ods_base_dic;
CREATE EXTERNAL TABLE ods_base_dic(
    `dic_code` STRING COMMENT 'code',
    `dic_name` STRING COMMENT 'code name',
    `parent_code` STRING COMMENT 'parent code',
    `create_time` STRING COMMENT 'creation date',
    `operate_time` STRING COMMENT 'modification date'
) COMMENT 'code dictionary table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_dic/';

4.2.7 Province Table

DROP TABLE IF EXISTS ods_base_province;
CREATE EXTERNAL TABLE ods_base_province (
    `id` STRING COMMENT 'ID',
    `name` STRING COMMENT 'province name',
    `region_id` STRING COMMENT 'region ID',
    `area_code` STRING COMMENT 'area code',
    `iso_code` STRING COMMENT 'ISO-3166 code, for visualization',
    `iso_3166_2` STRING COMMENT 'ISO-3166-2 code, for visualization'
)  COMMENT 'province table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_province/';

4.2.8 Region Table

DROP TABLE IF EXISTS ods_base_region;
CREATE EXTERNAL TABLE ods_base_region (
    `id` STRING COMMENT 'ID',
    `region_name` STRING COMMENT 'region name'
)  COMMENT 'region table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_region/';

4.2.9 Trademark Table

DROP TABLE IF EXISTS ods_base_trademark;
CREATE EXTERNAL TABLE ods_base_trademark (
    `id` STRING COMMENT 'ID',
    `tm_name` STRING COMMENT 'trademark name'
)  COMMENT 'trademark table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_trademark/';

4.2.10 Cart Table

DROP TABLE IF EXISTS ods_cart_info;
CREATE EXTERNAL TABLE ods_cart_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'sku id',
    `cart_price` DECIMAL(16,2) COMMENT 'price when added to cart',
    `sku_num` BIGINT COMMENT 'quantity',
    `sku_name` STRING COMMENT 'sku name (redundant)',
    `create_time` STRING COMMENT 'creation time',
    `operate_time` STRING COMMENT 'modification time',
    `is_ordered` STRING COMMENT 'whether already ordered',
    `order_time` STRING COMMENT 'order time',
    `source_type` STRING COMMENT 'source type',
    `source_id` STRING COMMENT 'source ID'
) COMMENT 'cart table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_cart_info/';

4.2.11 Comment Table

DROP TABLE IF EXISTS ods_comment_info;
CREATE EXTERNAL TABLE ods_comment_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'user ID',
    `sku_id` STRING COMMENT 'product sku',
    `spu_id` STRING COMMENT 'product spu',
    `order_id` STRING COMMENT 'order ID',
    `appraise` STRING COMMENT 'rating',
    `create_time` STRING COMMENT 'review time'
) COMMENT 'product review table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_comment_info/';

4.2.12 Coupon Info Table

DROP TABLE IF EXISTS ods_coupon_info;
CREATE EXTERNAL TABLE ods_coupon_info(
    `id` STRING COMMENT 'coupon ID',
    `coupon_name` STRING COMMENT 'coupon name',
    `coupon_type` STRING COMMENT 'coupon type: 1 cash voucher, 2 discount voucher, 3 spend-threshold voucher, 4 quantity-threshold discount voucher',
    `condition_amount` DECIMAL(16,2) COMMENT 'required spend amount',
    `condition_num` BIGINT COMMENT 'required item count',
    `activity_id` STRING COMMENT 'activity ID',
    `benefit_amount` DECIMAL(16,2) COMMENT 'reduction amount',
    `benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
    `create_time` STRING COMMENT 'creation time',
    `range_type` STRING COMMENT 'scope type: 1 product, 2 category, 3 trademark',
    `limit_num` BIGINT COMMENT 'maximum number of claims',
    `taken_count` BIGINT COMMENT 'number of claims so far',
    `start_time` STRING COMMENT 'claim start time',
    `end_time` STRING COMMENT 'claim end time',
    `operate_time` STRING COMMENT 'modification time',
    `expire_time` STRING COMMENT 'expiration time'
) COMMENT 'coupon table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_info/';

4.2.13 Coupon Use Table

DROP TABLE IF EXISTS ods_coupon_use;
CREATE EXTERNAL TABLE ods_coupon_use(
    `id` STRING COMMENT 'ID',
    `coupon_id` STRING COMMENT 'coupon ID',
    `user_id` STRING COMMENT 'user ID',
    `order_id` STRING COMMENT 'order ID',
    `coupon_status` STRING COMMENT 'coupon status',
    `get_time` STRING COMMENT 'claim time',
    `using_time` STRING COMMENT 'use time (order placed)',
    `used_time` STRING COMMENT 'use time (paid)',
    `expire_time` STRING COMMENT 'expiration time'
) COMMENT 'coupon use table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_use/';

4.2.14 Favorites Table

DROP TABLE IF EXISTS ods_favor_info;
CREATE EXTERNAL TABLE ods_favor_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'user id',
    `sku_id` STRING COMMENT 'sku id',
    `spu_id` STRING COMMENT 'spu id',
    `is_cancel` STRING COMMENT 'whether cancelled',
    `create_time` STRING COMMENT 'favorite time',
    `cancel_time` STRING COMMENT 'cancellation time'
) COMMENT 'product favorites table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_favor_info/';

4.2.15 Order Detail Table

DROP TABLE IF EXISTS ods_order_detail;
CREATE EXTERNAL TABLE ods_order_detail(
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'order number',
    `sku_id` STRING COMMENT 'product id',
    `sku_name` STRING COMMENT 'product name',
    `order_price` DECIMAL(16,2) COMMENT 'product price',
    `sku_num` BIGINT COMMENT 'product quantity',
    `create_time` STRING COMMENT 'creation time',
    `source_type` STRING COMMENT 'source type',
    `source_id` STRING COMMENT 'source ID',
    `split_final_amount` DECIMAL(16,2) COMMENT 'prorated final amount',
    `split_activity_amount` DECIMAL(16,2) COMMENT 'prorated activity discount',
    `split_coupon_amount` DECIMAL(16,2) COMMENT 'prorated coupon discount'
) COMMENT 'order detail table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail/';

4.2.16 Order Detail Activity Link Table

DROP TABLE IF EXISTS ods_order_detail_activity;
CREATE EXTERNAL TABLE ods_order_detail_activity(
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'order number',
    `order_detail_id` STRING COMMENT 'order detail id',
    `activity_id` STRING COMMENT 'activity id',
    `activity_rule_id` STRING COMMENT 'activity rule id',
    `sku_id` BIGINT COMMENT 'product id',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'order detail - activity link table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_activity/';

4.2.17 Order Detail Coupon Link Table

DROP TABLE IF EXISTS ods_order_detail_coupon;
CREATE EXTERNAL TABLE ods_order_detail_coupon(
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'order number',
    `order_detail_id` STRING COMMENT 'order detail id',
    `coupon_id` STRING COMMENT 'coupon id',
    `coupon_use_id` STRING COMMENT 'coupon use record id',
    `sku_id` STRING COMMENT 'product id',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'order detail - coupon link table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon/';

4.2.18 Order Table

DROP TABLE IF EXISTS ods_order_info;
CREATE EXTERNAL TABLE ods_order_info (
    `id` STRING COMMENT 'order number',
    `final_amount` DECIMAL(16,2) COMMENT 'final order amount',
    `order_status` STRING COMMENT 'order status',
    `user_id` STRING COMMENT 'user id',
    `payment_way` STRING COMMENT 'payment method',
    `delivery_address` STRING COMMENT 'delivery address',
    `out_trade_no` STRING COMMENT 'payment transaction number',
    `create_time` STRING COMMENT 'creation time',
    `operate_time` STRING COMMENT 'modification time',
    `expire_time` STRING COMMENT 'expiration time',
    `tracking_no` STRING COMMENT 'shipment tracking number',
    `province_id` STRING COMMENT 'province ID',
    `activity_reduce_amount` DECIMAL(16,2) COMMENT 'activity reduction amount',
    `coupon_reduce_amount` DECIMAL(16,2) COMMENT 'coupon reduction amount',
    `original_amount` DECIMAL(16,2) COMMENT 'original order amount',
    `feight_fee` DECIMAL(16,2) COMMENT 'freight fee',
    `feight_fee_reduce` DECIMAL(16,2) COMMENT 'freight fee reduction'
) COMMENT 'order table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_info/';

4.2.19 Order Refund Table

DROP TABLE IF EXISTS ods_order_refund_info;
CREATE EXTERNAL TABLE ods_order_refund_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'user ID',
    `order_id` STRING COMMENT 'order ID',
    `sku_id` STRING COMMENT 'product ID',
    `refund_type` STRING COMMENT 'refund type',
    `refund_num` BIGINT COMMENT 'refund quantity',
    `refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
    `refund_reason_type` STRING COMMENT 'refund reason type',
    `refund_status` STRING COMMENT 'refund status', -- the refund status should cover states such as buyer applied, seller reviewed, seller received goods, and refund completed; they are not modeled here, so the table is handled incrementally
    `create_time` STRING COMMENT 'refund time'
) COMMENT 'order refund table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_refund_info/';

4.2.20 Order Status Log Table

DROP TABLE IF EXISTS ods_order_status_log;
CREATE EXTERNAL TABLE ods_order_status_log (
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'order ID',
    `order_status` STRING COMMENT 'order status',
    `operate_time` STRING COMMENT 'modification time'
)  COMMENT 'order status table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_status_log/';

4.2.21 Payment Table

DROP TABLE IF EXISTS ods_payment_info;
CREATE EXTERNAL TABLE ods_payment_info(
    `id` STRING COMMENT 'ID',
    `out_trade_no` STRING COMMENT 'external transaction number',
    `order_id` STRING COMMENT 'order ID',
    `user_id` STRING COMMENT 'user ID',
    `payment_type` STRING COMMENT 'payment type',
    `trade_no` STRING COMMENT 'trade number',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
    `subject` STRING COMMENT 'transaction subject',
    `payment_status` STRING COMMENT 'payment status',
    `create_time` STRING COMMENT 'creation time',
    `callback_time` STRING COMMENT 'callback time'
)  COMMENT 'payment transaction table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_payment_info/';

4.2.22 Refund Payment Table

DROP TABLE IF EXISTS ods_refund_payment;
CREATE EXTERNAL TABLE ods_refund_payment(
    `id` STRING COMMENT 'ID',
    `out_trade_no` STRING COMMENT 'external transaction number',
    `order_id` STRING COMMENT 'order ID',
    `sku_id` STRING COMMENT 'SKU ID',
    `payment_type` STRING COMMENT 'payment type',
    `trade_no` STRING COMMENT 'trade number',
    `refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
    `subject` STRING COMMENT 'transaction subject',
    `refund_status` STRING COMMENT 'refund status',
    `create_time` STRING COMMENT 'creation time',
    `callback_time` STRING COMMENT 'callback time'
)  COMMENT 'refund payment table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_refund_payment/';

4.2.23 Product Platform Attribute Table

DROP TABLE IF EXISTS ods_sku_attr_value;
CREATE EXTERNAL TABLE ods_sku_attr_value(
    `id` STRING COMMENT 'ID',
    `attr_id` STRING COMMENT 'platform attribute ID',
    `value_id` STRING COMMENT 'platform attribute value ID',
    `sku_id` STRING COMMENT 'product ID',
    `attr_name` STRING COMMENT 'platform attribute name',
    `value_name` STRING COMMENT 'platform attribute value name'
) COMMENT 'sku platform attribute table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_attr_value/';

4.2.24 Product (SKU) Table

DROP TABLE IF EXISTS ods_sku_info;
CREATE EXTERNAL TABLE ods_sku_info(
    `id` STRING COMMENT 'skuId',
    `spu_id` STRING COMMENT 'spu id',
    `price` DECIMAL(16,2) COMMENT 'price',
    `sku_name` STRING COMMENT 'product name',
    `sku_desc` STRING COMMENT 'product description',
    `weight` DECIMAL(16,2) COMMENT 'weight',
    `tm_id` STRING COMMENT 'trademark id',
    `category3_id` STRING COMMENT 'category id',
    `is_sale` STRING COMMENT 'whether on sale',
    `create_time` STRING COMMENT 'creation time'
) COMMENT 'SKU product table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_info/';

4.2.25 Product Sale Attribute Table

DROP TABLE IF EXISTS ods_sku_sale_attr_value;
CREATE EXTERNAL TABLE ods_sku_sale_attr_value(
    `id` STRING COMMENT 'ID',
    `sku_id` STRING COMMENT 'sku_id',
    `spu_id` STRING COMMENT 'spu_id',
    `sale_attr_value_id` STRING COMMENT 'sale attribute value id',
    `sale_attr_id` STRING COMMENT 'sale attribute id',
    `sale_attr_name` STRING COMMENT 'sale attribute name',
    `sale_attr_value_name` STRING COMMENT 'sale attribute value name'
) COMMENT 'sku sale attribute table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value/';

4.2.26 Product (SPU) Table

DROP TABLE IF EXISTS ods_spu_info;
CREATE EXTERNAL TABLE ods_spu_info(
    `id` STRING COMMENT 'spu id',
    `spu_name` STRING COMMENT 'spu name',
    `category3_id` STRING COMMENT 'category id',
    `tm_id` STRING COMMENT 'trademark id'
) COMMENT 'SPU product table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_spu_info/';

4.2.27 User Table

DROP TABLE IF EXISTS ods_user_info;
CREATE EXTERNAL TABLE ods_user_info(
    `id` STRING COMMENT 'user id',
    `login_name` STRING COMMENT 'login name',
    `nick_name` STRING COMMENT 'nickname',
    `name` STRING COMMENT 'real name',
    `phone_num` STRING COMMENT 'phone number',
    `email` STRING COMMENT 'email',
    `user_level` STRING COMMENT 'user level',
    `birthday` STRING COMMENT 'birthday',
    `gender` STRING COMMENT 'gender',
    `create_time` STRING COMMENT 'creation time',
    `operate_time` STRING COMMENT 'modification time'
) COMMENT 'user table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_user_info/';

4.2.28 First-Day Data Loading Script for the ODS Business Tables

  1) Write the script

    (1) Create the script hdfs_to_ods_db_init.sh in /home/atguigu/bin

[atguigu@hadoop102 bin]$ vim hdfs_to_ods_db_init.sh

  Fill in the script as follows:

#!/bin/bash

APP=gmall

# the first-day (historical) load requires an explicit date; exit if none is given
if [ -n "$2" ] ;then
   do_date=$2
else
   echo "please pass in a date argument"
   exit
fi

ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date');"
ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');"
ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');"
ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');"
ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');"
ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');"
ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');"
ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');"
ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');"
ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');"
ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');"
ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"
ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');"
ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date');"
ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date');"
ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date');"
ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date');"
ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date');"
ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"

case $1 in
    "ods_order_info"){
        hive -e "$ods_order_info"
    };;
    "ods_order_detail"){
        hive -e "$ods_order_detail"
    };;
    "ods_sku_info"){
        hive -e "$ods_sku_info"
    };;
    "ods_user_info"){
        hive -e "$ods_user_info"
    };;
    "ods_payment_info"){
        hive -e "$ods_payment_info"
    };;
    "ods_base_category1"){
        hive -e "$ods_base_category1"
    };;
    "ods_base_category2"){
        hive -e "$ods_base_category2"
    };;
    "ods_base_category3"){
        hive -e "$ods_base_category3"
    };;
    "ods_base_trademark"){
        hive -e "$ods_base_trademark"
    };;
    "ods_activity_info"){
        hive -e "$ods_activity_info"
    };;
    "ods_cart_info"){
        hive -e "$ods_cart_info"
    };;
    "ods_comment_info"){
        hive -e "$ods_comment_info"
    };;
    "ods_coupon_info"){
        hive -e "$ods_coupon_info"
    };;
    "ods_coupon_use"){
        hive -e "$ods_coupon_use"
    };;
    "ods_favor_info"){
        hive -e "$ods_favor_info"
    };;
    "ods_order_refund_info"){
        hive -e "$ods_order_refund_info"
    };;
    "ods_order_status_log"){
        hive -e "$ods_order_status_log"
    };;
    "ods_spu_info"){
        hive -e "$ods_spu_info"
    };;
    "ods_activity_rule"){
        hive -e "$ods_activity_rule"
    };;
    "ods_base_dic"){
        hive -e "$ods_base_dic"
    };;
    "ods_order_detail_activity"){
        hive -e "$ods_order_detail_activity"
    };;
    "ods_order_detail_coupon"){
        hive -e "$ods_order_detail_coupon"
    };;
    "ods_refund_payment"){
        hive -e "$ods_refund_payment"
    };;
    "ods_sku_attr_value"){
        hive -e "$ods_sku_attr_value"
    };;
    "ods_sku_sale_attr_value"){
        hive -e "$ods_sku_sale_attr_value"
    };;
    "ods_base_province"){
        hive -e "$ods_base_province"
    };;
    "ods_base_region"){
        hive -e "$ods_base_region"
    };;
    "all"){
        hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value$ods_base_province$ods_base_region"
    };;
esac

    (2) Grant execute permission

[atguigu@hadoop102 bin]$ chmod +x hdfs_to_ods_db_init.sh

  2) Use the script

    (1) Run the script

[atguigu@hadoop102 bin]$ hdfs_to_ods_db_init.sh all 2020-06-14

    (2) Check whether the data was imported successfully
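
    One way to spot-check from the hive client (illustrative; any of the loaded tables works):

hive (gmall)> show partitions ods_order_info;
hive (gmall)> select count(*) from ods_order_info where dt='2020-06-14';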

4.2.29 Daily Data Loading Script for the ODS Business Tables

  1) Write the script

    (1) Create the script hdfs_to_ods_db.sh in /home/atguigu/bin

[atguigu@hadoop102 bin]$ vim hdfs_to_ods_db.sh

  Fill in the script as follows. Note that, unlike the first-day script, the "all" branch deliberately omits ods_base_province and ods_base_region: the province and region tables are loaded once on the first day and are not reloaded daily.

#!/bin/bash

APP=gmall

# if a date is passed in, use it; otherwise default to the day before today
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date');"
ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');"
ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');"
ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');"
ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');"
ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');"
ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');"
ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');"
ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');"
ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');"
ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');"
ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"
ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');"
ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date');"
ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date');"
ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date');"
ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date');"
ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date');"
ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"

case $1 in
    "ods_order_info"){
        hive -e "$ods_order_info"
    };;
    "ods_order_detail"){
        hive -e "$ods_order_detail"
    };;
    "ods_sku_info"){
        hive -e "$ods_sku_info"
    };;
    "ods_user_info"){
        hive -e "$ods_user_info"
    };;
    "ods_payment_info"){
        hive -e "$ods_payment_info"
    };;
    "ods_base_category1"){
        hive -e "$ods_base_category1"
    };;
    "ods_base_category2"){
        hive -e "$ods_base_category2"
    };;
    "ods_base_category3"){
        hive -e "$ods_base_category3"
    };;
    "ods_base_trademark"){
        hive -e "$ods_base_trademark"
    };;
    "ods_activity_info"){
        hive -e "$ods_activity_info"
    };;
    "ods_cart_info"){
        hive -e "$ods_cart_info"
    };;
    "ods_comment_info"){
        hive -e "$ods_comment_info"
    };;
    "ods_coupon_info"){
        hive -e "$ods_coupon_info"
    };;
    "ods_coupon_use"){
        hive -e "$ods_coupon_use"
    };;
    "ods_favor_info"){
        hive -e "$ods_favor_info"
    };;
    "ods_order_refund_info"){
        hive -e "$ods_order_refund_info"
    };;
    "ods_order_status_log"){
        hive -e "$ods_order_status_log"
    };;
    "ods_spu_info"){
        hive -e "$ods_spu_info"
    };;
    "ods_activity_rule"){
        hive -e "$ods_activity_rule"
    };;
    "ods_base_dic"){
        hive -e "$ods_base_dic"
    };;
    "ods_order_detail_activity"){
        hive -e "$ods_order_detail_activity"
    };;
    "ods_order_detail_coupon"){
        hive -e "$ods_order_detail_coupon"
    };;
    "ods_refund_payment"){
        hive -e "$ods_refund_payment"
    };;
    "ods_sku_attr_value"){
        hive -e "$ods_sku_attr_value"
    };;
    "ods_sku_sale_attr_value"){
        hive -e "$ods_sku_sale_attr_value"
    };;
    "all"){
        hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value"
    };;
esac

    (2) Grant execute permission

[atguigu@hadoop102 bin]$ chmod +x hdfs_to_ods_db.sh

  2) Use the script

    (1) Run the script

[atguigu@hadoop102 bin]$ hdfs_to_ods_db.sh all 2020-06-14

    (2) Check whether the data was imported successfully
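
  In production the daily load is normally triggered by a scheduler. As a minimal illustration only (crontab is shown merely as an example; the project's actual scheduling is covered elsewhere), a crontab entry could run the script every morning, relying on its built-in default of yesterday's date:

# load yesterday's business data into ODS at 00:30 every day
30 0 * * * /home/atguigu/bin/hdfs_to_ods_db.sh all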

 
