看板2

看板2需求分析

需求1:1.1 总意向量

说明:计期内,新增意向客户(包含自己录入的意向客户)总数。

展现:线状图

条件:年、月、线上线下

维度:年、月、线上线下

指标:总意向客户量

粒度:天,可以下钻到小时数据。

数据来源:客户管理系统的customer_relationship意向表

指标:意向数量

维度:

  • 新老意向用户维度
  • 时间维度
    • 年、月、日、小时
  • 线上线下

涉及到的信息

  • 表:customer_relationshop (客户意向)
    • create_date_time:时间相关
    • customer_id:学生ID

需求2:意向学员位置热力图

说明:统计指定时间段内,新增的意向客户,所在城市区域人数热力图。

展现:地图热力图

维度:年、月、线上线下

指标:按照地区聚合意向客户id数量

粒度:天,可以下钻到小时数据。

条件:年、月、线上线下

数据来源:客户管理系统的customer(客户静态信息表) 、customer_relationship(客户意向表)

指标:意向用户

维度:

- 新老意向用户维度
- 区域维度
  • 时间维度
    • 年、月、日、小时
  • 线上线下

涉及到的信息

  • 表:customer(客户表(学生表))
    • customer.area 区域字段
    • customer.id
  • customer_relationship (意向表)
    • customer_relationship.customer_id (计算意向总数用的)
    • customer_relationship.create_date_time

需求3:意向学科排名

说明:统计指定时间段内,新增的意向客户中,意向学科人数排行榜。学科名称要关联查询出来。

展现:柱状图

条件:年、月、线上线下

指标:学科意向客户量

粒度:天,可以下钻到小时数据。

数据来源:客户管理系统的customer_clue(客户线索表)、customer_relationship(客户意向表)、itcast_subject(学科表)

指标:意向数量

维度:

- 新老意向用户维度
- 学科维度
  • 时间维度
    • 年、月、日、小时
  • 线上线下

涉及到的信息

  • 表:customer_clue (线索表)
    • clue_state (线索状态)-- 判断新老用户
    • deleted (是否删除)
    • create_date_time
    • customer_relationship_id (意向表id)
  • 表:customer_relationship (意向表)
    • id (意向表id)
    • customer_id (计算意向数量)
    • itcast_subject_id(学科id)
    • origin_type (来源类型)(NETSERVICE和PRESIGNUP这两个类型表示线上)(不是这俩都当线下看待)
  • 表:itcast_subject (学科表)
    • name 学科名称
    • id 学科ID

需求4:意向校区排名

说明:统计指定时间段内,新增的意向客户中,意向校区人数排行榜。

展现:柱状图

条件:年、月、线上线下

指标:校区意向客户量

粒度:天,可以下钻到小时数据。

数据来源:客户管理系统的

注意:学校id,同步时,0和null转换为统一数据,都转换为-1

指标:意向数量

维度:

- 新老用户
- 校区维度
  • 时间维度
    • 年、月、日、小时
  • 线上线下

涉及到的信息

  • 表:customer_clue (线索表)
    • clue_state (线索状态)-- 判断新老用户
    • deleted (是否删除)
    • create_date_time
    • customer_relationship_id (意向表id)
  • 表:customer_relationship (意向表)
    • id
    • customer_id (计算意向数量)
    • itcast_school_id (校区id)
    • origin_type (来源类型)(NETSERVICE和PRESIGNUP这两个类型表示线上)(不是这俩都当线下看待)
  • 表:itcast_school (校区表)
    • name 校区名称
    • id

需求5:来源渠道占比

说明:统计指定时间段内,新增的意向客户中,不同来源渠道的意向客户占比。

展现:饼状图

条件:年、月、线上线下

粒度:天,可以下钻到小时数据。

指标:来源渠道意向客户量

数据来源:客户管理系统的customer_clue(客户线索表)、customer_relationship(客户意向表)

指标:意向数量

维度:

  • 新老用户
  • 来源渠道
  • 时间维度
    • 年、月、日、小时
  • 线上线下

涉及到的信息

  • 表:customer_relationship(意向表)
    • origin_type (来源类型)(NETSERVICE和PRESIGNUP这两个类型表示线上)(不是这俩都当线下看待)
    • customer_id (计算意向数量)
    • create_date_time
  • 表:customer_clue (线索表)
    • clue_state (线索状态)-- 判断新老用户 (VALID_NEW_CLUES 表示 新线索)
    • deleted (是否删除)

需求6:意向咨询中心贡献占比

说明:统计指定时间段内,新增的意向客户中,各咨询中心产生的意向客户数占比情况。

展现:饼状图

条件:年、月、线上线下

指标:咨询中心意向客户数

粒度:天,可以下钻到小时数据。

数据来源:客户管理系统的customer_relationship(客户意向表)、employee(员工表)、scrm_department(部门表)

指标:意向数量

维度:

  • 新老用户
  • 咨询中心维度
  • 时间维度
    • 年、月、日、小时
  • 线上线下

涉及到的信息

  • 表:customer_relationship(意向表)
    • customer_id (计算意向数量)
    • creator(创建者)
    • create_date_time
    • origin_type (来源类型)(NETSERVICE和PRESIGNUP这两个类型表示线上)(不是这俩都当线下看待)
  • 表:employee(员工表)
    • tdepart_id(部门id)
    • id(员工)
  • 表:scrm_department(部门表)
    • name (部门名称)
    • id(部门id)
  • 表:customer_clue(线索表)
    • clue_state (线索状态)-- 判断新老用户 (VALID_NEW_CLUES 表示 新线索)

总结需求

指标

意向数量

可以推测出DWS就一张表

维度:

  • 新老用户
  • 时间维度
    • 年、月、日、小时
  • 线上线下
  • 区域
  • 学科
  • 校区
  • 来源渠道
  • 咨询中心

新老用户、时间维度、线上线下,这三个维度是基础维度

会作为整体和其它维度进行关联

涉及的信息

  • 表:customer_relationshop (意向表)

    • create_date_time:时间相关
    • customer_id (计算意向总数用的)
    • create_date_time
    • id (意向表id)
    • itcast_subject_id(学科id)
    • itcast_school_id (校区id)
    • origin_type (来源类型)(NETSERVICE和PRESIGNUP这两个类型表示线上)(不是这俩都当线下看待)
    • creator(创建者)
  • 表:customer(客户表(学生表))

    • area 区域字段
    • id 学生ID
  • 表:customer_clue (线索表)

    • clue_state (线索状态)-- 判断新老用户
    • deleted (是否删除)
    • create_date_time
    • customer_relationship_id (意向表id)
  • 表:itcast_subject (学科表)

    • name 学科名称
    • id 学科ID
  • 表:itcast_school (校区表)

    • name 校区名称
    • id
  • 表:employee(员工表)

    • tdepart_id(部门id)
    • id(员工)
  • 表:scrm_department(部门表)

    • name (部门名称)
    • id(部门id)
业务数据库的准备

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-cG3JocAq-1619100637916)(https://image-set.oss-cn-zhangjiakou.aliyuncs.com/img-out/image-20201229091618558.png)]

准备业务数据库和数据

在真实的企业中,这一步不存在。

创建数据库

create database scrm default character set utf8mb4 collate utf8mb4_unicode_ci;

创建表

课程资料中提供了一个叫做:

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-SOFqTxc0-1619100637918)(https://image-set.oss-cn-zhangjiakou.aliyuncs.com/img-out/image-20201229092242154.png)]

如图,7个SQL文件,导入到刚刚创建的scrm库中即可。

建模

业务->ODS->DWD->DWM->DWS->ADS

DIM层

DIM层就是维度层。主要存放各种维度表

业务到ODS

步骤一

将两张事实表存入ODS

  • 意向表
  • 线索表

表的存储细节

  • 数据存储类型:ORC

  • 压缩类型:Zlib(或者Snappy)

  • 是否分区:是,基于采集时间分区(start_time)

  • 是否分桶:是,因为当前这个需求中后面要做很多的JOIN,分桶有JOIN的优化

步骤二

将5张维度表存入DIM层

  • 校区表
  • 学生表(客户表)
  • 员工表
  • 部门表
  • 学科表

分区不需要

分桶可以有

表的存储细节

  • 数据存储类型:ORC
  • 压缩类型:Snappy

如有有人问

ODS什么压缩好,DIM什么压缩好。

  • ODS偏向Zlib压缩,因为数据偏冷数据,尽量保持压缩率是第一要素
  • DIM偏向Snappy等压缩,因为数据偏热数据,DIM层可能被很多地方需要,经常被查询,Snappy合适的压缩率以及优秀的CPU计算性能确保热数据既能合理的控制空间占用以及也能提供优秀的压缩、解压缩性能。

ODS -> DWD

DWD:清洗、转换、抽取所需字段的层级

DWD需要将ODS中的事实表,所需字段抽取到DWD中,同时完成一些清洗和转换的操作

普通字段:

来自:custom_relationship表

  • customer_id (计算意向用户指标)

    也能关联customer(学生表),找到学生所在的区域

  • create_date_time,转换成标准的时间戳

  • hourinfo:create_date_time转换成小时维度

  • rid (意向表的id)

  • itcast_subject_id(学科id)

    关联学科表取到学科名称

  • itcast_school_id(学校id)

    关联学校表找到校区名称

  • origin_type(来源类型)

    可以判断是否线上和线下

    同时它自身也是一个维度用来判断来源的类型

  • origin_type_stat(线上线下)

    数据来自于origin_type的内容,基于它的内容来判断是否线上和线下

    0标记线下,1标记线上

  • creator(客户线索创建者)

    关联员工表找到员工的部门id,通过员工的部门id又能找到部门名称

来自customer_clue表:

  • clue_state(线索状态)

    判断是否新老用户

  • deleted(是否删除)

    清洗数据,删除的数据DWD不要

  • customer_relationship_id 意向表ID

    用来和意向表确定关联关系

字段:

  • yearinfo:由create_date_time:转换成年维度
  • monthinfo:由create_date_time:转换成月维度
  • dayinfo:由create_date_time:转换成日维度

表数据量:1个

表存储格式:ORC

表压缩:Snappy

分区:三个分区(年、月、日)

内部表:是

DWD -> DWM 过程

DWM:中间预聚合的结果、维度退化的结果

  • 预聚合:不做预聚合
  • 维度退化:要做,要将DWD中记录的所有维度的key,全部退化成具体的维度信息,生成一个宽表

在DWD到DWM的过程中

DWD会和DIM层中的维度表进行关联,将DWD中的所有维度退化成具体的维度信息,存入DWM层中,做为一个宽表存在。

普通字段:

来自:custom_relationship表

  • customer_id (计算意向用户指标)

  • area(由customer_id关联customer表得到)

  • create_date_time,转换成标准的时间戳

  • hourinfo:create_date_time转换成小时维度

  • rid (意向表的id)

  • itcast_subject_id(学科id)

  • itcast_subject_name (学科名称)

    由学科id关联而来

  • itcast_school_id(学校id)

    关联学校表找到校区名称

  • itcast_school_name(学校名称)

    由学校id关联而来

  • origin_type(来源类型)

    可以判断是否线上和线下

    同时它自身也是一个维度用来判断来源的类型

  • origin_type_stat(线上线下)

    数据来自于origin_type的内容,基于它的内容来判断是否线上和线下

    0标记线下,1标记线上

  • tdepart_id 部门id(由creator关联员工找到员工所在的部门id)

  • tdepart_name 部门名称,(由creator关联员工找到员工所在的部门id,在通过部门id找到部门表中的部门名称)

来自customer_clue表:

  • clue_state(线索状态)

    判断是否新老用户

  • deleted(是否删除)

    清洗数据,删除的数据DWD不要

分区字段:

  • yearinfo:由create_date_time:转换成年维度
  • monthinfo:由create_date_time:转换成月维度
  • dayinfo:由create_date_time:转换成日维度

分桶字段:

  • customer_id
  • sorted字段:customer_id

表存储:ORC

压缩:Snappy

分隔符:\t

这一层的重点:

DWD和DIM关联,将所有的维度退化,得到DWM的宽表

也就是一堆JOIN语句在这一层中写

DWM -> DWS 层

DWS目的是:将DWM中的宽表,聚合到DWS中的表,这个表中所有维度的统计都会集中存储在一个表中

比如:

纯时间维度的意向统计

按年意向量

按年、月意向量

按年、月、日意向量

按年、月、日、小时意向量

时间维度和线上线下维度统计

按年意向量+线上

按年、月意向量+线上

按年、月、日意向量+线上

按年、月、日、小时意向量+线上

按年意向量+线下

按年、月意向量+线下

按年、月、日意向量+线下

按年、月、日、小时意向量+线下

时间维度和新老客户

按年意向量+新客户

按年、月意向量+新客户

按年、月、日意向量+新客户

按年、月、日、小时意向量+新客户

按年意向量+老客户

按年、月意向量+老客户

按年、月、日意向量+老客户

按年、月、日、小时意向量+老客户

时间维度和线上线下和新老客户

按年意向量+新客户+线上

按年、月意向量+新客户+线上

按年、月、日意向量+新客户+线上

按年、月、日、小时意向量+新客户+线上

按年意向量+老客户+线上

按年、月意向量+老客户+线上

按年、月、日意向量+老客户+线上

按年、月、日、小时意向量+老客户+线上

按年意向量+新客户+线下

按年、月意向量+新客户+线下

按年、月、日意向量+新客户+线下

按年、月、日、小时意向量+新客户+线下

按年意向量+老客户+线下

按年、月意向量+老客户+线下

按年、月、日意向量+老客户+线下

按年、月、日、小时意向量+老客户+线下

时间维度和线上线下和新老客户 + 来源渠道

按年意向量+新客户+线上 + 来源渠道

按年、月意向量+新客户+线上 + 来源渠道

按年、月、日意向量+新客户+线上 + 来源渠道

按年、月、日、小时意向量+新客户+线上 + 来源渠道

按年意向量+老客户+线上 + 来源渠道

按年、月意向量+老客户+线上 + 来源渠道

按年、月、日意向量+老客户+线上 + 来源渠道

按年、月、日、小时意向量+老客户+线上 + 来源渠道

按年意向量+新客户+线下 + 来源渠道

按年、月意向量+新客户+线下 + 来源渠道

按年、月、日意向量+新客户+线下 + 来源渠道

按年、月、日、小时意向量+新客户+线下 + 来源渠道

按年意向量+老客户+线下 + 来源渠道

按年、月意向量+老客户+线下 + 来源渠道

按年、月、日意向量+老客户+线下 + 来源渠道

按年、月、日、小时意向量+老客户+线下 + 来源渠道

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-UU535HdQ-1619100637920)(https://image-set.oss-cn-zhangjiakou.aliyuncs.com/img-out/image-20201229105806862.png)]

drop Table itcast_dws.itcast_intention_dws;
CREATE TABLE IF NOT EXISTS itcast_dws.itcast_intention_dws (
    -- 普通列
   `customer_total` INT COMMENT '聚合意向客户数',
   `area` STRING COMMENT '区域信息',
   `itcast_school_id` STRING COMMENT '校区id',
   `itcast_school_name` STRING COMMENT '校区名称',
   `origin_type` STRING COMMENT '来源渠道',
   `itcast_subject_id` STRING COMMENT '学科id',
   `itcast_subject_name` STRING COMMENT '学科名称',
   `hourinfo` STRING COMMENT '小时信息',
   `origin_type_stat` STRING COMMENT '数据来源:0.线下;1.线上',
   `clue_state_stat` STRING COMMENT '客户属性:0.老客户;1.新客户',
   `tdepart_id` STRING COMMENT '创建者部门id',
   `tdepart_name` STRING COMMENT '咨询中心名称',
   `time_str` STRING COMMENT '时间明细',
   `groupType` STRING COMMENT '产品属性类别:1.总意向量;2.区域信息;3.校区、学科组合分组;4.来源渠道;5.贡献中心;',
   `time_type` STRING COMMENT '时间维度:1、按小时聚合;2、按天聚合;3、按周聚合;4、按月聚合;5、按年聚合;'
)
comment '客户意向dws表'
-- 分区列
PARTITIONED BY(yearinfo STRING,monthinfo STRING,dayinfo STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='SNAPPY');

ADS层

操作

准备业务数据

从业务数据库中采集数据到ODS和DIM

将两个事实表采集到ODS

  • customer_relationship
  • customer_clue

customer_relationship 在 ODS的建表语句

# 开启自动执行MapJoin(达到条件,会自动走Map的Join而不是Reduce的Join)
set hive.auto.convert.join=true;
# 开启bucket mapjoin的优化
set hive.optimize.bucketmapjoin=true;
# set hive.auto.convert.join.noconditionaltask.size=512000000
# 开启排序
set hive.enforce.sorting=true;
# 开启SMB优化的自动尝试
set hive.optimize.bucketmapjoin.sortedmerge=true;
DROP TABLE itcast_ods.`customer_relationship`;
CREATE TABLE IF NOT EXISTS itcast_ods.`customer_relationship` (
  `id` int COMMENT '客户关系id',
  `create_date_time` STRING COMMENT '创建时间',
  `update_date_time` STRING COMMENT '最后更新时间',
  `deleted` int COMMENT '是否被删除(禁用)',
  `customer_id` int COMMENT '所属客户id',
  `first_id` int COMMENT '第一条客户关系id',
  `belonger` int COMMENT '归属人',
  `belonger_name` STRING COMMENT '归属人姓名',
  `initial_belonger` int COMMENT '初始归属人',
  `distribution_handler` int COMMENT '分配处理人',
  `business_scrm_department_id` int COMMENT '归属部门',
  `last_visit_time` STRING COMMENT '最后回访时间',
  `next_visit_time` STRING COMMENT '下次回访时间',
  `origin_type` STRING COMMENT '数据来源',
  `itcast_school_id` int COMMENT '校区Id',
  `itcast_subject_id` int COMMENT '学科Id',
  `intention_study_type` STRING COMMENT '意向学习方式',
  `anticipat_signup_date` STRING COMMENT '预计报名时间',
  `level` STRING COMMENT '客户级别',
  `creator` int COMMENT '创建人',
  `current_creator` int COMMENT '当前创建人:初始==创建人,当在公海拉回时为 拉回人',
  `creator_name` STRING COMMENT '创建者姓名',
  `origin_channel` STRING COMMENT '来源渠道',
  `comment` STRING COMMENT '备注',
  `first_customer_clue_id` int COMMENT '第一条线索id',
  `last_customer_clue_id` int COMMENT '最后一条线索id',
  `process_state` STRING COMMENT '处理状态',
  `process_time` STRING COMMENT '处理状态变动时间',
  `payment_state` STRING COMMENT '支付状态',
  `payment_time` STRING COMMENT '支付状态变动时间',
  `signup_state` STRING COMMENT '报名状态',
  `signup_time` STRING COMMENT '报名时间',
  `notice_state` STRING COMMENT '通知状态',
  `notice_time` STRING COMMENT '通知状态变动时间',
  `lock_state` STRING COMMENT '锁定状态',
  `lock_time` STRING COMMENT '锁定状态修改时间',
  `itcast_clazz_id` int COMMENT '所属ems班级id',
  `itcast_clazz_time` STRING COMMENT '报班时间',
  `payment_url` STRING COMMENT '付款链接',
  `payment_url_time` STRING COMMENT '支付链接生成时间',
  `ems_student_id` int COMMENT 'ems的学生id',
  `delete_reason` STRING COMMENT '删除原因',
  `deleter` int COMMENT '删除人',
  `deleter_name` STRING COMMENT '删除人姓名',
  `delete_time` STRING COMMENT '删除时间',
  `course_id` int COMMENT '课程ID',
  `course_name` STRING COMMENT '课程名称',
  `delete_comment` STRING COMMENT '删除原因说明',
  `close_state` STRING COMMENT '关闭装填',
  `close_time` STRING COMMENT '关闭状态变动时间',
  `appeal_id` int COMMENT '申诉id',
  `tenant` int COMMENT '租户',
  `total_fee` DECIMAL COMMENT '报名费总金额',
  `belonged` int COMMENT '小周期归属人',
  `belonged_time` STRING COMMENT '归属时间',
  `belonger_time` STRING COMMENT '归属时间',
  `transfer` int COMMENT '转移人',
  `transfer_time` STRING COMMENT '转移时间',
  `follow_type` int COMMENT '分配类型,0-自动分配,1-手动分配,2-自动转移,3-手动单个转移,4-手动批量转移,5-公海领取',
  `transfer_bxg_oa_account` STRING COMMENT '转移到博学谷归属人OA账号',
  `transfer_bxg_belonger_name` STRING COMMENT '转移到博学谷归属人OA姓名',
  `end_time` STRING COMMENT '有效截止时间')
comment '客户关系表'
PARTITIONED BY(start_time STRING)
clustered by(id) sorted by(id) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='ZLIB');

customer_clue 在ODS的建表语句

# 开启自动执行MapJoin(达到条件,会自动走Map的Join而不是Reduce的Join)
set hive.auto.convert.join=true;
# 开启bucket mapjoin的优化
set hive.optimize.bucketmapjoin=true;
# set hive.auto.convert.join.noconditionaltask.size=512000000
# 开启排序
set hive.enforce.sorting=true;
# 开启SMB优化的自动尝试
set hive.optimize.bucketmapjoin.sortedmerge=true;
DROP TABLE itcast_ods.customer_clue;
CREATE TABLE IF NOT EXISTS itcast_ods.customer_clue (
  id int COMMENT 'customer_clue_id',
  create_date_time STRING COMMENT '创建时间',
  update_date_time STRING COMMENT '最后更新时间',
  deleted STRING COMMENT '是否被删除(禁用)',
  customer_id int COMMENT '客户id',
  customer_relationship_id int COMMENT '客户关系id',
  session_id STRING COMMENT '七陌会话id',
  sid STRING COMMENT '访客id',
  status STRING COMMENT '状态(undeal待领取 deal 已领取 finish 已关闭 changePeer 已流转)',
  users STRING COMMENT '所属坐席',
  create_time STRING COMMENT '七陌创建时间',
  platform STRING COMMENT '平台来源 (pc-网站咨询|wap-wap咨询|sdk-app咨询|weixin-微信咨询)',
  s_name STRING COMMENT '用户名称',
  seo_source STRING COMMENT '搜索来源',
  seo_keywords STRING COMMENT '关键字',
  ip STRING COMMENT 'IP地址',
  referrer STRING COMMENT '上级来源页面',
  from_url STRING COMMENT '会话来源页面',
  landing_page_url STRING COMMENT '访客着陆页面',
  url_title STRING COMMENT '咨询页面title',
  to_peer STRING COMMENT '所属技能组',
  manual_time STRING COMMENT '人工开始时间',
  begin_time STRING COMMENT '坐席领取时间 ',
  reply_msg_count int COMMENT '客服回复消息数',
  total_msg_count int COMMENT '消息总数',
  msg_count int COMMENT '客户发送消息数',
  comment STRING COMMENT '备注',
  finish_reason STRING COMMENT '结束类型',
  finish_user STRING COMMENT '结束坐席',
  end_time STRING COMMENT '会话结束时间',
  platform_description STRING COMMENT '客户平台信息',
  browser_name STRING COMMENT '浏览器名称',
  os_info STRING COMMENT '系统名称',
  area STRING COMMENT '区域',
  country STRING COMMENT '所在国家',
  province STRING COMMENT '省',
  city STRING COMMENT '城市',
  creator int COMMENT '创建人',
  name STRING COMMENT '客户姓名',
  idcard STRING COMMENT '身份证号',
  phone STRING COMMENT '手机号',
  itcast_school_id int COMMENT '校区Id',
  itcast_school STRING COMMENT '校区',
  itcast_subject_id int COMMENT '学科Id',
  itcast_subject STRING COMMENT '学科',
  wechat STRING COMMENT '微信',
  qq STRING COMMENT 'qq号',
  email STRING COMMENT '邮箱',
  gender STRING COMMENT '性别',
  level STRING COMMENT '客户级别',
  origin_type STRING COMMENT '数据来源渠道',
  information_way STRING COMMENT '资讯方式',
  working_years STRING COMMENT '开始工作时间',
  technical_directions STRING COMMENT '技术方向',
  customer_state STRING COMMENT '当前客户状态',
  valid STRING COMMENT '该线索是否是网资有效线索',
  anticipat_signup_date STRING COMMENT '预计报名时间',
  clue_state STRING COMMENT '线索状态',
  scrm_department_id int COMMENT 'SCRM内部部门id',
  superior_url STRING COMMENT '诸葛获取上级页面URL',
  superior_source STRING COMMENT '诸葛获取上级页面URL标题',
  landing_url STRING COMMENT '诸葛获取着陆页面URL',
  landing_source STRING COMMENT '诸葛获取着陆页面URL来源',
  info_url STRING COMMENT '诸葛获取留咨页URL',
  info_source STRING COMMENT '诸葛获取留咨页URL标题',
  origin_channel STRING COMMENT '投放渠道',
  course_id int COMMENT '课程编号',
  course_name STRING COMMENT '课程名称',
  zhuge_session_id STRING COMMENT 'zhuge会话id',
  is_repeat int COMMENT '是否重复线索(手机号维度) 0:正常 1:重复',
  tenant int COMMENT '租户id',
  activity_id STRING COMMENT '活动id',
  activity_name STRING COMMENT '活动名称',
  follow_type int COMMENT '分配类型,0-自动分配,1-手动分配,2-自动转移,3-手动单个转移,4-手动批量转移,5-公海领取',
  shunt_mode_id int COMMENT '匹配到的技能组id',
  shunt_employee_group_id int COMMENT '所属分流员工组',
  ends_time STRING COMMENT '有效时间')
comment '客户关系表'
PARTITIONED BY(starts_time STRING)
clustered by(customer_relationship_id) sorted by(customer_relationship_id) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='ZLIB');

在DIM层中创建5张维度表

先创建DIM层数据库

CREATE DATABASE itcast_dimen;

客户维度表

CREATE TABLE IF NOT EXISTS itcast_dimen.`customer` (
  `id` int COMMENT 'key id',
  `customer_relationship_id` int COMMENT '当前意向id',
  `create_date_time` STRING COMMENT '创建时间',
  `update_date_time` STRING COMMENT '最后更新时间',
  `deleted` int  COMMENT '是否被删除(禁用)',
  `name` STRING COMMENT '姓名',
  `idcard` STRING  COMMENT '身份证号',
  `birth_year` int COMMENT '出生年份',
  `gender` STRING COMMENT '性别',
  `phone` STRING COMMENT '手机号',
  `wechat` STRING COMMENT '微信',
  `qq` STRING COMMENT 'qq号',
  `email` STRING COMMENT '邮箱',
  `area` STRING COMMENT '所在区域',
  `leave_school_date` date COMMENT '离校时间',
  `graduation_date` date COMMENT '毕业时间',
  `bxg_student_id` STRING COMMENT '博学谷学员ID,可能未关联到,不存在',
  `creator` int COMMENT '创建人ID',
  `origin_type` STRING COMMENT '数据来源',
  `origin_channel` STRING COMMENT '来源渠道',
  `tenant` int,
  `md_id` int COMMENT '中台id')
comment '客户表'
PARTITIONED BY(start_time STRING)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='SNAPPY');

员工维度表

CREATE TABLE IF NOT EXISTS itcast_dimen.employee (
  id int COMMENT '员工id',
  email STRING COMMENT '公司邮箱,OA登录账号',
  real_name STRING COMMENT '员工的真实姓名',
  phone STRING COMMENT '手机号,目前还没有使用;隐私问题OA接口没有提供这个属性,',
  department_id STRING COMMENT 'OA中的部门编号,有负值',
  department_name STRING COMMENT 'OA中的部门名',
  remote_login STRING COMMENT '员工是否可以远程登录',
  job_number STRING COMMENT '员工工号',
  cross_school STRING COMMENT '是否有跨校区权限',
  last_login_date STRING COMMENT '最后登录日期',
  creator int COMMENT '创建人',
  create_date_time STRING COMMENT '创建时间',
  update_date_time STRING COMMENT '最后更新时间',
  deleted STRING COMMENT '是否被删除(禁用)',
  scrm_department_id int COMMENT 'SCRM内部部门id',
  leave_office STRING COMMENT '离职状态',
  leave_office_time STRING COMMENT '离职时间',
  reinstated_time STRING COMMENT '复职时间',
  superior_leaders_id int COMMENT '上级领导ID',
  tdepart_id int COMMENT '直属部门',
  tenant int COMMENT '租户',
  ems_user_name STRING COMMENT 'ems用户名称'
)
comment '员工表'
PARTITIONED BY(start_time STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='SNAPPY');

部门维度表

CREATE TABLE IF NOT EXISTS itcast_dimen.`scrm_department` (
  `id` int COMMENT '部门id',
  `name` STRING COMMENT '部门名称',
  `parent_id` int COMMENT '父部门id',
  `create_date_time` STRING COMMENT '创建时间',
  `update_date_time` STRING COMMENT '更新时间',
  `deleted` STRING COMMENT '删除标志',
  `id_path` STRING COMMENT '编码全路径',
  `tdepart_code` int COMMENT '直属部门',
  `creator` STRING COMMENT '创建者',
  `depart_level` int COMMENT '部门层级',
  `depart_sign` int COMMENT '部门标志,暂时默认1',
  `depart_line` int COMMENT '业务线,存储业务线编码',
  `depart_sort` int COMMENT '排序字段',
  `disable_flag` int COMMENT '禁用标志',
  `tenant` int COMMENT '租户')
comment 'scrm部门表'
PARTITIONED BY(start_time STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='SNAPPY');

学校维度表

CREATE TABLE IF NOT EXISTS itcast_dimen.`itcast_school` (
  `id` int COMMENT '自增主键',
  `create_date_time` timestamp COMMENT '创建时间',
  `update_date_time` timestamp  COMMENT '最后更新时间',
  `deleted` STRING COMMENT '是否被删除(禁用)',
  `name` STRING COMMENT '校区名称',
  `code` STRING COMMENT '校区标识',
  `tenant` int COMMENT '租户')
comment '校区字典表'
PARTITIONED BY(start_time STRING)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='SNAPPY');

学科维度表

CREATE TABLE IF NOT EXISTS itcast_dimen.`itcast_subject` (
  `id` int COMMENT '自增主键',
  `create_date_time` timestamp COMMENT '创建时间',
  `update_date_time` timestamp COMMENT '最后更新时间',
  `deleted` STRING COMMENT '是否被删除(禁用)',
  `name` STRING COMMENT '学科名称',
  `code` STRING COMMENT '学科编码',
  `tenant` int COMMENT '租户')
comment '学科字典表'
PARTITIONED BY(start_time STRING)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='SNAPPY');

将业务数据采集到ODS

ODS的两个事实表,由于都有分桶,不能直接用sqoop写入,所以需要先创建临时表。

语句:略(将ODS建表语句中分桶删除,以及表名改一下即可)

将数据导入到临时表中

两张临时表:

  • customer_relationship_tmp
  • customer_clue_tmp

意向表导入

SQOOP插入意向表的临时表
sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/scrm \
--username root \
--password 123456 \
--query 'select id, create_date_time, update_date_time, deleted, customer_id, first_id, belonger, belonger_name, initial_belonger, distribution_handler, business_scrm_department_id, last_visit_time, next_visit_time, origin_type, itcast_school_id, itcast_subject_id, intention_study_type, anticipat_signup_date, level, creator, current_creator, creator_name, origin_channel, comment, first_customer_clue_id, last_customer_clue_id, process_state, process_time, payment_state, payment_time, signup_state, signup_time, notice_state, notice_time, lock_state, lock_time, itcast_clazz_id, itcast_clazz_time, payment_url, payment_url_time, ems_student_id, delete_reason, deleter, deleter_name, delete_time, course_id, course_name, delete_comment, close_state, close_time, appeal_id, tenant, total_fee, belonged, belonged_time, belonger_time, transfer, transfer_time, follow_type, transfer_bxg_oa_account, transfer_bxg_belonger_name, FROM_UNIXTIME(unix_timestamp(),"%Y-%m-%d")as start_time,date_format("9999-12-31","%Y-%m-%d") as end_time from customer_relationship where $CONDITIONS' \
--hcatalog-database itcast_ods \
--hcatalog-table customer_relationship_tmp \
-m 10 \
--split-by id
将临时表数据写入正式的意向表中
--分区
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.created.files=150000;
--hive压缩
set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
--写入时压缩生效
set hive.exec.orc.compression.strategy=COMPRESSION;

set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.optimize.bucketmapjoin = true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;


INSERT INTO customer_relationship  partition(start_time) SELECT * FROM customer_relationship_tmp;

线索表导入

用sqoop导入数据到线索表的临时表中

sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/scrm \
--username root \
--password 123456 \
--query 'select id,create_date_time,update_date_time,deleted,customer_id,customer_relationship_id,session_id,sid,status,user as users,create_time,platform,s_name,seo_source,seo_keywords,ip,referrer,from_url,landing_page_url,url_title,to_peer,manual_time,begin_time,reply_msg_count,total_msg_count,msg_count,comment,finish_reason,finish_user,end_time,platform_description,browser_name,os_info,area,country,province,city,creator,name,"-1" as idcard,"-1" as phone,itcast_school_id,itcast_school,itcast_subject_id,itcast_subject,"-1" as wechat,"-1" as qq,"-1" as email,gender,level,origin_type,information_way,working_years,technical_directions,customer_state,valid,anticipat_signup_date,clue_state,scrm_department_id,superior_url,superior_source,landing_url,landing_source,info_url,info_source,origin_channel,course_id,course_name,zhuge_session_id,is_repeat,tenant,activity_id,activity_name,follow_type,shunt_mode_id,shunt_employee_group_id,FROM_UNIXTIME(unix_timestamp(),"%Y-%m-%d")as starts_time,date_format("9999-12-31","%Y-%m-%d") as ends_time from customer_clue where $CONDITIONS' \
--hcatalog-database itcast_ods \
--hcatalog-table customer_clue_tmp \
-m 10 \
--split-by id

将线索表临时表中的数据导入到正式表中

--分区
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.created.files=150000;
--hive压缩
set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
--写入时压缩生效
set hive.exec.orc.compression.strategy=COMPRESSION;

set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.optimize.bucketmapjoin = true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;


INSERT INTO customer_clue partition(starts_time) SELECT * FROM customer_clue_tmp;

将业务数据采集到DIMEN层的5个维度表中

# 采集customer表
sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/scrm \
--username root \
--password 123456 \
--query 'select id, customer_relationship_id, create_date_time, update_date_time, deleted, name, idcard, birth_year, gender, phone,  wechat, qq, email, area, leave_school_date, graduation_date, bxg_student_id, creator, origin_type, origin_channel, tenant, md_id, FROM_UNIXTIME(unix_timestamp(),"%Y-%m-%d")as start_time from customer where $CONDITIONS' \
--hcatalog-database itcast_dimen \
--hcatalog-table customer \
-m 10 \
--split-by id

# 采集员工表
sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/scrm \
--username root \
--password 123456 \
--query 'select id,email,real_name,-1 as phone,department_id,department_name,remote_login,job_number,cross_school,last_login_date,creator,create_date_time,update_date_time,deleted,scrm_department_id,leave_office,leave_office_time,reinstated_time,superior_leaders_id,tdepart_id,tenant,ems_user_name,FROM_UNIXTIME(unix_timestamp(),"%Y-%m-%d")as start_time from employee where $CONDITIONS' \
--hcatalog-database itcast_dimen \
--hcatalog-table employee \
-m 10 \
--split-by id

# 采集部门表
sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/scrm \
--username root \
--password 123456 \
--query 'select *, FROM_UNIXTIME(unix_timestamp(),"%Y-%m-%d")as start_time from scrm_department where $CONDITIONS' \
--hcatalog-database itcast_dimen \
--hcatalog-table scrm_department \
-m 10 \
--split-by id


# 采集学校表
sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/scrm \
--username root \
--password 123456 \
--query 'select *, FROM_UNIXTIME(unix_timestamp(),"%Y-%m-%d")as start_time from itcast_school where $CONDITIONS' \
--hcatalog-database itcast_dimen \
--hcatalog-table itcast_school \
-m 10 \
--split-by id

# 采集学科表
sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/scrm \
--username root \
--password 123456 \
--query 'select *, FROM_UNIXTIME(unix_timestamp(),"%Y-%m-%d")as start_time from itcast_subject where $CONDITIONS' \
--hcatalog-database itcast_dimen \
--hcatalog-table itcast_subject \
-m 10 \
--split-by id
ODS -> DWD

DWD对ODS的事实表进行清洗、转换、字段抽取

DWD建表

CREATE DATABASE IF NOT EXISTS itcast_dwd;
drop table itcast_dwd.`itcast_intention_dwd`;
CREATE TABLE IF NOT EXISTS itcast_dwd.`itcast_intention_dwd` (
  `rid` int COMMENT 'id',
  `customer_id` STRING COMMENT '客户id',
  `create_date_time` STRING COMMENT '创建时间',
  `itcast_school_id` STRING COMMENT '校区id',
  `deleted` STRING COMMENT '是否被删除',
  `origin_type` STRING COMMENT '来源渠道',
  `itcast_subject_id` STRING COMMENT '学科id',
  `creator` int COMMENT '创建人',
  `hourinfo` STRING COMMENT '小时信息',
  `origin_type_stat` STRING COMMENT '数据来源:0.线下;1.线上'
)
comment '客户意向dwd表'
PARTITIONED BY(yearinfo STRING,monthinfo STRING,dayinfo STRING)
-- 以customer_relationship表的id作为分桶的key,同时排序。在join的时候可以提升性能,因为排序,在可以的情况下可以用SMB Join
clustered by(rid) sorted by(rid) into 10 buckets
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
stored as ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');

ODS -> DWD 执行

要求:

  • 清洗掉已删除的数据
  • 学校和学科ID NULL统一转换成-1
--分区
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.created.files=150000;
--hive压缩
set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
--写入时压缩生效
set hive.exec.orc.compression.strategy=COMPRESSION;
--分桶
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.optimize.bucketmapjoin = true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;



insert into table itcast_dwd.itcast_intention_dwd partition (yearinfo,monthinfo,dayinfo)
SELECT
	id AS rid,
	customer_id,
	create_date_time,
	IF (itcast_school_id IS NULL, -1, itcast_school_id) AS itcast_school_id,
	deleted,
	origin_type,
	IF (itcast_subject_id IS NULL, -1, itcast_subject_id) AS itcast_subject_id,
	creator,
	SUBSTRING(create_date_time, 12, 2) AS hourinfo,
	IF(origin_type = 'NETSERVICE', 1, IF(origin_type = 'PRESIGNUP', 1, 0)) AS origin_type_stat,
	YEAR(create_date_time) AS yearinfo,
	MONTH(create_date_time) AS monthinfo,
	DAY(create_date_time) AS dayinfo
FROM itcast_ods.customer_relationship
WHERE deleted = 0;
DWD-> DWM

DWM:维度退化(宽表)

DWM建表

create database itcast_dwm;
drop table itcast_dwm.`itcast_intention_dwm`;
CREATE TABLE IF NOT EXISTS itcast_dwm.`itcast_intention_dwm` (
  `customer_id` STRING COMMENT 'id信息',
  `create_date_time` STRING COMMENT '创建时间',
  `area` STRING COMMENT '区域信息',
  `itcast_school_id` STRING COMMENT '校区id',
  `itcast_school_name` STRING COMMENT '校区名称',
  `deleted` STRING COMMENT '是否被删除',
  `origin_type` STRING COMMENT '来源渠道',
  `itcast_subject_id` STRING COMMENT '学科id',
  `itcast_subject_name` STRING COMMENT '学科名称',
  `hourinfo` STRING COMMENT '小时信息',
  `origin_type_stat` STRING COMMENT '数据来源:0.线下;1.线上',
  `clue_state_stat` STRING COMMENT '新老客户:0.老客户;1.新客户',
  `tdepart_id` STRING COMMENT '创建者部门id',
  `tdepart_name` STRING COMMENT '咨询中心名称'
)
comment '客户意向dwm表'
PARTITIONED BY(yearinfo STRING,monthinfo STRING,dayinfo STRING)
clustered by(customer_id) sorted by(customer_id) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
stored as ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');

DWM 执行

insert into table itcast_dwm.itcast_intention_dwm partition (yearinfo,monthinfo,dayinfo)
SELECT
    iid.customer_id,
    iid.create_date_time,
    c.area AS area,
    iid.itcast_school_id,
    ic.name AS itcast_school_name,
    iid.deleted,
    iid.origin_type,
    iid.itcast_subject_id,
    iss.name AS itcast_subject_name,
    iid.hourinfo,
    iid.origin_type_stat,
    IF (cc.clue_state = 'VALID_NEW_CLUES', 1, IF(cc.clue_state = 'VALID_PUBLIC_NEW_CLUE', 0, -1)) AS clue_state_stat,
    e.tdepart_id,
    sd.name AS tdepart_name,
    iid.yearinfo,
    iid.monthinfo,
    iid.dayinfo
FROM itcast_ods.customer_clue AS cc
LEFT JOIN itcast_dwd.itcast_intention_dwd AS iid ON cc.customer_relationship_id = iid.rid
LEFT JOIN itcast_dimen.customer AS c ON iid.customer_id = c.id
LEFT JOIN itcast_dimen.itcast_school AS ic ON iid.itcast_school_id = ic.id
LEFT JOIN itcast_dimen.itcast_subject AS iss ON iid.itcast_subject_id = iss.id
LEFT JOIN itcast_dimen.employee AS e ON iid.creator = e.id
LEFT JOIN itcast_dimen.scrm_department AS sd ON e.tdepart_id = sd.id
-- 可选的WHERE,如果跑步下来,可以按照年逐年去跑
WHERE iid.yearinfo='2011';

-- 年的列表可以在MySQL中执行:
select COUNT(*), YEAR(create_date_time) from customer_relationship GROUP BY YEAR(create_date_time);
DWM -> DWS

DWS建表

指标:客户意向数量

维度:

- 时间维度
- 线上线下
- 新老客户
- 区域维度
- 学校维度
- 学科维度
- 部门维度
- 来源渠道维度

total_customer_number, 年,月,日,线上线下,新老客户,区域,学校,学科,部门,来源,时间类型,维度类型

drop Table itcast_dws.itcast_intention_dws;
CREATE TABLE IF NOT EXISTS itcast_dws.itcast_intention_dws (
   `customer_total` INT COMMENT '聚合意向客户数',
   `area` STRING COMMENT '区域信息',
   `itcast_school_id` STRING COMMENT '校区id',
   `itcast_school_name` STRING COMMENT '校区名称',
   `origin_type` STRING COMMENT '来源渠道',
   `itcast_subject_id` STRING COMMENT '学科id',
   `itcast_subject_name` STRING COMMENT '学科名称',
   `hourinfo` STRING COMMENT '小时信息',
   `origin_type_stat` STRING COMMENT '数据来源:0.线下;1.线上',
   `clue_state_stat` STRING COMMENT '客户属性:0.老客户;1.新客户',
   `tdepart_id` STRING COMMENT '创建者部门id',
   `tdepart_name` STRING COMMENT '咨询中心名称',
   `time_str` STRING COMMENT '时间明细',
   `groupType` STRING COMMENT '产品属性类别:1.总意向量;2.区域信息;3.校区、学科组合分组;4.来源渠道;5.贡献中心;',
   `time_type` STRING COMMENT '时间维度:1、按小时聚合;2、按天聚合;3、按周聚合;4、按月聚合;5、按年聚合;'
)
comment '客户意向dws表'
PARTITIONED BY(yearinfo STRING,monthinfo STRING,dayinfo STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
stored as orc
TBLPROPERTIES ('orc.compress'='SNAPPY');

分析SQL

纯时间维度的总意向量分析

-- 纯时间维度-年月日
-- 年
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	yearninfo AS time_str,
	'1' AS grouptype,
	'5' AS time_type,
	yearinfo,
	'-1' AS monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,origin_type_stat,clue_state_stat;

-- 月
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo) AS time_str,
	'1' AS grouptype,
	'4' AS time_type,
	yearinfo,
	monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,origin_type_stat,clue_state_stat;

-- 天
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo) AS time_str,
	'1' AS grouptype,
	'2' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,origin_type_stat,clue_state_stat;

-- 小时
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo, ' ', hourinfo) AS time_str,
	'1' AS grouptype,
	'1' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,hourinfo,origin_type_stat,clue_state_stat;

时间维度+区域维度分析

-- 年
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	yearninfo AS time_str,
	'2' AS grouptype,
	'5' AS time_type,
	yearinfo,
	'-1' AS monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,origin_type_stat,clue_state_stat,area;

-- 月
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo) AS time_str,
	'2' AS grouptype,
	'4' AS time_type,
	yearinfo,
	monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,origin_type_stat,clue_state_stat,area;

-- 天
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo) AS time_str,
	'2' AS grouptype,
	'2' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,origin_type_stat,clue_state_stat,area;

-- 小时
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo, ' ', hourinfo) AS time_str,
	'2' AS grouptype,
	'1' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,hourinfo,origin_type_stat,clue_state_stat,area;

时间维度+校区学科维度

-- 年
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	area,
	itcast_school_id,
	itcast_school_name,
	'-1' AS origin_type,
	itcast_subject_id,
	itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	yearninfo AS time_str,
	'3' AS grouptype,
	'5' AS time_type,
	yearinfo,
	'-1' AS monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,origin_type_stat,clue_state_stat,itcast_school_id,itcast_school_name,itcast_subject_id,itcast_subject_name;

-- 月
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	area,
	itcast_school_id,
	itcast_school_name,
	'-1' AS origin_type,
	itcast_subject_id,
	itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo) AS time_str,
	'3' AS grouptype,
	'4' AS time_type,
	yearinfo,
	monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,origin_type_stat,clue_state_stat,itcast_school_id,itcast_school_name,itcast_subject_id,itcast_subject_name;

-- 天
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	area,
	itcast_school_id,
	itcast_school_name,
	'-1' AS origin_type,
	itcast_subject_id,
	itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo) AS time_str,
	'3' AS grouptype,
	'2' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,origin_type_stat,clue_state_stat,itcast_school_id,itcast_school_name,itcast_subject_id,itcast_subject_name;

-- 小时
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	area,
	itcast_school_id,
	itcast_school_name,
	'-1' AS origin_type,
	itcast_subject_id,
	itcast_subject_name,
	hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo, ' ', hourinfo) AS time_str,
	'3' AS grouptype,
	'1' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,hourinfo,origin_type_stat,clue_state_stat,itcast_school_id,itcast_school_name,itcast_subject_id,itcast_subject_name;

时间维度+来源渠道维度

-- 年
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	yearninfo AS time_str,
	'4' AS grouptype,
	'5' AS time_type,
	yearinfo,
	'-1' AS monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,origin_type_stat,clue_state_stat,origin_type;

-- 月
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo) AS time_str,
	'4' AS grouptype,
	'4' AS time_type,
	yearinfo,
	monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,origin_type_stat,clue_state_stat,origin_type;

-- 天
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo) AS time_str,
	'4' AS grouptype,
	'2' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,origin_type_stat,clue_state_stat,origin_type;

-- 小时
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	hourinfo,
	origin_type_stat,
	clue_state_stat,
	'-1' AS tdepart_id,
	'-1' AS tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo, ' ', hourinfo) AS time_str,
	'4' AS grouptype,
	'1' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,hourinfo,origin_type_stat,clue_state_stat,origin_type;

时间维度+贡献中心维度

-- 年
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	tdepart_id,
	tdepart_name,
	yearninfo AS time_str,
	'5' AS grouptype,
	'5' AS time_type,
	yearinfo,
	'-1' AS monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,origin_type_stat,clue_state_stat,tdepart_id,tdepart_name;

-- 月
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	tdepart_id,
	tdepart_name,
	CONCAT(yearinfo, '-', monthinfo) AS time_str,
	'5' AS grouptype,
	'4' AS time_type,
	yearinfo,
	monthinfo,
	'-1' AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,origin_type_stat,clue_state_stat,tdepart_id,tdepart_name;

-- 天
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	'-1' AS hourinfo,
	origin_type_stat,
	clue_state_stat,
	tdepart_id,
	tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo) AS time_str,
	'5' AS grouptype,
	'2' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,origin_type_stat,clue_state_stat,tdepart_id,tdepart_name;

-- 小时
SELECT
	COUNT(DISTINCT customer_id) AS customer_total,
	'-1' AS area,
	'-1' AS itcast_school_id,
	'-1' AS itcast_school_name,
	'-1' AS origin_type,
	'-1' AS itcast_subject_id,
	'-1' AS itcast_subject_name,
	hourinfo,
	origin_type_stat,
	clue_state_stat,
	tdepart_id,
	tdepart_name,
	CONCAT(yearinfo, '-', monthinfo, '-', dayinfo, ' ', hourinfo) AS time_str,
	'5' AS grouptype,
	'1' AS time_type,
	yearinfo,
	monthinfo,
	dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,hourinfo,origin_type_stat,clue_state_stat,tdepart_id,tdepart_name;
DWS -> MySQL

origin_type_stat,
clue_state_stat,
tdepart_id,
tdepart_name,
yearninfo AS time_str,
'5' AS grouptype,
'5' AS time_type,
yearinfo,
'-1' AS monthinfo,
'-1' AS dayinfo

FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,origin_type_stat,clue_state_stat,tdepart_id,tdepart_name;

– 月
SELECT
COUNT(DISTINCT customer_id) AS customer_total,
‘-1’ AS area,
‘-1’ AS itcast_school_id,
‘-1’ AS itcast_school_name,
‘-1’ AS origin_type,
‘-1’ AS itcast_subject_id,
‘-1’ AS itcast_subject_name,
‘-1’ AS hourinfo,
origin_type_stat,
clue_state_stat,
tdepart_id,
tdepart_name,
CONCAT(yearinfo, ‘-’, monthinfo) AS time_str,
‘5’ AS grouptype,
‘4’ AS time_type,
yearinfo,
monthinfo,
‘-1’ AS dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,origin_type_stat,clue_state_stat,tdepart_id,tdepart_name;

– 天
SELECT
COUNT(DISTINCT customer_id) AS customer_total,
‘-1’ AS area,
‘-1’ AS itcast_school_id,
‘-1’ AS itcast_school_name,
‘-1’ AS origin_type,
‘-1’ AS itcast_subject_id,
‘-1’ AS itcast_subject_name,
‘-1’ AS hourinfo,
origin_type_stat,
clue_state_stat,
tdepart_id,
tdepart_name,
CONCAT(yearinfo, ‘-’, monthinfo, ‘-’, dayinfo) AS time_str,
‘5’ AS grouptype,
‘2’ AS time_type,
yearinfo,
monthinfo,
dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,origin_type_stat,clue_state_stat,tdepart_id,tdepart_name;

– 小时
SELECT
COUNT(DISTINCT customer_id) AS customer_total,
‘-1’ AS area,
‘-1’ AS itcast_school_id,
‘-1’ AS itcast_school_name,
‘-1’ AS origin_type,
‘-1’ AS itcast_subject_id,
‘-1’ AS itcast_subject_name,
hourinfo,
origin_type_stat,
clue_state_stat,
tdepart_id,
tdepart_name,
CONCAT(yearinfo, ‘-’, monthinfo, ‘-’, dayinfo, ’ ', hourinfo) AS time_str,
‘5’ AS grouptype,
‘1’ AS time_type,
yearinfo,
monthinfo,
dayinfo
FROM itcast_dwm.itcast_intention_dwm
GROUP BY yearinfo,monthinfo,dayinfo,hourinfo,origin_type_stat,clue_state_stat,tdepart_id,tdepart_name;






# DWS -> MySQL

略













上一篇:MySQL基础知识:创建MySQL数据库和表


下一篇:520前,我放弃陪女朋友时间,*写代码:“SSM框架整合+excel文件上传到数据库+数据更新“