报名用户需求分析

看板4需求分析模板

报名用户需求分析
不知道该选取什么字段
dwd dim层 orc snappy
报名用户需求分析

前言

和前面的看板分析的方法一致。

讲义给出的叫做:需求文档

报名用户需求分析

如图,我们要做的是:根据需求的要求,整理出指标和维度

BI看板展示:

报名用户需求分析

需求分析

2.1 校区报名柱状图
说明:统计期内,全部报名客户中,各校区报名人数分布。
展现:柱状图
条件:年、月,校区
维度:天区间,按查询条件来定
指标:报名人数
粒度:天/线上线下/校区
数据来源:客户管理系统的customer_relationship、itcast_clazz报名课程表

2.2 学科报名柱状图
说明:统计期内,全部报名客户中,各学科报名人数分布。
展现:柱状图
条件:年、月,学科
维度:天区间,按查询条件来定
指标:报名人数
粒度:天/线上线下/学科
数据来源:客户管理系统的customer_relationship、itcast_clazz报名课程表

2.3 总报名量
说明:统计期内,已经缴费的报名客户总量。
展现:数值。
条件:年、月
维度:年、月
指标:报名客户总量
粒度:天
数据来源:客户管理系统的customer_relationship表

2.4 线上报名量
说明:总报名量中来源渠道为线*客渠道的报名总量
展现:线状图。
条件:年、月
维度:天区间,按查询条件来定
指标:报名客户总量
粒度:天
数据来源:客户管理系统的customer_relationship表

2.5 意向用户报名转化率
说明:统计期内,新增的意向客户中报名的客户占比。全部报名人数 / 全部新增的意向人数
展现:线状图。双轴:全部报名人数、报名转化率。
条件:年、月
维度:天/线上线下
指标:报名转化率=全部报名人数/全部新增的意向人数
粒度:天
数据来源:客户管理系统的customer_relationship表

2.6 有效线索报名转化率
说明:线上报名量 / 线上有效线索量,与上一个指标类似,此处的线索量需要排除已申诉数据。
展现:线状图。双轴:线上报名人数、线上报名转化率。
条件:年、月
维度:天/线上线下
指标:线上报名转化率=线上报名人数/线上有效线索量
粒度:天
数据来源:客户管理系统的customer_relationship表、customer_clue表、customer_appeal表

2.7 日报名趋势图
说明:统计期内,每天报名人数的趋势图。
展现:线状图。
条件:年、月
维度:天/线上线下
指标:报名人数
粒度:天
数据来源:客户管理系统的customer_relationship表

2.8 校区学科的报名学员TOP
说明:统计期内,全部报名学员中,校区学科排行榜,topN。A校区b学科第一,B校区a学科第二等等。
展现:柱状图
条件:年、月,校区,学科,数据量N
维度:天/线上线下
指标:报名学员人数
粒度:各校区各学科的报名人数和
数据来源:客户管理系统的customer_relationship表、itcast_clazz表

2.9 来源渠道占比
说明:统计期内,全部报名学员中,不同来源渠道的报名学员占比情况。
展现:饼状图
条件:年、月
维度:天/线上线下/来源渠道
指标:比值
数据来源:客户管理系统的customer_relationship表

2.10 咨询中心报名贡献
说明:统计期内,全部报名学员中,各咨询中心的报名学员人数占比情况。
展现:饼状图
条件:年、月,咨询中心
维度:天/线上线下/咨询中心
指标:报名学员人数
粒度:天/报名学员人数
数据来源:客户管理系统的customer_relationship表、employee表、scrm_department表

需求总结

指标

报名用户量,
意向客户量,
有效线索量

维度

时间:
年月日时
空间:
地区
线上线下
属性:
渠道
学科

涉及表和字段

事实表:
customer_relationship表

维度表:
itcast_clazz表
employee、scrm_department表

关联条件

id
date_day

注意事项

建模模板

提示:建模要考虑到全量和增量。

特别是增量,以哪个字段来作为增量依据。

各个层的分区、分桶如何做来兼容增量采集。

ODS DIM 分析

ODS表

几个表:
4个表:
意向用户表
报名课程表
员工表
员工部门表

存储格式:
orc
压缩格式?
zlib
是否分区、分桶:
需要分桶

是否索引
需要
布隆索引

一个表不用添加

DIM层表

当前看板是否需要采集表到DIM层
需要

如有,如何存储,是否压缩,是否分区、是否分桶,是否索引?
orc存储
snappy压缩
起始时间分区
布隆索引

DWD层

ODS -> DWD 执行数据抽取、转换、清洗

抽取哪些字段?
id int COMMENT ‘客户关系id’,
customer_id int COMMENT ‘所属客户id’,
origin_type STRING COMMENT ‘数据来源’,
payment_time STRING COMMENT ‘支付状态变动时间字符串’,
payment_time_hour STRING COMMENT ‘支付状态变动小时’,
itcast_clazz_id int COMMENT ‘报名课程id’,
creator int COMMENT ‘创建人id’,
origin_type_stat STRING COMMENT ‘数据来源:0.线下;1.线上’

哪些字段需要转换?
id int COMMENT ‘客户关系id’,
customer_id int COMMENT ‘所属客户id’,
origin_type STRING COMMENT ‘数据来源’,
payment_time STRING COMMENT ‘支付状态变动时间字符串’,
payment_time_hour STRING COMMENT ‘支付状态变动小时’,
itcast_clazz_id int COMMENT ‘报名课程id’,
creator int COMMENT ‘创建人id’,
origin_type_stat STRING COMMENT ‘数据来源:0.线下;1.线上’

哪些数据需要清洗?
id int COMMENT ‘客户关系id’,
customer_id int COMMENT ‘所属客户id’,
origin_type STRING COMMENT ‘数据来源’,
payment_time STRING COMMENT ‘支付状态变动时间字符串’,
payment_time_hour STRING COMMENT ‘支付状态变动小时’,
itcast_clazz_id int COMMENT ‘报名课程id’,
creator int COMMENT ‘创建人id’,
origin_type_stat STRING COMMENT ‘数据来源:0.线下;1.线上’

DWM层

DWM层做 预聚合和维度退化

是否需要DWM层?

是否做维度退化?

   customer_id 报名客户id
 `itcast_school_id` '学校id',

itcast_school_name ‘学校namne’,
itcast_subject_id ‘学科id’,
itcast_subject_name ‘学科name’,
tdepart_id ‘咨询中心id’,
origin_type ‘来源渠道’,
‘origin_type_stat’ ‘数据来源:0.线下;1.线上’,
payment_time_hour ‘最后更新时间’

DWS层

几个表?
1个

如何设计表?

signup_num ‘报名人数’,
itcast_school_id ‘学校id’,
itcast_school_name ‘学校namne’,
itcast_subject_id ‘学科id’,
itcast_subject_name ‘学科name’,
tdepart_id ‘咨询中心id’,
tdepart_name ‘咨询中心name’,
origin_type ‘来源渠道’,
origin_type_stat ‘数据来源:0.线下;1.线上’,
payment_time_hour ‘最后更新时间’,
groupType ‘业务分组类型:1:校区、学科组合分组;2:来源渠道分组;3:咨询中心分组;4:所有’,
time_type ‘聚合时间类型:1、按小时聚合;2、按天聚合;3、按周聚合;4、按月聚合;5、按年聚合。’

操作模板

建表语句

ODS建表语句

ODS层的customer_relationship表可以直接复用。
itcast_clazz表是维表,放在维表层。

DIM建表语句

itcast_clazz报名课程表

CREATE EXTERNAL TABLE IF NOT EXISTS itcast_dimen.itcast_clazz (
id int COMMENT ‘ems课程id(非自增)’,
create_date_time STRING COMMENT ‘创建时间’,
update_date_time STRING COMMENT ‘最后更新时间’,
deleted STRING COMMENT ‘是否被删除(禁用)’,
itcast_school_id STRING COMMENT ‘ems校区ID’,
itcast_school_name STRING COMMENT ‘ems校区名称’,
itcast_subject_id STRING COMMENT ‘ems学科ID’,
itcast_subject_name STRING COMMENT ‘ems学科名称’,
itcast_brand STRING COMMENT ‘ems品牌’,
clazz_type_state STRING COMMENT ‘班级类型状态’,
clazz_type_name STRING COMMENT ‘班级类型名称’,
teaching_mode STRING COMMENT ‘授课模式’,
start_time STRING COMMENT ‘开班时间’,
end_time STRING COMMENT ‘毕业时间’,
comment STRING COMMENT ‘备注’,
detail STRING COMMENT ‘详情(比如:27期)’,
uncertain STRING COMMENT ‘待定班(0:否,1:是)’,
tenant int COMMENT ‘租户’,
ends_time STRING COMMENT ‘有效时间’)
comment ‘班级信息表’
PARTITIONED BY(starts_time STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
stored as orc
TBLPROPERTIES (‘orc.compress’=‘SNAPPY’,‘orc.create.index’=‘true’,‘orc.bloom.filter.columns’=‘id’);

DWD建表语句

customer_relationship_dwd表

CREATE TABLE IF NOT EXISTS itcast_dwd.customer_relationship_dwd (
id int COMMENT ‘客户关系id’,
customer_id int COMMENT ‘所属客户id’,
origin_type STRING COMMENT ‘数据来源’,
payment_time STRING COMMENT ‘支付状态变动时间字符串’,
payment_time_hour STRING COMMENT ‘支付状态变动小时’,
itcast_clazz_id int COMMENT ‘报名课程id’,
creator int COMMENT ‘创建人id’,
origin_type_stat STRING COMMENT ‘数据来源:0.线下;1.线上’ )
comment ‘客户关系表’
PARTITIONED BY (payment_time_year String, payment_time_month String, payment_time_day String)
CLUSTERED BY(id) sorted by(id) into 10 buckets
ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\t’
stored as orc
TBLPROPERTIES (‘orc.compress’=‘SNAPPY’,‘orc.create.index’=‘true’,‘orc.bloom.filter.columns’=‘itcast_clazz_id,creator’);

DWM建表语句

报名数据中间表
itcast_dwm.customer_signup_dwm表

CREATE TABLE IF NOT EXISTS itcast_dwm.customer_signup_dwm (
customer_id int COMMENT ‘报名客户id’,
itcast_school_id STRING COMMENT ‘学校id’,
itcast_school_name STRING COMMENT ‘学校namne’,
itcast_subject_id STRING COMMENT ‘学科id’,
itcast_subject_name STRING COMMENT ‘学科name’,
tdepart_id int COMMENT ‘咨询中心id’,
tdepart_name STRING COMMENT ‘咨询中心name’,
origin_type STRING COMMENT ‘来源渠道’,
origin_type_stat STRING COMMENT ‘数据来源:0.线下;1.线上’,
payment_time_hour STRING COMMENT ‘最后更新时间’)
comment ‘报名人数统计中间表’
PARTITIONED BY (payment_time_year String, payment_time_month String, payment_time_day String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
stored as orc
TBLPROPERTIES (‘orc.compress’=‘SNAPPY’);

DWS建表语句

报名数据宽表
itcast_dws.customer_signup_dws表

drop table itcast_dws.customer_signup_dws;
create table if not exists itcast_dws.customer_signup_dws (
signup_num int COMMENT ‘报名人数’,
itcast_school_id STRING COMMENT ‘学校id’,
itcast_school_name STRING COMMENT ‘学校namne’,
itcast_subject_id STRING COMMENT ‘学科id’,
itcast_subject_name STRING COMMENT ‘学科name’,
tdepart_id int COMMENT ‘咨询中心id’,
tdepart_name STRING COMMENT ‘咨询中心name’,
origin_type STRING COMMENT ‘来源渠道’,
origin_type_stat STRING COMMENT ‘数据来源:0.线下;1.线上’,
payment_time_hour STRING COMMENT ‘最后更新时间’,
groupType STRING COMMENT ‘业务分组类型:1:校区、学科组合分组;2:来源渠道分组;3:咨询中心分组;4:所有’,
time_type STRING COMMENT ‘聚合时间类型:1、按小时聚合;2、按天聚合;3、按周聚合;4、按月聚合;5、按年聚合。’
)
comment ‘报名人数统计业务表’
PARTITIONED BY (payment_time_year String, payment_time_month String,payment_time_day String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
stored as orc
TBLPROPERTIES (‘orc.compress’=‘SNAPPY’);

全量采集执行脚本

业务数据库到ODS和DIM的SQOOP执行脚本

sqoop import
–connect jdbc:mysql://192.168.52.150:3306/scrm
–username root
–password 123456
–query ‘select *,FROM_UNIXTIME(unix_timestamp(),"%Y-%m-%d")as starts_time,date_format(“9999-12-31”,"%Y-%m-%d") as ends_time from itcast_clazz where $CONDITIONS’
–hcatalog-database itcast_dimen
–hcatalog-table itcast_clazz
-m 100
–split-by id

ODS到DWD的HQL语句

–分区
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.created.files=150000;
–hive压缩
set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
–写入时压缩生效
set hive.exec.orc.compression.strategy=COMPRESSION;
–分桶
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.optimize.bucketmapjoin = true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
–并行执行
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;
–矢量化查询
set hive.vectorized.execution.enabled=true;
–读取零拷贝
set hive.exec.orc.zerocopy=true;

INSERT INTO itcast_dwd.customer_relationship_dwd PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
id,
customer_id,
nvl(origin_type, -1) origin_type,
payment_time,
substr(payment_time, 12, 2) as payment_time_hour,
nvl(itcast_clazz_id, -1) itcast_clazz_id,
nvl(creator, -1) creator,
if(origin_type=‘NETSERVICE’ or origin_type=‘PRESIGNUP’, ‘1’, ‘0’) origin_type_stat,
substr(payment_time, 1, 4) payment_time_year,
substr(payment_time, 6, 2) payment_time_month,
substr(payment_time, 9, 2) payment_time_day
from itcast_ods.customer_relationship
WHERE deleted = 0 AND customer_id IS NOT NULL AND payment_state=‘PAID’;

DWD到DWM的HQL语句

–分区
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.created.files=150000;
–hive压缩
set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
–写入时压缩生效
set hive.exec.orc.compression.strategy=COMPRESSION;
–分桶
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.optimize.bucketmapjoin = true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
–并行执行
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;
–矢量化查询
set hive.vectorized.execution.enabled=true;
–关联优化器
set hive.optimize.correlation=true;
–读取零拷贝
set hive.exec.orc.zerocopy=true;
–join数据倾斜
set hive.optimize.skewjoin=true;
– set hive.skewjoin.key=100000;
set hive.optimize.skewjoin.compiletime=true;
set hive.optimize.union.remove=true;

INSERT into itcast_dwm.customer_signup_dwm PARTITION (payment_time_year, payment_time_month, payment_time_day)
SELECT
dwd.customer_id,
clazz.itcast_school_id,
clazz.itcast_school_name,
clazz.itcast_subject_id,
clazz.itcast_subject_name,
e.tdepart_id,
dept.name as tdepart_name,
dwd.origin_type,
dwd.origin_type_stat,
dwd.payment_time_hour,
dwd.payment_time_year,
dwd.payment_time_month,
dwd.payment_time_day
FROM itcast_dwd.customer_relationship_dwd dwd
LEFT JOIN itcast_dimen.itcast_clazz clazz on dwd.itcast_clazz_id=clazz.id
LEFT JOIN itcast_dimen.employee e on dwd.creator=e.id
LEFT JOIN itcast_dimen.scrm_department dept on e.tdepart_id=dept.id;

DWM到DWS的HQL语句

```–*校区、学科组合分组–**小时*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)SELECTcount(dwm.customer_id) as signup_num,dwm.itcast_school_id,dwm.itcast_school_name,dwm.itcast_subject_id,dwm.itcast_subject_name,-1 as tdepart_id,’-1’ tdepart_name,’-1’ origin_type,dwm.origin_type_stat,dwm.payment_time_hour,‘1’ grouptype,‘1’ as time_type,dwm.payment_time_year,dwm.payment_time_month,dwm.payment_time_dayfrom itcast_dwm.customer_signup_dwm dwmGROUP BY dwm.origin_type_stat,dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, dwm.payment_time_hour,dwm.itcast_school_id, dwm.itcast_school_name, dwm.itcast_subject_id, dwm.itcast_subject_name;`

*--**天*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
*count*(dwm.customer_id) as signup_num,
dwm.itcast_school_id,
dwm.itcast_school_name,
dwm.itcast_subject_id,
dwm.itcast_subject_name,
-1 as tdepart_id,
'-1' tdepart_name,
'-1' origin_type,
dwm.origin_type_stat,
'-1' payment_time_hour,
'1' grouptype,
'2' as time_type,
dwm.payment_time_year,
dwm.payment_time_month,
dwm.payment_time_day
from itcast_dwm.customer_signup_dwm dwm
GROUP BY dwm.origin_type_stat,
dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day,
dwm.itcast_school_id, dwm.itcast_school_name, dwm.itcast_subject_id, dwm.itcast_subject_name;

*--**月*

INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
*count*(dwm.customer_id) as signup_num,
dwm.itcast_school_id,
dwm.itcast_school_name,
dwm.itcast_subject_id,
dwm.itcast_subject_name,
-1 as tdepart_id,
'-1' tdepart_name,
'-1' origin_type,
dwm.origin_type_stat,
'-1' payment_time_hour,
'1' grouptype,
'3' as time_type,
dwm.payment_time_year,
dwm.payment_time_month,
dwm.payment_time_day
from itcast_dwm.customer_signup_dwm dwm
GROUP BY dwm.origin_type_stat,
dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day,
dwm.itcast_school_id, dwm.itcast_school_name, dwm.itcast_subject_id, dwm.itcast_subject_name;


`*--**年*`

`INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)`
`SELECT`
    `*count*(dwm.customer_id) as signup_num,`
    `dwm.itcast_school_id,`
    `dwm.itcast_school_name,`
    `dwm.itcast_subject_id,`
    `dwm.itcast_subject_name,`
    `-1 as tdepart_id,`
    `'-1' tdepart_name,`
    `'-1' origin_type,`
    `dwm.origin_type_stat,`
    `'-1' payment_time_hour,`
    `'1' grouptype,` 
    `'4' as time_type,`
    `dwm.payment_time_year,`
    `dwm.payment_time_month,`
    `dwm.payment_time_day`
`from itcast_dwm.customer_signup_dwm dwm`
`GROUP BY dwm.origin_type_stat,`
    `dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day,` 
    `dwm.itcast_school_id, dwm.itcast_school_name, dwm.itcast_subject_id, dwm.itcast_subject_name;`

`*--**来源渠道分组**--**小时*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)`
`SELECT`
    `*count*(dwm.customer_id) as signup_num,`
    `'-1' itcast_school_id,`
    `'-1' itcast_school_name,`
    `'-1' itcast_subject_id,`
    `'-1' itcast_subject_name,`
    `-1 as tdepart_id,`
    `'-1' tdepart_name,`
    `dwm.origin_type,` 
    `'1' as time_type,`
    `dwm.origin_type_stat,`
    `dwm.payment_time_hour,`
    `'2' grouptype,`
    `dwm.payment_time_year,`
    `dwm.payment_time_month,`
    `dwm.payment_time_day`
`from itcast_dwm.customer_signup_dwm dwm`
`GROUP BY dwm.origin_type_stat,`
    `dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, dwm.payment_time_hour,`
    `**dwm.origin_type**;`
`*--**天、月、年省略*`



`*--**咨询中心分组**--**小时*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)`
`SELECT`
    `*count*(dwm.customer_id) as signup_num,`
    `'-1' itcast_school_id,`
    `'-1' itcast_school_name,`
    `'-1' itcast_subject_id,`
    `'-1' itcast_subject_name,`
    `dwm.tdepart_id,`
    `dwm.tdepart_name,`
    `'-1' origin_type,`
    `dwm.origin_type_stat,`
    `dwm.payment_time_hour,`
    `'3' grouptype,` 
    `'1' as time_type,`
    `dwm.payment_time_year,`
    `dwm.payment_time_month,`
    `dwm.payment_time_day`
`from itcast_dwm.customer_signup_dwm dwm`
`GROUP BY dwm.origin_type_stat,`
    `dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, dwm.payment_time_hour,`
    `**dwm.tdepart_id, dwm.tdepart_name**;`
`*--**天、月、年省略*`



`*--**总数分组**--**小时*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)`
`SELECT`
    `*count*(dwm.customer_id) as signup_num,`
    `'-1' itcast_school_id,`
    `'-1' itcast_school_name,`
    `'-1' itcast_subject_id,`
    `'-1' itcast_subject_name,`
    `-1 as tdepart_id,`
    `'-1' tdepart_name,`
    `'-1' origin_type,`
    `dwm.origin_type_stat,`
    `dwm.payment_time_hour,`
    `'4' grouptype,` 
    `'1' as time_type,`
    `dwm.payment_time_year,`
    `dwm.payment_time_month,`
    `dwm.payment_time_day`
`from itcast_dwm.customer_signup_dwm dwm`
`GROUP BY dwm.origin_type_stat,`
    `dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, dwm.payment_time_hour;`

`*--**天、月、年省略**……*`





### 全量导出脚本

CREATE TABLE customer_signup_app (
   signup_num int(11) COMMENT '报名人数',
   itcast_school_id varchar(32) COMMENT '学校id',
   itcast_school_name varchar(32) COMMENT '学校name',
   itcast_subject_id varchar(32) COMMENT '学科id',
   itcast_subject_name varchar(32) COMMENT '学科name',
   tdepart_id int(11) COMMENT '咨询中心id',
   tdepart_name varchar(32) COMMENT '咨询中心name',
   origin_type varchar(32) COMMENT '来源渠道',
   origin_type_stat varchar(32) COMMENT '数据来源:0.线下;1.线上',
   payment_time_month varchar(32) COMMENT '月信息',
   payment_time_day varchar(32) COMMENT '日信息',
   payment_time_hour varchar(32) COMMENT '最后更新时间',
   groupType varchar(32) COMMENT '统计分组类型:1:校区、学科组合分组;2:来源渠道分组;3:咨询中心分组;4:所有',
   time_type varchar(32) COMMENT '聚合时间类型:1、按小时聚合(;2、按天聚合;3、按周聚合;4、按月聚合;5、按年聚合。',
   payment_time_year varchar(32) COMMENT '年信息'
);



sqoop export \

--connect "jdbc:mysql://192.168.52.150:3306/scrm_bi?useUnicode=true&characterEncoding=utf-8" \

--username root \

--password 123456 \

--table customer_signup_app \

--hcatalog-database itcast_dws \

--hcatalog-table customer_signup_dws \

-m 100







## 增量采集执行脚本

*`--**分区*SET hive.exec.dynamic.partition=true;`
`SET hive.exec.dynamic.partition.mode=nonstrict;`
`set hive.exec.max.dynamic.partitions.pernode=10000;`
`set hive.exec.max.dynamic.partitions=100000;`
`set hive.exec.max.created.files=150000;`
`*--hive**压缩*set hive.exec.compress.intermediate=true;`
`set hive.exec.compress.output=true;`
`*--**写入时压缩生效*set hive.exec.orc.compression.strategy=COMPRESSION;`
`*--**分桶*set hive.enforce.bucketing=true;`
`set hive.enforce.sorting=true;`
`set hive.optimize.bucketmapjoin = true;`
`set hive.auto.convert.sortmerge.join=true;`
`set hive.auto.convert.sortmerge.join.noconditionaltask=true;`
`*--**并行执行*set hive.exec.parallel=true;`
`set hive.exec.parallel.thread.number=8;*--**矢量化查询*set hive.vectorized.execution.enabled=true;`
`*--**关联优化器*set hive.optimize.correlation=true;`
`*--**读取零拷贝*set hive.exec.orc.zerocopy=true;`

`INSERT INTO itcast_dwd.customer_relationship_dwd PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
    id,
    deleted,
    customer_id,
    *nvl*(origin_type, -1) origin_type,
    payment_state,
    payment_time,
    *substr*(payment_time, 12, 2) as payment_time_hour,
    *nvl*(itcast_clazz_id, -1) itcast_clazz_id,
    *nvl*(creator, -1) creator,
    *if*(origin_type='NETSERVICE' or origin_type='PRESIGNUP', '1', '0') origin_type_stat,
    *substr*(payment_time, 1, 4) payment_time_year,
    *substr*(payment_time, 6, 2) payment_time_month,
    *substr*(payment_time, 9, 2) payment_time_day
from itcast_ods.customer_relationship
WHERE deleted = 0 AND customer_id IS NOT NULL AND payment_state='PAID'
and *substr*(payment_time, 1, 10) >='2037-02-01';

### 业务数据库到ODS和DIM的SQOOP执行脚本



### ODS到DWD的SQOOP脚本

``\#! /bin/bash````

````````

`\#上个月1日`

`Last_Month_DATE=$(date -d "$(date +%Y%m)01 last month" +%Y-%m-01)`

````````

`out_put=$(${HIVE_HOME} -S -e "`

`--分区`

`SET hive.exec.dynamic.partition=true;`

`SET hive.exec.dynamic.partition.mode=nonstrict;`

`set hive.exec.max.dynamic.partitions.pernode=10000;`

`set hive.exec.max.dynamic.partitions=100000;`

`set hive.exec.max.created.files=150000;`

`--hive压缩`

`set hive.exec.compress.intermediate=true;`

`set hive.exec.compress.output=true;`

`--写入时压缩生效`

`set hive.exec.orc.compression.strategy=COMPRESSION;`

`--分桶`

`set hive.enforce.bucketing=true;`

`set hive.enforce.sorting=true;`

`set hive.optimize.bucketmapjoin = true;`

`set hive.auto.convert.sortmerge.join=true;`

`set hive.auto.convert.sortmerge.join.noconditionaltask=true;`

`--并行执行`

`set hive.exec.parallel=true;`

`set hive.exec.parallel.thread.number=8;`

`--矢量化查询`

`set hive.vectorized.execution.enabled=true;`

`--关联优化器`

`set hive.optimize.correlation=true;`

`--读取零拷贝`

`set hive.exec.orc.zerocopy=true;`

````````

`INSERT INTO itcast_dwd.customer_relationship_dwd PARTITION(payment_time_year, payment_time_month, payment_time_day)`

`SELECT`

​    `id,`

​    `deleted,`

​    `customer_id,`

​    `nvl(origin_type, -1) origin_type,`

​    `payment_state,`

​    `payment_time,`

​    `substr(payment_time, 12, 2) as payment_time_hour,`

​    `nvl(itcast_clazz_id, -1) itcast_clazz_id,`

​    `nvl(creator, -1) creator,`

​    `if(origin_type='NETSERVICE' or origin_type='PRESIGNUP', '1', '0') origin_type_stat,`

​    `substr(payment_time, 1, 4) payment_time_year,`

​    `substr(payment_time, 6, 2) payment_time_month,`

​    `substr(payment_time, 9, 2) payment_time_day`

`from itcast_ods.customer_relationship`

`WHERE deleted = 0 AND customer_id IS NOT NULL AND payment_state='PAID'`

`and substr(payment_time, 1, 10) >='${Last_Month_DATE}';`

`")`

### DWD到DWM的SQOOP脚本

### SQL

*--**分区*SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.created.files=150000;
*--hive**压缩*set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
*--**写入时压缩生效*set hive.exec.orc.compression.strategy=COMPRESSION;
*--**分桶*set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.optimize.bucketmapjoin = true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
*--**并行执行*set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;*--**矢量化查询*set hive.vectorized.execution.enabled=true;
*--**关联优化器*set hive.optimize.correlation=true;
*--**读取零拷贝*set hive.exec.orc.zerocopy=true;
*--join**数据倾斜*set hive.optimize.skewjoin=true;
*-- set hive.skewjoin.key=100000;*set hive.optimize.skewjoin.compiletime=true;
set hive.optimize.union.remove=true;


INSERT into itcast_dwm.customer_signup_dwm PARTITION (payment_time_year, payment_time_month, payment_time_day)
SELECT
    dwd.customer_id,
    clazz.itcast_school_id,
    clazz.itcast_school_name,
    clazz.itcast_subject_id,
    clazz.itcast_subject_name,
    e.tdepart_id,
    dept.name as tdepart_name,
    dwd.origin_type,
    dwd.origin_type_stat,
    dwd.payment_time_hour,
    dwd.payment_time_year,
    dwd.payment_time_month,
    dwd.payment_time_day
FROM itcast_dwd.customer_relationship_dwd dwd
LEFT JOIN itcast_dimen.itcast_clazz clazz on dwd.itcast_clazz_id=clazz.id
LEFT JOIN itcast_dimen.employee e on dwd.creator=e.id
LEFT JOIN itcast_dimen.scrm_department dept on e.tdepart_id=dept.id
where *substr*(dwd.payment_time, 1, 10) >= '2037-02-01' ;







### Shell脚本

\#! /bin/bash

HIVE_HOME=/usr/bin/hive

 

Last_Month=$(date -d "$(date +%Y%m)01 last month" +%Y-%m-01)

${HIVE_HOME} -S -e "

--分区

SET hive.exec.dynamic.partition=true;

SET hive.exec.dynamic.partition.mode=nonstrict;

set hive.exec.max.dynamic.partitions.pernode=10000;

set hive.exec.max.dynamic.partitions=100000;

set hive.exec.max.created.files=150000;

--hive压缩

set hive.exec.compress.intermediate=true;

set hive.exec.compress.output=true;

--写入时压缩生效

set hive.exec.orc.compression.strategy=COMPRESSION;

--分桶

set hive.enforce.bucketing=true;

set hive.enforce.sorting=true;

set hive.optimize.bucketmapjoin = true;

set hive.auto.convert.sortmerge.join=true;

set hive.auto.convert.sortmerge.join.noconditionaltask=true;

--并行执行

set hive.exec.parallel=true;

set hive.exec.parallel.thread.number=8;

--矢量化查询

set hive.vectorized.execution.enabled=true;

--关联优化器

set hive.optimize.correlation=true;

--读取零拷贝

set hive.exec.orc.zerocopy=true;

--join数据倾斜

set hive.optimize.skewjoin=true;

-- set hive.skewjoin.key=100000;

set hive.optimize.skewjoin.compiletime=true;

set hive.optimize.union.remove=true;

 

INSERT into itcast_dwm.customer_signup_dwm PARTITION (payment_time_year, payment_time_month, payment_time_day)

SELECT

​    dwd.customer_id,

​    clazz.itcast_school_id,

​    clazz.itcast_school_name,

​    clazz.itcast_subject_id,

​    clazz.itcast_subject_name,

​    e.tdepart_id,

​    dept.name as tdepart_name,

​    dwd.origin_type,

​    dwd.origin_type_stat,

​    dwd.payment_time_hour,

​    dwd.payment_time_year,

​    dwd.payment_time_month,

​    dwd.payment_time_day

FROM itcast_dwd.customer_relationship_dwd dwd

LEFT JOIN itcast_dimen.itcast_clazz clazz on dwd.itcast_clazz_id=clazz.id

LEFT JOIN itcast_dimen.employee e on dwd.creator=e.id

LEFT JOIN itcast_dimen.scrm_department dept on e.tdepart_id=dept.id

where substr(dwd.payment_time, 1, 10) >= '$Last_Month' ;

"



`有如DWM就写,没有就略`



### DWM到DWS的SQOOP脚本

### SQL

*--**分区*SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.created.files=150000;
*--hive**压缩*set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
*--**写入时压缩生效*set hive.exec.orc.compression.strategy=COMPRESSION;
*--**分桶*set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.optimize.bucketmapjoin = true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
*--**并行执行*set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;
*--**矢量化查询*set hive.vectorized.execution.enabled=true;
*--**关联优化器*set hive.optimize.correlation=true;
*--**读取零拷贝*set hive.exec.orc.zerocopy=true;
*--join**数据倾斜*set hive.optimize.skewjoin=true;
*-- set hive.skewjoin.key=100000;*set hive.optimize.skewjoin.compiletime=true;
set hive.optimize.union.remove=true;
*-- group**倾斜*set hive.groupby.skewindata=true;

*--**校区、学科组合分组**--**小时*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
    *count*(dwm.customer_id) as signup_num,
    dwm.itcast_school_id,
    dwm.itcast_school_name,
    dwm.itcast_subject_id,
    dwm.itcast_subject_name,
    -1 as tdepart_id,
    '-1' tdepart_name,
    '-1' origin_type,
    dwm.origin_type_stat,
    dwm.payment_time_hour,
    '1' grouptype, 
    '1' as time_type,
    dwm.payment_time_year,
    dwm.payment_time_month,
    dwm.payment_time_day
from itcast_dwm.customer_signup_dwm dwm
where *CONCAT_WS*('-',dwm.payment_time_year,dwm.payment_time_month,dwm.payment_time_day) >= '2037-02-01'
GROUP BY dwm.origin_type_stat,
    dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, dwm.payment_time_hour,
    dwm.itcast_school_id, dwm.itcast_school_name, dwm.itcast_subject_id, dwm.itcast_subject_name;
*--**天*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
    *count*(dwm.customer_id) as signup_num,
    dwm.itcast_school_id,
    dwm.itcast_school_name,
    dwm.itcast_subject_id,
    dwm.itcast_subject_name,
    -1 as tdepart_id,
    '-1' tdepart_name,
    '-1' origin_type,
    dwm.origin_type_stat,
    '-1' payment_time_hour,
    '1' grouptype, 
    '2' as time_type,
    dwm.payment_time_year,
    dwm.payment_time_month,
    dwm.payment_time_day
from itcast_dwm.customer_signup_dwm dwm
where *CONCAT_WS*('-',dwm.payment_time_year,dwm.payment_time_month,dwm.payment_time_day) >= '2037-02-01'
GROUP BY dwm.origin_type_stat,
    dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, 
    dwm.itcast_school_id, dwm.itcast_school_name, dwm.itcast_subject_id, dwm.itcast_subject_name;
*--**月、年省略**--**来源渠道分组**--**小时*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
    *count*(dwm.customer_id) as signup_num,
    '-1' itcast_school_id,
    '-1' itcast_school_name,
    '-1' itcast_subject_id,
    '-1' itcast_subject_name,
    -1 as tdepart_id,
    '-1' tdepart_name,
    dwm.origin_type,
    dwm.origin_type_stat,
    dwm.payment_time_hour,
    '2' grouptype, 
    '1' as time_type,
    dwm.payment_time_year,
    dwm.payment_time_month,
    dwm.payment_time_day
from itcast_dwm.customer_signup_dwm dwm
where *CONCAT_WS*('-',dwm.payment_time_year,dwm.payment_time_month,dwm.payment_time_day) >= '2037-02-01'
GROUP BY dwm.origin_type_stat,
    dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, dwm.payment_time_hour,
    dwm.origin_type;
*--**天、月、年省略**--**咨询中心分组**--**小时*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
    *count*(dwm.customer_id) as signup_num,
    '-1' itcast_school_id,
    '-1' itcast_school_name,
    '-1' itcast_subject_id,
    '-1' itcast_subject_name,
    dwm.tdepart_id,
    dwm.tdepart_name,
    '-1' origin_type,
    dwm.origin_type_stat,
    dwm.payment_time_hour,
    '3' grouptype, 
    '1' as time_type,
    dwm.payment_time_year,
    dwm.payment_time_month,
    dwm.payment_time_day
from itcast_dwm.customer_signup_dwm dwm
where *CONCAT_WS*('-',dwm.payment_time_year,dwm.payment_time_month,dwm.payment_time_day) >= '2037-02-01'
GROUP BY dwm.origin_type_stat,
    dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, dwm.payment_time_hour,
    dwm.tdepart_id, dwm.tdepart_name;
*--**天、月、年省略**--**总数分组**--**小时*INSERT INTO itcast_dws.customer_signup_dws PARTITION(payment_time_year, payment_time_month, payment_time_day)
SELECT
    *count*(dwm.customer_id) as signup_num,
    '-1' itcast_school_id,
    '-1' itcast_school_name,
    '-1' itcast_subject_id,
    '-1' itcast_subject_name,
    -1 as tdepart_id,
    '-1' tdepart_name,
    '-1' origin_type,
    dwm.origin_type_stat,
    dwm.payment_time_hour,
    '4' grouptype, 
    '1' as time_type,
    dwm.payment_time_year,
    dwm.payment_time_month,
    dwm.payment_time_day
from itcast_dwm.customer_signup_dwm dwm
where *CONCAT_WS*('-',dwm.payment_time_year,dwm.payment_time_month,dwm.payment_time_day) >= '2037-02-01'
GROUP BY dwm.origin_type_stat,
    dwm.payment_time_year, dwm.payment_time_month, dwm.payment_time_day, dwm.payment_time_hour;
*--**天、月、年省略**……*



### Shell脚本

\`#! /bin/bash`

`HIVE_HOME=/usr/bin/hive`

`` 

`Last_Month=$(date -d "$(date +%Y%m)01 last month" +%Y-%m-01)`

`${HIVE_HOME} -S -e "`

`--分区`

`SET hive.exec.dynamic.partition=true;`

`SET hive.exec.dynamic.partition.mode=nonstrict;`

`set hive.exec.max.dynamic.partitions.pernode=10000;`

`set hive.exec.max.dynamic.partitions=100000;`

`set hive.exec.max.created.files=150000;`

`--hive压缩`

`set hive.exec.compress.intermediate=true;`

`set hive.exec.compress.output=true;`

`--写入时压缩生效`

`set hive.exec.orc.compression.strategy=COMPRESSION;`

`--分桶`

`set hive.enforce.bucketing=true;`

`set hive.enforce.sorting=true;`

`set hive.optimize.bucketmapjoin = true;`

`set hive.auto.convert.sortmerge.join=true;`

`set hive.auto.convert.sortmerge.join.noconditionaltask=true;`

`--并行执行`

`set hive.exec.parallel=true;`

`set hive.exec.parallel.thread.number=8;`

`--矢量化查询`

`set hive.vectorized.execution.enabled=true;`

`--关联优化器`

`set hive.optimize.correlation=true;`

`--读取零拷贝`

`set hive.exec.orc.zerocopy=true;`

`--join数据倾斜`

`set hive.optimize.skewjoin=true;`

`-- set hive.skewjoin.key=100000;`

`set hive.optimize.skewjoin.compiletime=true;`

`set hive.optimize.union.remove=true;`

`` 

`INSERT into itcast_dwm.customer_signup_dwm PARTITION (payment_time_year, payment_time_month, payment_time_day)`

`SELECT`

​    `dwd.customer_id,`

​    `clazz.itcast_school_id,`

​    `clazz.itcast_school_name,`

​    `clazz.itcast_subject_id,`

​    `clazz.itcast_subject_name,`

​    `e.tdepart_id,`

​    `dept.name as tdepart_name,`

​    `dwd.origin_type,`

​    `dwd.origin_type_stat,`

​    `dwd.payment_time_hour,`

​    `dwd.payment_time_year,`

​    `dwd.payment_time_month,`

​    `dwd.payment_time_day`

`FROM itcast_dwd.customer_relationship_dwd dwd`

`LEFT JOIN itcast_dimen.itcast_clazz clazz on dwd.itcast_clazz_id=clazz.id`

`LEFT JOIN itcast_dimen.employee e on dwd.creator=e.id`

`LEFT JOIN itcast_dimen.scrm_department dept on e.tdepart_id=dept.id`

`where substr(dwd.payment_time, 1, 10) >= '$Last_Month' ;`

`"`

##    **导出数据**

#####   **sqoop语句**





###   **shell脚本**















# 最后的总结

报名用户主题看板:

报名用户主题看板,该看板有十个需求。

我们先来进行维度和指标的分析,共有维度包括年月,线上线下。

校区报名柱状图,就是将不同校区的报名用户数据进行统计。指标是报名用户量,维度则是校区加上共有维度。同理,学科报名柱状图是学科+共有维度。校区学科报名学员TOP是统计各个校区各个学科的报名学员数量。在共有维度上加了学科和校区。

总报名量是统计所有报名用户数据。指标是报名用户量,维度是共有维度。线上报名量指的是总报名量的线上数据。

意向用户报名转化率,就等于全部报名人数/全部新增的意向人数。指标和维度之前的数据都有,可以直接复用。同理有效线索报名转化率也是能够直接拿来用。

剩下的日报名趋势图、来源渠道和咨询中心,指标都是报名人数,维度则是在共有维度基础上加上了天维度、来源渠道和咨询中心。

由此我们可以得出结论,这十个需求的共同指标是报名用户量、意向用户量和有效线索量。而意向用户量和有效线索量可以复用前面的看板数据。

接下来是建模分析,

首先在ODS层原始数据包括有customer_relationship(报名信息),itcast_clazz(报名后的校区和学科信息),employee(内部员工信息),scrm_department(部门信息)。

其次是在DWD层对数据进行清洗,抽取,转换,所以我们在DWD层清洗保留客户表中不为空的,且是已支付的数据,并且转换获得线上线下及年月日等字段。

再次是DWM层,在DWD层基础上,关联校区,学科和咨询中心表,来获取想要的字段。

最后DWS层按产品的属性维度进行统计,得到统计宽表,产品属性维度包括:校区,学科组合分组,来源渠道,咨询中心。

以上就是我的报名用户看板的看板分析,谢谢大家。

上一篇:Linux笔记一


下一篇:从0到1Flink的成长之路(二十)-Flink 高级特性(二)之状态恢复和重启策略