大数据之sparksql常用函数

1、桥接+子查询+交叉连接

select

  p.project_name as project_name

from

  ods_pms_t_project_work_1217 w

  left join ods_pms_t_project_user_1217 u on w.user_id = u.id

  and w.work_hour > 0

  left join (

    select

      t1.project_id as project_id,

      t1.project_name as project_name

    from

      ods_pms_t_project_1217 t1

  ) p on p.project_id = u.project_id

 

cross join ods_pms_t_project_user_1217 u1 on 1=1

 

2、when ... case语句

 SELECT CASE t1.STATUS WHEN 'Y' THEN t1.STATUS ELSE t2.city END

两种写法:

SELECT CASE name WHEN 'shanmin' THEN name when 'zhourong' then concat(name, '-',address) ELSE address END  as test111111 from liuyue_test;

SELECT CASE  WHEN name='shanmin' THEN name when name='zhourong' then concat(name, '-',address) ELSE address END  as test111111 from liuyue_test;

3、split语句

4、多级子查询

select t3.id from  (

select t2.id, t2.name,t2.address, t2.band, t2.sex, t2.number, t2.phone_num from  (

select t1.id, t1.name,t1.address, t1.band, t1.sex, t1.number, t1.phone_num from  

(select * from ods_alm_liuyue_test3) t1) t2) t3;

 

select  t1.name as name2 from  ods_alm_liuyue_test3  cross join (select  t0.sex from  ods_alm_liuyue_test3 t0 where  t0.name = 'Zhangsan' ) t1 on 1 = 1

 

4、join、cross join、inner join、full join是一样的?

5、cast语句:

select cast(cast(t1.id as decimal) as string) as typecast from ods_alm_liuyue_0329_1 t1

8、在会话中设置用户变量:

select @var_name :=  task_id from de_task_conf where task_name='acc_edw_dim_test_inre_slowly_051404';

之后在当前会话中就能引用var_name

9、往spark表中插入数据,需要mr账号执行权限:

set hive.exec.dynamic.partition.mode=nonstrict;

insert into   `zxvmax.edw_dim_test_incre_slowly_0816_backup` PARTITION (`p_date`) values('10000000012','10000000012','1012','zhourong_prod','Y','2019-07-16 10:35:26',

'2019-07-16 10:35:26','ods_alm_liuyue_test_0716','1900-01-01','3010-12-31','2019-07-16');

10、sparksql中使用时间、日期函数,及日期函数加减运算:

select now(),current_timestamp(),date_sub(current_date(),1),date_add(current_date(),1);

11、spark任务使用split函数会运行失败

可以用substr()、concat()替换。

12、在代码编辑中使用系统变量的日期运算

 ‘$cur_date#-1$’ $[date]-1

13、查看建表语句、表结构

desc table zxvmax.edw_test_change_part_09104;

Show create table zxvmax.edw_test_change_part_09104;

14、insert语句

MaxCompute的insert语法与通常使用的MySQL或Oracle的insert语法有差别,在insert overwrite/into后需要加入table关键字,而非直接使用tablename。insert into与insert overwrite的区别是:insert into会向表或表的分区中追加数据,而insert overwrite会在向表或分区中插入数据前清空表中的原有数据。

15、lead()函数

LEAD(scd_start_date,n=1,default=NULL) OVER ( PARTITION BY id  ORDER BY TO_DATE(scd_start_date)) AS scd_end_date

表示将下一行的scd_start_date的值,作为当前行的scd_end_date值,根据id分组,按scd_start_date正序排列

16、Lag()函数

lag(scd_start_date,n=1,default=NULL) OVER ( PARTITION BY id  ORDER BY TO_DATE(scd_start_date)) AS scd_end_date

表示将上一行的scd_start_date的值,作为当前行的scd_end_date值,根据id分组,按scd_start_date正序排列

17、Row_num()函数

row_number() OVER (PARTITION BY COL1 ORDER BY COL2) as order_num 表示根据COL1分组,在分组内部根据 COL2排序,而此函数计算的值就表示每组内部排序后的顺序编号(组内连续的唯一的)

18、Explain语句

explain select * from ods_alm_liuyue_test_0905

19、Compute语句

20、compute stats ods_alm_liuyue_test_0905

上一篇:oracleSQL 转 SPARKSQL(hiveSql) 及常用优化


下一篇:SparkSQL 核心编程