HIVE-----count(distinct ) over() 无法使用解决办法

HIVE-----count(distinct ) over() 无法使用解决办法

在使用hive时发现count(distinct ) over()  报错

hive> with da as (
    > select 1 a, 'a' b union all
    > select 1 a, 'a' b union all
    > select 2 a, 'a' b union all
    > select 2 a, 'a' b union all
    > select 2 a, 'a' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b
    > )
    > select
    > a
    > ,b
    > ,sum(a) over(partition by b) 
    > , count(distinct a) over(partition by b) 
    > from da;
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 18:26 Expression not in GROUP BY key 'b'

   经过测试将

with da as (
select 1 a, 'a' b union all
select 1 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b
)
select

 count(distinct a) over(partition by b) 
from da

  当且仅当至于count(distinct ) over()一个时段时能够使用,原因可能时内部实现distinct出错 不知道是否和版本有关 使用版本为Hive version 1.1.0

HIVE-----count(distinct ) over() 无法使用解决办法

解决办法:如下使用collect_set(a) over(partition by b)函数将合并成一个分好组的集合 然后求出集合的值个数

因为collect_set()不能放入重复函数所以使用size()求集合元素数量时能达到count(distinct )的效果

with da as (
select 1 a, 'a' b union all
select 1 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b
)
select
a
,b
,sum(a) over(partition by b) 
,size(collect_set(a) over(partition by b))
from da

  结果HIVE-----count(distinct ) over() 无法使用解决办法

上一篇:12c OCP考试专项 [1z0-071]-Q18: IS NOT NULL 查询空值及 DISTINCT 去重用法(2020.06.18)


下一篇:Mysql略复杂命令总结