Hive: in, exists, and left semi join

-- Sample data: query1 is table A and query2 is table B in the queries below.
with query1 as (
    select stack(4, 'A', 1, 'B', 2, 'C', 3, 'D', 4) as (k, v)
),
query2 as (
    select stack(4, 'A', 5, 'B', 6, 'E', 7, 'F', 8) as (k, v)
)

Data:
A (query1):
k  v
A  1
B  2
C  3
D  4
B (query2):
k  v
A  5
B  6
E  7
F  8

Goal: keep only the rows of A whose k also appears in query2 (table B).

1. The usual approach

select
    t1.k as k,
    t1.v as v   -- take v from the left table A, not from B
from
(
    select k, v
    from A
)   t1
join
(
    select k, v
    from B
)   t2
on
    t1.k = t2.k

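This performs an inner join and then keeps only the left table's columns, which is slow. If B contains duplicate keys, each matching row of A is also returned once per match.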
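One caveat of the join approach, beyond performance, can be sketched on the sample data. The extra ('A', 9) row in query2 is a hypothetical addition, not part of the original data. It gives B a duplicate key so that the effect becomes visible. The short aliases t1 and t2 follow the query above.

```sql
with query1 as (
    select stack(4, 'A', 1, 'B', 2, 'C', 3, 'D', 4) as (k, v)
),
query2 as (
    -- hypothetical fifth row ('A', 9): key 'A' now occurs twice in B
    select stack(5, 'A', 5, 'B', 6, 'E', 7, 'F', 8, 'A', 9) as (k, v)
)
select
    t1.k as k,
    t1.v as v
from query1 t1
join query2 t2
on t1.k = t2.k;
-- rows (in no guaranteed order): (A, 1), (A, 1), (B, 2)
-- the A row is repeated because it matches two rows of query2
```

If duplicates like this are possible, the join needs an extra deduplication step to return each row of A only once.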

2. Using in

select k,v
from A 
where k in (select k from B)

The subquery of in may select only one column.
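Run against the sample data, the in form could look like the sketch below. It assumes a Hive version that accepts subqueries in the where clause, and the aliases t1 and t2 are chosen only for illustration.

```sql
with query1 as (
    select stack(4, 'A', 1, 'B', 2, 'C', 3, 'D', 4) as (k, v)
),
query2 as (
    select stack(4, 'A', 5, 'B', 6, 'E', 7, 'F', 8) as (k, v)
)
select t1.k, t1.v
from query1 t1
-- the subquery returns exactly one column, as in requires
where t1.k in (select t2.k from query2 t2);
-- rows (in no guaranteed order): (A, 1), (B, 2)
```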

3. Using exists

select k,v
from A 
where exists (select k from B where A.k = B.k)
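  1. The exists subquery must contain one or more correlated predicates (=).
  2. What the exists subquery selects does not matter; only the condition in its where clause takes effect.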
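Both points can be illustrated with a small sketch on the sample data. The subquery selects the constant 1 instead of a column, and the result is unchanged because only the correlated predicate decides which rows survive. The aliases t1 and t2 are illustrative, and a Hive version with subquery support in the where clause is assumed.

```sql
with query1 as (
    select stack(4, 'A', 1, 'B', 2, 'C', 3, 'D', 4) as (k, v)
),
query2 as (
    select stack(4, 'A', 5, 'B', 6, 'E', 7, 'F', 8) as (k, v)
)
select t1.k, t1.v
from query1 t1
where exists (
    -- the constant 1 is never returned to the outer query
    select 1
    from query2 t2
    where t1.k = t2.k   -- correlated predicate linking the two tables
);
-- rows (in no guaranteed order): (A, 1), (B, 2)
```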

4. Using left semi join

select
    k, v
from
(
    select k, v
    from A
) t1
left semi join
(
    select k, v
    from B
) t2
on t1.k = t2.k

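1. This syntax is roughly equivalent to an inner join followed by selecting only the left table's columns. Unlike an inner join, it returns each left row at most once, and the right table may be referenced only in the on clause.
2. Hive has no right semi join, only left semi join.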
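A minimal sketch of these semantics on the sample data follows. The extra ('A', 9) row in query2 is a hypothetical addition, not part of the original data. It gives the right side a duplicate key, to show that left semi join still returns the matching left row only once.

```sql
with query1 as (
    select stack(4, 'A', 1, 'B', 2, 'C', 3, 'D', 4) as (k, v)
),
query2 as (
    -- hypothetical fifth row ('A', 9): key 'A' now occurs twice on the right
    select stack(5, 'A', 5, 'B', 6, 'E', 7, 'F', 8, 'A', 9) as (k, v)
)
select t1.k, t1.v        -- only left-side columns may appear here
from query1 t1
left semi join query2 t2
on t1.k = t2.k;          -- t2 may be referenced only in the on clause
-- rows (in no guaranteed order): (A, 1), (B, 2)
```

Adding t2.v to the select list would be rejected, since the right table is not visible outside the on clause.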
