测试not in,not exists ,left join 查询两表中不同数据的效率

测试not in,not exists ,left join 查询两表中不同数据的效率

经常要比较找出两个表中不同的数据,前提当然是两个表的结构一致。
三个方法,做一个详细的验证测试,看看哪一个效果最佳 。

1、测试数据准备:

测试环境,准备两个表A和B,结构一样 ,A的数据比B多一点。
为了能够测试出时间差别,表的数据量要大一点,准备10万的数据。

CREATE TABLE "TA" 
   (	"ID" VARCHAR2(8), 
	"NAME" VARCHAR2(8), 
	"AGE" NUMBER(2,0)
   );
CREATE TABLE "TB" 
   (	"ID" VARCHAR2(8), 
	"NAME" VARCHAR2(8), 
	"AGE" NUMBER(2,0)
   );   

写一个脚本做测试,准备10万数据量:


declare
  -- Local variables here
  i     integer;
  t_sql varchar2(1000);

begin
  i := 0;
  -- Test statements here
  -- 清理测试环境
  t_sql := 'truncate table ta';
  execute immediate t_sql;

  t_sql := 'truncate table tb';
  execute immediate t_sql;

  t_sql := 'drop index idx_ta_id';
  execute immediate t_sql;

  t_sql := 'drop index idx_tb_id';
  execute immediate t_sql;
 --10万数据量
  while i < 100000 loop
    insert into ta
    values
      (dbms_random.string('A', 8), dbms_random.string( 'U', 8), dbms_random.value
      (20, 50));
    i := i + 1;
  end loop;

  commit;

  t_sql := 'insert into tb select * from ta';
  execute immediate t_sql;
  execute immediate 'commit';
  --4条不同数据在A表
  i := 0;
  while i < 4 loop
    insert into ta
    values
      (dbms_random.string('A', 8), dbms_random.string( 'U', 8), dbms_random.value
      (20, 50));
    i := i + 1;
  end loop;

  commit;

end;

2、无索引测试效果

SQL> select * from ta left join tb on ta.id=tb.id  where tb.name is null;

ID       NAME     AGE ID       NAME     AGE
-------- -------- --- -------- -------- ---
BrNJOLld CPJNZIBR  42                   
spZVkdVs PEFNZMGN  34                   
yFyReiDC CEBVYXGJ  27                   
NFRXeOvm WPQKQQZI  41                   

Executed in 0.085 seconds


SQL> select * from ta where not  exists (select 1 from tb where ta.id=tb.id) ;

ID       NAME     AGE
-------- -------- ---
spZVkdVs PEFNZMGN  34
NFRXeOvm WPQKQQZI  41
BrNJOLld CPJNZIBR  42
yFyReiDC CEBVYXGJ  27

Executed in 0.066 seconds


SQL> select * from ta where not  exists (select tb.id from tb where ta.id=tb.id) ;

ID       NAME     AGE
-------- -------- ---
spZVkdVs PEFNZMGN  34
NFRXeOvm WPQKQQZI  41
BrNJOLld CPJNZIBR  42
yFyReiDC CEBVYXGJ  27

Executed in 0.09 seconds


SQL> select * from ta where ta.id not in (select tb.id from tb) ;

ID       NAME     AGE
-------- -------- ---
spZVkdVs PEFNZMGN  34
NFRXeOvm WPQKQQZI  41
BrNJOLld CPJNZIBR  42
yFyReiDC CEBVYXGJ  27

Executed in 0.097 seconds

测试结果:
left join :0.085
not exists :0.066 0.090
not in:0.097
说明:
not exist
通过1伪列,效率更高。
select 1 from 中的1是一常量(可以为任意数值),查到的所有行的值都是它,但从效率上来说,1 > 列名 > * 所有记录,因为不用访问字典表,效率最高。

select * from ta where not exists (select 1 from tb where ta.id=tb.id) ;
用tb.id ,更符合习惯和理解。
select * from ta where not exists (select tb.id from tb where ta.id=tb.id) ;

结论是not exists (1)> left join>not exist(id) >not in

3、有索引测试效果

再看看有索引的效果:

SQL> create index idx_ta_id on ta(id);

Index created


Executed in 0.134 seconds


SQL> create index idx_tb_id on tb(id);

Index created


Executed in 0.139 seconds

再执行四个查询语句:

SQL> select * from ta left join tb on ta.id=tb.id  where tb.name is null;

ID       NAME     AGE ID       NAME     AGE
-------- -------- --- -------- -------- ---
BrNJOLld CPJNZIBR  42                   
spZVkdVs PEFNZMGN  34                   
yFyReiDC CEBVYXGJ  27                   
NFRXeOvm WPQKQQZI  41                   

Executed in 0.116 seconds


SQL> select * from ta where not  exists (select 1 from tb where ta.id=tb.id) ;

ID       NAME     AGE
-------- -------- ---
spZVkdVs PEFNZMGN  34
NFRXeOvm WPQKQQZI  41
BrNJOLld CPJNZIBR  42
yFyReiDC CEBVYXGJ  27

Executed in 0.127 seconds


SQL> select * from ta where not  exists (select tb.id from tb where ta.id=tb.id) ;

ID       NAME     AGE
-------- -------- ---
spZVkdVs PEFNZMGN  34
NFRXeOvm WPQKQQZI  41
BrNJOLld CPJNZIBR  42
yFyReiDC CEBVYXGJ  27

Executed in 0.142 seconds


SQL> select * from ta where ta.id not in (select tb.id from tb) ;

ID       NAME     AGE
-------- -------- ---
spZVkdVs PEFNZMGN  34
NFRXeOvm WPQKQQZI  41
BrNJOLld CPJNZIBR  42
yFyReiDC CEBVYXGJ  27

Executed in 0.148 seconds

测试结果:
left join: 0.116
not exists: 0.127 0.142
not in : 0.148

结论是left join>not exists (1)>not exist(id) >not in

4、结论

在10万的数据量比对效果:
(1)无索引:
not exists (1)> left join>not exist(id) >not in
(2)有索引:
left join>not exists (1)>not exist(id) >not in

在生产环境中,一般都是通过索引字段进行比对,因此选择left join 速度更好。

上一篇:内卷效益带来的大数据可视化


下一篇:努力一下,还是可以成为技术美术(TA)的