测试not in,not exists ,left join 查询两表中不同数据的效率
经常要比较找出两个表中不同的数据,前提当然是两个表的结构一致。
三个方法,做一个详细的验证测试,看看哪一个效果最佳 。
1、测试数据准备:
测试环境,准备两个表A和B,结构一样 ,A的数据比B多一点。
为了能够测试出时间差别,表的数据量要大一点,准备10万的数据。
CREATE TABLE "TA"
( "ID" VARCHAR2(8),
"NAME" VARCHAR2(8),
"AGE" NUMBER(2,0)
);
CREATE TABLE "TB"
( "ID" VARCHAR2(8),
"NAME" VARCHAR2(8),
"AGE" NUMBER(2,0)
);
写一个脚本做测试,准备10万数据量:
declare
-- Local variables here
i integer;
t_sql varchar2(1000);
begin
i := 0;
-- Test statements here
-- 清理测试环境
t_sql := 'truncate table ta';
execute immediate t_sql;
t_sql := 'truncate table tb';
execute immediate t_sql;
t_sql := 'drop index idx_ta_id';
execute immediate t_sql;
t_sql := 'drop index idx_tb_id';
execute immediate t_sql;
--10万数据量
while i < 100000 loop
insert into ta
values
(dbms_random.string('A', 8), dbms_random.string( 'U', 8), dbms_random.value
(20, 50));
i := i + 1;
end loop;
commit;
t_sql := 'insert into tb select * from ta';
execute immediate t_sql;
execute immediate 'commit';
--4条不同数据在A表
i := 0;
while i < 4 loop
insert into ta
values
(dbms_random.string('A', 8), dbms_random.string( 'U', 8), dbms_random.value
(20, 50));
i := i + 1;
end loop;
commit;
end;
2、无索引测试效果
SQL> select * from ta left join tb on ta.id=tb.id where tb.name is null;
ID NAME AGE ID NAME AGE
-------- -------- --- -------- -------- ---
BrNJOLld CPJNZIBR 42
spZVkdVs PEFNZMGN 34
yFyReiDC CEBVYXGJ 27
NFRXeOvm WPQKQQZI 41
Executed in 0.085 seconds
SQL> select * from ta where not exists (select 1 from tb where ta.id=tb.id) ;
ID NAME AGE
-------- -------- ---
spZVkdVs PEFNZMGN 34
NFRXeOvm WPQKQQZI 41
BrNJOLld CPJNZIBR 42
yFyReiDC CEBVYXGJ 27
Executed in 0.066 seconds
SQL> select * from ta where not exists (select tb.id from tb where ta.id=tb.id) ;
ID NAME AGE
-------- -------- ---
spZVkdVs PEFNZMGN 34
NFRXeOvm WPQKQQZI 41
BrNJOLld CPJNZIBR 42
yFyReiDC CEBVYXGJ 27
Executed in 0.09 seconds
SQL> select * from ta where ta.id not in (select tb.id from tb) ;
ID NAME AGE
-------- -------- ---
spZVkdVs PEFNZMGN 34
NFRXeOvm WPQKQQZI 41
BrNJOLld CPJNZIBR 42
yFyReiDC CEBVYXGJ 27
Executed in 0.097 seconds
测试结果:
left join :0.085
not exists :0.066 0.090
not in:0.097
说明:
not exist
通过1伪列,效率更高。
select 1 from 中的1是一常量(可以为任意数值),查到的所有行的值都是它,但从效率上来说,1 > 列名 > * 所有记录,因为不用访问字典表,效率最高。
select * from ta where not exists (select 1 from tb where ta.id=tb.id) ;
用tb.id ,更符合习惯和理解。
select * from ta where not exists (select tb.id from tb where ta.id=tb.id) ;
结论是not exists (1)> left join>not exist(id) >not in
3、有索引测试效果
再看看有索引的效果:
SQL> create index idx_ta_id on ta(id);
Index created
Executed in 0.134 seconds
SQL> create index idx_tb_id on tb(id);
Index created
Executed in 0.139 seconds
再执行四个查询语句:
SQL> select * from ta left join tb on ta.id=tb.id where tb.name is null;
ID NAME AGE ID NAME AGE
-------- -------- --- -------- -------- ---
BrNJOLld CPJNZIBR 42
spZVkdVs PEFNZMGN 34
yFyReiDC CEBVYXGJ 27
NFRXeOvm WPQKQQZI 41
Executed in 0.116 seconds
SQL> select * from ta where not exists (select 1 from tb where ta.id=tb.id) ;
ID NAME AGE
-------- -------- ---
spZVkdVs PEFNZMGN 34
NFRXeOvm WPQKQQZI 41
BrNJOLld CPJNZIBR 42
yFyReiDC CEBVYXGJ 27
Executed in 0.127 seconds
SQL> select * from ta where not exists (select tb.id from tb where ta.id=tb.id) ;
ID NAME AGE
-------- -------- ---
spZVkdVs PEFNZMGN 34
NFRXeOvm WPQKQQZI 41
BrNJOLld CPJNZIBR 42
yFyReiDC CEBVYXGJ 27
Executed in 0.142 seconds
SQL> select * from ta where ta.id not in (select tb.id from tb) ;
ID NAME AGE
-------- -------- ---
spZVkdVs PEFNZMGN 34
NFRXeOvm WPQKQQZI 41
BrNJOLld CPJNZIBR 42
yFyReiDC CEBVYXGJ 27
Executed in 0.148 seconds
测试结果:
left join: 0.116
not exists: 0.127 0.142
not in : 0.148
结论是left join>not exists (1)>not exist(id) >not in
4、结论
在10万的数据量比对效果:
(1)无索引:
not exists (1)> left join>not exist(id) >not in
(2)有索引:
left join>not exists (1)>not exist(id) >not in
在生产环境中,一般都是通过索引字段进行比对,因此选择left join 速度更好。