【说明】
本文主要是针对以下两个问题进行的分析整理:
1、如果SQL中存在left join,是否绝对无法改写?
2、在left join左边的表,并不一定就是驱动表,explain执行计划可能会将left join右边的表作为驱动表,什么情况下,会产生这种情况?
【left join的用途】
我们先来了解下left join的实际用途,即它会产生什么效果。
left join(左联接),会返回包括左表中的所有记录和右表中联结字段相等的记录。
这句话的含义是,如果a left join b,那么返回的结果集是“a表与b表join关联得到的结果集”+"a表中,无法与b关联的其它数据",其中,a与b无法关联的数据只会显示a表的内容,b表对应的内容都为空。所以a left join b得到的数据行数,是大于等于a表的数据行的。
示例:
a、b表的结构、数据如下:
mysql>select * from a
+--------------+--------------+--------------+
| id | t1 | t2 |
+--------------+--------------+--------------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 111 | 1 | 1 |
| 222 | 2 | 2 |
| 11 | 1 | 1 |
| 22 | 2 | 2 |
+--------------+--------------+--------------+
返回行数:[6],耗时:8 ms.
mysql>select * from b
+--------------+--------------+--------------+
| id | t1 | t2 |
+--------------+--------------+--------------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
+--------------+--------------+--------------+
返回行数:[3],耗时:8 ms.
mysql>select a.*,b1.*
from a
join b b1 on a.id=b1.t1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 2 | 2 | 2 | 3 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
mysql>select a.*,b1.*
from a
left join b b1 on a.id=b1.t1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 2 | 2 | 2 | 3 | 2 | 2 |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[7],耗时:8 ms.
可以看到a left join b的上半部分结果就是“a表与b表join关联得到的结果集”,下半部分的结果集是"a表中,无法与b关联的其它数据"
【疑问一的分析】
明白了left join的实际作用后,我们再来分析以下几种情况:
1、问:
A:select a.,c. from a left join (select * from b where b.t1=1) c on a.id=c.id
是否可以改写为
B:select a.,b. from a left join b on a.id=b.id where b.t1=1
答:否,这两种情况的意义完全不一样,所得到的是两种不同的结果。
在A情况下,a表是与子查询的结果集进行左表连接,虽然自查询通过where条件进行了过滤,但影响的是b表会有哪些数据会在结果集中显示,a表中的数据不会有任何缺失,最后的结果行数肯定还是大于等于a表的行数的
在B情况下,a表与b表进行左表连接后,得到一个结果集,最后再对整个结果集进行过滤,在对整个结果集过滤的时候,a表中的数据可能也会被过滤掉,过滤后的结果行数可能是大于、等于或小于a表的行数的
示例:
注:a、b两表的结构与数据在第一处代码中已有展示,这里不再赘述
mysql>select a.*,c.* from a left join (select * from b where b.t1=1) c on a.id=c.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | | | |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[6],耗时:57 ms.
mysql>select a.*,b.* from a left join b on a.id=b.id where b.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[1],耗时:7 ms.
2、问:
A:select a.,b. from (select * from a where a.t1=1) left join b on a.id=b.id
是否可以改写为
B:select a.,b. from a left join b on a.id=b.id where a.t1=1
答:可以,效果是一样的
在A情况下,a表过滤后的结果集作为左连接的左表,最后的结果集肯定是只包含a.t1=1的
在B情况下,a与b左连接先得到一个结果集,然后对结果集中根据a表的t1列进行过滤,最后的结果集肯定也是只包含a.t1=1的
示例:
注:a、b两表的结构与数据在第一处代码中已有展示,这里不再赘述
mysql>select c.*,b.* from (select * from a where a.t1=1) c left join b on c.t1=b.t1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 111 | 1 | 1 | 1 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 | 1 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[3],耗时:10 ms.
mysql>select a.*,b.* from a left join b on a.t1=b.t1 where a.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 111 | 1 | 1 | 1 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 | 1 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[3],耗时:8 ms.
【小结】
上述两种情况是对left join中子查询存在where的情况进行了分析:
如果left join的右表即被驱动表是一个子查询,里面有where条件对被驱动表进行了过滤,这种情况不能直接将where条件放在最外边,因为子查询中通过where过滤后还是会与驱动表进行左连接,驱动表中的数据不会有任何缺失,但如果将where条件放在最外边,可能也会过滤掉驱动表的数据,两种SQL的意义是完全不一样的;
如果left join的左表即驱动表是一个子查询,里面有where条件对驱动表进行了过滤,那么可以将where条件放在最外边。
3、问:下面三种情况是等价的吗?
A:select a.,c. from a left join (select * from b where b.t1=1) c on a.id=c.id
B:select a.,b. from a left join b on a.id=b.id and b.t1=1
C:select a.,b. from a left join b on a.id=b.id and a.t1=1
答:A与B是完全等价的,与C是不等价的。
A的含义是,b表过滤后再被驱动表左连接,得到的结果集中,a表的内容会全部显示,b表只会显示与a表关联后符合b.t1=1的内容
B的含义是,a表与b表进行左连接关联,得到的结果集中,a表的内容会全部显示,但b表的内容需要同时满足a.id=b.id与b.t1=1才会显示,即b表只会显示与a表关联后且b.t1=1的内容
C的含义是,a表与b表进行左连接关联,得到的结果集中,a表的内容会全部显示,b表会显示的内容需要同时满足a.id=b.id且a表的a.t1=1
示例:
a表数据如下所示
mysql>select * from a
+--------------+--------------+--------------+
| id | t1 | t2 |
+--------------+--------------+--------------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 111 | 1 | 1 |
| 222 | 2 | 2 |
| 11 | 1 | 1 |
| 22 | 2 | 2 |
+--------------+--------------+--------------+
b表数据如下所示
mysql>select * from b
+--------------+--------------+--------------+
| id | t1 | t2 |
+--------------+--------------+--------------+
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
+--------------+--------------+--------------+
mysql>select a.*,c.* from a left join (select * from b where b.t1=1) c on a.id=c.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | | | |
| 2 | 2 | 2 | | | |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[6],耗时:9 ms.
mysql>select a.*,b.* from a left join b on a.id=b.id and b.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | | | |
| 2 | 2 | 2 | | | |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[6],耗时:7 ms.
mysql>select a.*,b.* from a left join b on a.id=b.id and a.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | | | |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[6],耗时:7 ms.
【小结】
从这里可以看出 left join on 中and 的含义,and与where不同,and是与on组合使用的,表示当同时满足这些条件时,才会显示被驱动表的对应内容,在left join中and对驱动表要显示的内容并没有过滤性
【进一步思考】
那么在join中,and的作用又是怎么样的?
A:select a.,c. from a join (select * from b where b.t1=1) c on a.id=c.id
B:select a.,b. from a join b on a.id=b.id and b.t1=1
C:select a.,b. from a join b on a.id=b.id and a.t1=1
此时A与B还是等价的,与C不是等价的
A表示,b表通过b.t1=1过滤,然后与a表关联,得到一个结果集,结果集是符合a.id=c.id且b.t1=1的
B表示,a与b join关联后,再对结果集的b.t1=1进行过滤,最后的结果集也是符合a.id=c.id且b.t1=1的
C表示,a与b join关联后,再对结果集的b.t1=1进行过滤,最后的结果集是符合a.id=c.id且c.t1=1的
所以,在join情况下,and与where的作用是一样的,都是对结果集的进一步过滤
总结:
如果带有where的子查询是驱动表,可以将其改写,如果带有where的子查询是被驱动表,不能改写
【问题二的分析】
现象:
mysql>explain
SELECT b.*
FROM a
LEFT JOIN b ON b.uid = a.uid
WHERE b.w_id = 2327
+--------------+-----------------------+-----------------+----------------+-------------------------+---------------+-------------------+------------------+----------------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+--------------+-----------------------+-----------------+----------------+-------------------------+---------------+-------------------+------------------+----------------+--------------------------+
| 1 | SIMPLE | b | ref | w_id,uid | w_id | 4 | const | 402 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,idx_4 | PRIMARY | 3 | test.b.uid | 1 | Using where; Using index |
问:left join不是一定会以左表作为驱动表吗?为什么explain查看left join的执行计划时,有时右边的表会作为驱动表?如果left join右边的表足够小,会被转换为驱动表吗?
答:并不是右表足够小就会作为驱动表,而是当left join的效果与join是完全相同的时候,left join会被转换为join,此时驱动表与被驱动表会依据结果集的大小来判断,自然会有右边的表会作为驱动表的情况
问:什么情况下,left join的效果与join是完全相同的?
答:当最外层对右表进行where条件非空过滤时
即:
A:select a.,b. from a left join b on a.id=b.id where b.t1=2
B:select a.,b. from a join b on a.id=b.id where b.t1=2
A与B对效果是完全相同的,最后的结果都是a与满足b.t1=1的b表关联的结果集,A会被优化器转换为B
示例:
mysql>select a.*,b.* from a left join b on a.id=b.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[6],耗时:9 ms.
mysql>select a.*,b.* from a join b on a.id=b.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[2],耗时:8 ms.
结果集相同,可相互转换:
mysql>select a.*,b.* from a left join b on a.id=b.id where b.t1=2
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[2],耗时:8 ms.
mysql>select a.*,b.* from a join b on a.id=b.id where b.t1=2
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[2],耗时:7 ms.
结果集不同,不可转换:
mysql>select a.*,b.* from a left join b on a.id=b.id where b.t1 is null
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[4],耗时:9 ms.
mysql>select a.*,b.* from a join b on a.id=b.id where b.t1 is null
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[0],耗时:8 ms.
【进一步说明】
如果where条件是对左表的非空过滤,left join是否还等价于join?
A:select a.,b. from a left join b on a.id=b.id where a.t1=1
B:select a.,b. from a join b on a.id=b.id where a.t1=1
答:不等价。
A表示a与b左表关联后,关联后的结果集中肯定有a表的全部数据,然后再通过a.k1=1进行过滤
B表示a与b进行关联后,关联后的结果集中是a表的部分数据,然后再对这部分数据通过a.k1=1进行过滤
两者是不等价的,这种情况的left join一定会以a表作为驱动表
示例:
mysql>select a.*,b.* from a left join b on a.id=b.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[6],耗时:7 ms.
mysql>select a.*,b.* from a join b on a.id=b.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[2],耗时:8 ms.
mysql>select a.*,b.* from a left join b on a.id=b.id where a.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 111 | 1 | 1 | | | |
| 11 | 1 | 1 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[3],耗时:10 ms.
mysql>select a.*,b.* from a join b on a.id=b.id where a.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
+--------------+--------------+--------------+--------------+--------------+--------------+
返回行数:[1],耗时:8 ms.
【小结】
简单点理解,如果left join最后带有where条件,这个where是针对右表的,通过where过滤后可能导致右表的数据小于左表,而此时left join与join也是等价的,这样就会以右表作为驱动表
如果left join的where条件是针对左表的,在left join中,左表本来就会作为驱动表,通过where过滤后左表的数据可能更小,更适合作为驱动表
【总结】
本文对left join对两种问题进行了分析与说明,如有不足之处欢迎指正。