死锁模拟测试TM外键无索引情况

1.1问题现象

DBA朋友遇到的问题,LINUX Redhat 6.9,2节点RAC,生产环境报ORA 死锁错误,检查发现存在TM锁,属于外键无索引问题,影响业务。 通过外键列创建索引解决问题。

本篇文档进行模拟测试,并在此熟悉外键无索引的情况。

1.2死锁报错模拟

SQL> create table dept1 as select * from dept;
SQL>  create table emp1 as select * from emp;

SQL> alter table dept1 add constraint d_pk primary key(DEPTNO);
SQL> alter table emp1 add constraint e_pk primary key(EMPNO);
SQL> alter table emp1 add constraint for_key foreign key(DEPTNO) references dept1(DEPTNO);

模拟
session1 执行删除emp1 表deptno 10的数据
session2 执行删除emp1 表deptno 20的数据
session1 执行删除dept1 表deptno 10的数据
session2 执行删除dept2 表deptno 10的数据

session1 执行删除emp1 表deptno 10的数据
SQL> delete emp1 where deptno=10;
3 rows deleted.


SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');
       SID TY        ID1        ID2      LMODE    REQUEST
---------- -- ---------- ---------- ---------- ----------
       312 TM      90545          0          3          0
       312 TM      90546          0          3          0
       312 TX     589847       9382          6          0
       
SQL> select object_id,object_name from dba_objects where object_id in(90546,90545);
 OBJECT_ID OBJECT_NAME
---------- ------------------------------
     90545 DEPT1
     90546 EMP1

session2 执行删除emp1 表deptno 20的数据
SQL> delete emp1 where deptno=20;
5 rows deleted.

session1 执行删除dept1 表deptno 10的数据
SQL> delete dept1 where deptno=10;
hang
SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');
       SID TY        ID1        ID2      LMODE    REQUEST
---------- -- ---------- ---------- ---------- ----------
       330 TM      90545          0          3          0
       330 TM      90546          0          3          0
       312 TM      90545          0          3          0
       312 TM      90546          0          3          5
       330 TX     131096       9439          6          0
       312 TX     589847       9382          6          0

6 rows selected.   会话1,在子表存在会话2持有的TM3号锁的情况下,会话1执行删除父表主键的记录,需要申请子表的TM5号锁,无法获取到,会话hang住了!



session2 执行删除dept2 表deptno 10的数据
SQL> delete dept1 where deptno=20;
会话hang 

并且会话一报错!
delete dept1 where deptno=10
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

反过来想,如果会话1,删除子表的记录,在删除父表的记录,会话2,删除父表的记录,在删除子表的记录无法操作,因为删除父表的记录就hang住了。
因此测试流程是符合业务逻辑的预期,及两个会话,一个在删除部门1,另一个在删除部门2,正常情况下不存在阻塞!  外键无索引才会产生这种独特的现象!

 

1.3观察死锁TRACE

 

Mon Oct 19 07:01:51 2020
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/tt/tt/trace/tt_ora_3910.trc.

*** 2020-10-19 07:01:51.036
DEADLOCK DETECTED ( ORA-00060 )

[Transaction Deadlock]

The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-000161b2-00000000        19     312    SX   SSX       21     330    SX   SSX
TM-000161b2-00000000        21     330    SX   SSX       19     312    SX   SSX

session 312: DID 0001-0013-0000000F     session 330: DID 0001-0015-0000000B
session 330: DID 0001-0015-0000000B     session 312: DID 0001-0013-0000000F

Rows waited on:
  Session 312: no row
  Session 330: no row

----- Information for the OTHER waiting sessions -----
Session 330:
  sid: 330 ser: 53 audsid: 734385 user: 83/SCOTT
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 21 O/S info: user: oracle, term: UNKNOWN, ospid: 3996
    image: oracle@test (TNS V1-V3)
  client details:
    O/S info: user: oracle, term: pts/2, ospid: 3989
    machine: test program: sqlplus@test (TNS V1-V3)
    application name: SQL*Plus, hash value=3669949024
  current SQL:
  delete dept1 where deptno=20
······

Current Wait Stack:
0: waiting for 'enq: TM - contention'
name|mode=0x544d0005, object #=0x161b2, table/partition=0x0
wait_id=371 seq_num=372 snap_id=1
wait times: snap=1 min 39 sec, exc=1 min 39 sec, total=1 min 39 sec
wait times: max=infinite, heur=1 min 39 sec
wait counts: calls=33 os=33
in_wait=1 iflags=0x15a0
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 330, ser: 53
Dumping final blocker:
inst: 1, sid: 330, ser: 53

 

可以发现,会话当前存在TM等待阻塞现象!!!

----------------------------------------
SO: 0xf0854ab8, type: 54, owner: 0xf0a85c70, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
proc=0xf59eff48, name=DML lock, file=kta.h LINE:2261, pg=0
DML LOCK: tab=90546 flg=11 chi=0
his[0]: mod=3 spn=13039
(enqueue) TM-000161B2-00000000 DID: 0001-0013-0000000F
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x6
mode: SX, req: SSX, lock_flag: 0x0, lock: 0xf0854b18, res: 0xf3a89480
own: 0xf33d4e60, sess: 0xf33d4e60, proc: 0xf59eff48, prv: 0xf3a894b0
----------------------------------------
SO: 0xf0854958, type: 54, owner: 0xf0a85c70, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
proc=0xf59eff48, name=DML lock, file=kta.h LINE:2261, pg=0
DML LOCK: tab=90545 flg=11 chi=0
his[0]: mod=3 spn=13039
(enqueue) TM-000161B1-00000000 DID: 0001-0013-0000000F
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x6
mode: SX, lock_flag: 0x0, lock: 0xf08549b8, res: 0xf3a88be8
own: 0xf33d4e60, sess: 0xf33d4e60, proc: 0xf59eff48, prv: 0xf3a88bf8
----------------------------------------


SQL> select to_number('161B1','xxxxxx') as "161B1",to_number('161B2','xxxxx') as "161B2" from dual;

161B1 161B2
---------- ----------
90545 90546

SQL> select object_id,object_name from dba_objects where object_id in(90546,90545);

OBJECT_ID OBJECT_NAME
---------- ------------------------------
     90545 DEPT1
     90546 EMP1
这两者什么关系?  主外键关系。

 

1.4子表外键创建索引后,观察测试业务执行的session1,session2是否正常

会话一,rollback;
SQL> roll;
此时会话2,持有子表父表的TM多少级别的锁?

SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');


SID TY ID1 ID2 LMODE REQUEST
---------- -- ---------- ---------- ---------- ----------
330 TM 90545 0 3 0
330 TM 90546 0 3 0
330 TX 131096 9439 6 0

也就是说,子表没有TM3号锁的情况下,会话对子表删除记录,对父表删除记录,是正常的申请子表、父表的TM3号锁的!!!但是如果子表存在其它会话TM3号锁,父表的删除会话会申请子表的TM5号锁,从而阻塞后续正常的业务!

会话1尝试,再次删除父表的记录,hang

SQL> delete dept1 where deptno=10;  满足前面说的情况。

会话1尝试,再次删除子表的记录,deptno=10 正常情况下,与会话2 删除deptno 20不冲突!

SQL> delete emp1 where deptno=10;

3 rows deleted.  可以删除!    进一步理解,外键无索引在什么情况下会出现TM等待现象了吧。

 

===对子表创建索引,会话1 rollback;

SQL> create index e_ind on emp1(deptno);
create index e_ind on emp1(deptno)
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

?创建索引时,会申请什么锁呢,本次测试顺带验证下。

SQL> alter session set ddl_lock_timeout=600;

SQL> create index e_ind on emp1(deptno);

hang

SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');

SID TY ID1 ID2 LMODE REQUEST
---------- -- ---------- ---------- ---------- ----------
320 TM 90546 0 0 4
330 TM 90545 0 3 0
330 TM 90546 0 3 0
330 TX 131096 9439 6 0     

这个时候再次测试会话1,执行删除deptno 10能成功?

SQL> delete emp1 where deptno=10;  会话hang

索引不加Online会对表申请TM4号锁,因此创建索引老DBA都建议加Online。

测试环境会话1,2 rollback;

 

创建索引成功后。再次测试!最初流程。

 

模拟
session1 执行删除emp1 表deptno 10的数据
session2 执行删除emp1 表deptno 20的数据
session1 执行删除dept1 表deptno 10的数据
session2 执行删除dept2 表deptno 10的数据

session1 执行删除emp1 表deptno 10的数据
SQL> delete emp1 where deptno=10;
3 rows deleted.


SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');

SID TY ID1 ID2 LMODE REQUEST
---------- -- ---------- ---------- ---------- ----------
312 TM 90545 0 3 0
312 TM 90546 0 3 0
312 TX 655387 8935 6 0

       
SQL> select object_id,object_name from dba_objects where object_id in(90546,90545);
 OBJECT_ID OBJECT_NAME
---------- ------------------------------
     90545 DEPT1
     90546 EMP1

session2 执行删除emp1 表deptno 20的数据
SQL> delete emp1 where deptno=20;
5 rows deleted.

session1 执行删除dept1 表deptno 10的数据
SQL> delete dept1 where deptno=10;
1 row deleted.

SQL> select sid,type,id1,id2,lmode,request from gv$lock where type in('TX','TM');

SID TY ID1 ID2 LMODE REQUEST
---------- -- ---------- ---------- ---------- ----------
330 TM 90545 0 3 0
330 TM 90546 0 3 0
312 TM 90545 0 3 0
312 TM 90546 0 3 0
330 TX 524312 9604 6 0
312 TX 655387 8935 6 0

6 rows selected.

 会话1,在子表存在会话2持有的TM3号锁的情况下,会话1执行删除父表主键的记录,需要申请子表的TM3号锁!  只要数据行不冲突,TM3号锁可以兼容,一个对象同时可以允许多个TM3。



session2 执行删除dept2 表deptno 10的数据
SQL> set autotrace on SQL> delete dept1 where deptno=20;

1 row deleted.

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 13 | 0 (0)| 00:00:01 |
| 1 | DELETE | DEPT1 | | | | |
|* 2 | INDEX UNIQUE SCAN| D_PK | 1 | 13 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("DEPTNO"=20)

这里面看不到任何关于子表外键索引的信息,但是实质上Oracle内部还是通过子表的外键列索引,快速定位到了子表的行记录。

 

 

上一篇:19_MySQL表的内连接


下一篇:oracle中的函数