Oracle分布式事务异常处理笔记

1. Identify the id OF the TRANSACTION:

COLUMN global_tran_id format a25
COLUMN DATABASE format a22
COLUMN global_name format a22
SELECT * FROM global_name;
SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID,to_char(FAIL_TIME,'dd-mon-yyyy HH24:MI:SS'),STATE, MIXED FROM DBA_2PC_PENDING;          
SELECT LOCAL_TRAN_ID, IN_OUT,INTERFACE, DATABASE FROM DBA_2PC_NEIGHBORS;

2. Purge the TRANSACTION:

EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('');
COMMIT;

3. Confirm that the TRANSACTION has been purged:

SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID,to_char(FAIL_TIME,'dd-mon-yyyy HH24:MI:SS'),STATE, MIXED FROM DBA_2PC_PENDING;
SELECT LOCAL_TRAN_ID, IN_OUT,INTERFACE, DATABASE FROM DBA_2PC_NEIGHBORS;

其中有如下五种state:

collecting
   -- execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.10.255');
prepared
   -- rollback force tran_id/commit force tran_id;
   EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.10.255');
committed
   -- execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.10.255');
forced commit
   -- execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.10.255');
forced ROLLBACK
   -- execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.10.255');


++++++ 如果遇到ORA-30019错误,可以采取如下方式:++++++

ALTER SESSION SET "_smu_debug_mode" = 4;
EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.10.255');

====== 情况1 在dba_2pc_pending表中还有事务记录,但是实际已经不存在该事务了

SELECT LOCAL_TRAN_ID,
       GLOBAL_TRAN_ID,
       to_char(FAIL_TIME, 'dd-mon-yyyy HH24:MI:SS'),
       STATE,
       MIXED
FROM DBA_2PC_PENDING;

LOCAL_TRAN_ID         1.92.66874             prepared

1 为回滚段号

SELECT KTUXEUSN,
       KTUXESLT,
       KTUXESQN, /* Transaction ID */
       KTUXESTA STATUS,
       KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta != 'INACTIVE'
AND ktuxeusn = 1

返回为0

如果当状态为prepared,且事务表中也不存在相关信息,那么我们只能手工进行清理:

++++++ 使用如下方式进行手工处理:++++++

SET TRANSACTION USE ROLLBACK segment SYSTEM;
DELETE FROM sys.pending_trans$ WHERE local_tran_id = '1.92.66874';
DELETE FROM sys.pending_sessions$ WHERE local_tran_id = '1.92.66874';
DELETE FROM sys.pending_sub_sessions$ WHERE local_tran_id = '1.92.66874';
commit;

====== 情况2  在dba_2pc_pending表中无法查到分布式事务信息,但是实际上却是存在该分布式事务的

SELECT LOCAL_TRAN_ID,
       GLOBAL_TRAN_ID,
       to_char(FAIL_TIME, 'dd-mon-yyyy HH24:MI:SS'),
       STATE,
       MIXED
FROM DBA_2PC_PENDING;

查询无记录

SELECT local_tran_id, state
FROM dba_2pc_pending
WHERE local_tran_id = ' 1.92.66874 ';  -- 为空

SELECT KTUXEUSN,
       KTUXESLT,
       KTUXESQN, /* Transaction ID */
       KTUXESTA STATUS,
       KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta != 'INACTIVE'
AND ktuxeusn = 1;

查询有记录

====== 此种情况下,我们无法手工进行ROLLBACK或commit ======

++++++ 我们用如下的方式手工清理:++++++

ALTER system disable distributed recovery ;

INSERT INTO pending_trans $
(LOCAL_TRAN_ID,
GLOBAL_TRAN_FMT,
GLOBAL_ORACLE_ID,
STATE,
STATUS,
SESSION_VECTOR,
RECO_VECTOR,
TYPE #,
FAIL_TIME,
RECO_TIME)
VALUES
(' 1.92.66874 ',
306206,
' XXXXXXX.12345.1.2.3 ',
' prepared ',
' P ',
hextoraw(' 00000001 '),
hextoraw(' 00000000 '),
0,
sysdate,
sysdate);

INSERT INTO pending_sessions $
VALUES
(' 1.92.66874 ',
1,
hextoraw(' 05004F003A1500000104 '),
' C ',
0,
30258592,
'',
146);

commit ;

commit   force ' 1.92.66874 ' ;

++++++ 此时如果commit force还是出现报错,需要继续执行:++++++

1. DELETE FROM pending_trans $ WHERE local_tran_id = '1.92.66874' ;
2. DELETE FROM pending_sessions $ WHERE local_tran_id = '1.92.66874' ;
3. commit ;
4. ALTER system enable distributed recovery ;    
5. ALTER SESSION SET " _smu_debug_mode " = 4 ;
6. EXEC dbms_transaction.purge_lost_db_entry ( '1.92.66874' )

====== 另外我们还可以通过如下SQL来捕获到导致分布式事务失败的SQL:======

++++++ 获取local_tran_id ++++++

SELECT a.sql_text, s.osuser, s.username
FROM v$transaction t, v$session s, v$sqlarea a
WHERE s.taddr = t.addr
AND a.address = s.prev_sql_addr
AND t.xidusn = 1
AND t.xidslot = 25
AND t.xidsqn = 589367;

如果 v$session 和 v$sqlarea 已经无法查到,那么我们还可以关联一些 dba_hist_* 试图进行查询。

上一篇:Oracle访问SQLServer透明网关配置笔记


下一篇:EasyHook Creating a remote file monitor