Mysql死锁场景分析

报错日志

org.springframework.dao.DeadlockLoserDataAccessException: 
### Error updating database.  Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may exist in com/atome/ops/mapper/CampaignJobInstanceMapper.java (best guess)
### The error may involve com.atome.ops.mapper.CampaignJobInstanceMapper.update-Inline
### The error occurred while setting parameters
### SQL: update `campaign_job_instance`      SET `total_count`=?,     `status`=?,     `fetch_start_time`=?,     `fetch_end_time`=?,     `send_start_time`=?,     `send_end_time`=?,     `update_timestamp`=?          where id=?
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction

主键更新死锁,why?

死锁日志明细

## 执行以下语句查看最近一次死锁日志
show engine innodb status;

...
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-02-07 21:00:00 0x7f3972be1700
*** (1) TRANSACTION:
TRANSACTION 401368548, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 3
MySQL thread id 24621163, OS thread handle 139885409634048, query id 1074586194 172.21.32.245 root Updating
update `campaign_job_instance`
     SET `total_count`=6,
    `status`='SENDING',
    `fetch_start_time`=1644238560063,
    `fetch_end_time`=1644238560115,
    `send_start_time`=1644238800550,
    `send_end_time`=null,
    `update_timestamp`=1644238800550 
        where id=5193
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10003 page no 263 n bits 104 index PRIMARY of table `user_label`.`campaign_job_instance` trx id 401368548 lock_mode X locks rec but not gap waiting
Record lock, heap no 35 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
 0: len 4; hex 00001449; asc    I;;
 1: len 6; hex 000017ec6739; asc     g9;;
 2: len 7; hex 390002800f1934; asc 9     4;;
 3: len 28; hex 6a6f622d363166653865333464636531613630303031346636333166; asc job-61fe8e34dce1a600014f631f;;
 4: len 24; hex 363230313136653033333863656530303031326637346337; asc 620116e0338cee00012f74c7;;
 5: len 30; hex 3631666538653334646365316136303030313466363331642d3230323230; asc 61fe8e34dce1a600014f631d-20220; (total 37 bytes);
 6: len 3; hex 534d53; asc SMS;;
 7: len 24; hex 363166653865333464636531613630303031346636333164; asc 61fe8e34dce1a600014f631d;;
 8: len 30; hex 66353464623661372d336263662d346361312d613630302d643337616234; asc f54db6a7-3bcf-4ca1-a600-d37ab4; (total 36 bytes);
 9: len 4; hex 53454e54; asc SENT;;
 10: len 4; hex 80000006; asc     ;;
 11: len 8; hex 8000017ed4415b3f; asc    ~ A[?;;
 12: len 8; hex 8000017ed4415b73; asc    ~ A[s;;
 13: len 8; hex 8000017ed44505ab; asc    ~ E  ;;
 14: len 8; hex 8000017ed44506a1; asc    ~ E  ;;
 15: len 8; hex 8000017ed4415b3f; asc    ~ A[?;;
 16: len 8; hex 8000017ed44506a1; asc    ~ E  ;;

*** (2) TRANSACTION:
TRANSACTION 401368889, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 24621104, OS thread handle 139884714923776, query id 1074586215 172.21.32.245 root Updating
update `campaign_job_instance`
     SET `total_count`=6,
    `status`='SENT',
    `fetch_start_time`=1644238560026,
    `fetch_end_time`=1644238560120,
    `send_start_time`=1644238800280,
    `send_end_time`=1644238800554,
    `update_timestamp`=1644238800554 
        where id=5192
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 10003 page no 263 n bits 104 index PRIMARY of table `user_label`.`campaign_job_instance` trx id 401368889 lock_mode X locks rec but not gap
Record lock, heap no 35 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
 0: len 4; hex 00001449; asc    I;;
 1: len 6; hex 000017ec6739; asc     g9;;
 2: len 7; hex 390002800f1934; asc 9     4;;
 3: len 28; hex 6a6f622d363166653865333464636531613630303031346636333166; asc job-61fe8e34dce1a600014f631f;;
 4: len 24; hex 363230313136653033333863656530303031326637346337; asc 620116e0338cee00012f74c7;;
 5: len 30; hex 3631666538653334646365316136303030313466363331642d3230323230; asc 61fe8e34dce1a600014f631d-20220; (total 37 bytes);
 6: len 3; hex 534d53; asc SMS;;
 7: len 24; hex 363166653865333464636531613630303031346636333164; asc 61fe8e34dce1a600014f631d;;
 8: len 30; hex 66353464623661372d336263662d346361312d613630302d643337616234; asc f54db6a7-3bcf-4ca1-a600-d37ab4; (total 36 bytes);
 9: len 4; hex 53454e54; asc SENT;;
 10: len 4; hex 80000006; asc     ;;
 11: len 8; hex 8000017ed4415b3f; asc    ~ A[?;;
 12: len 8; hex 8000017ed4415b73; asc    ~ A[s;;
 13: len 8; hex 8000017ed44505ab; asc    ~ E  ;;
 14: len 8; hex 8000017ed44506a1; asc    ~ E  ;;
 15: len 8; hex 8000017ed4415b3f; asc    ~ A[?;;
 16: len 8; hex 8000017ed44506a1; asc    ~ E  ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10003 page no 263 n bits 104 index PRIMARY of table `user_label`.`campaign_job_instance` trx id 401368889 lock_mode X locks rec but not gap waiting
Record lock, heap no 23 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
 0: len 4; hex 00001448; asc    H;;
 1: len 6; hex 000017ec65e4; asc     e ;;
 2: len 7; hex 0e000002a31da0; asc        ;;
 3: len 28; hex 6a6f622d363230303832346464636531613630303031346636333261; asc job-6200824ddce1a600014f632a;;
 4: len 24; hex 363230313136653033333863656530303031326637346334; asc 620116e0338cee00012f74c4;;
 5: len 30; hex 3632303038323464646365316136303030313466363332382d3230323230; asc 6200824ddce1a600014f6328-20220; (total 37 bytes);
 6: len 3; hex 534d53; asc SMS;;
 7: len 24; hex 363230303832346464636531613630303031346636333238; asc 6200824ddce1a600014f6328;;
 8: len 30; hex 33333839633535662d663262612d346630392d393432632d353839633865; asc 3389c55f-f2ba-4f09-942c-589c8e; (total 36 bytes);
 9: len 7; hex 53454e44494e47; asc SENDING;;
 10: len 4; hex 80000006; asc     ;;
 11: len 8; hex 8000017ed4415b1a; asc    ~ A[ ;;
 12: len 8; hex 8000017ed4415b78; asc    ~ A[x;;
 13: len 8; hex 8000017ed4450699; asc    ~ E  ;;
 14: SQL NULL;
 15: len 8; hex 8000017ed4415b1a; asc    ~ A[ ;;
 16: len 8; hex 8000017ed4450699; asc    ~ E  ;;

*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------

mysql之show engine innodb status解读

两条不同主键更新的语句相互等待,why?

从上述日志可以看出以下几点:

  • 事务1锁定4行数据,一行数据等待主键行锁(因超时从而回滚),其他数据行未显示明细(肯定有事务2中等待锁的数据行)
  • 事务2锁定2行数据,一行已经获取锁(事务1中等待锁的记录),一行等待获取锁

一个误区:锁定的数据行并不一定是由锁冲突的sql语句引起的,应该关注同一事务中是否有其它语句加锁

代码分析

class SaveUserJob implements SimpleJob {

    @Autowired 
    @Transactional
    CampaignJobInstanceService campaignJobInstanceService

    @Override
    void execute(ShardingContext shardingContext) {
        log.info("save user:${JsonUtil.toString(shardingContext)}")
        List<CampaignJobInstance> jobInstances = campaignJobInstanceService.findByStatus(CampaignJobInstanceStatusEnum.SAVING)
        jobInstances.each { jobInstance ->
           ...
           updateById(jobInstance)
           ...
        }
    }

}

class SendJob implements DataflowJob<CampaignUserLog> {

    ...
    @Override
    @Transactional
    void processData(ShardingContext shardingContext, List<CampaignUserLog> logs) {
        connectorExecuteService.execute(logs.findAll { it.status == CampaignUserJobStatusEnum.SENDING })
        logs.groupBy { it.jobInstanceId }.each {
            CampaignJobInstance jobInstance = findByInstanceId(it.key)
            ...
            updateById(jobInstance)
        }
    }
    ...
}

SaveUserJob、SendJob中均涉及到遍历主键更新campaign_job_instance,且均开启事务,两个事务中更新顺序的不通便会造成相互持有对方需要的锁,从而死锁

为什么之前自测时没有触发?
圈选的用户量过少,只有1个,save user的瞬间还没来得及更新状态就发送完成了
把SendJob的频次由2分钟改为了5分钟,会累积多个jobInstance到一个事务中

解决方案

减小锁粒度

class SaveUserJob implements SimpleJob {

    @Autowired
    CampaignJobInstanceService campaignJobInstanceService

    @Override
    // 外层仅查询,去除事务
    void execute(ShardingContext shardingContext) {
        log.info("save user:${JsonUtil.toString(shardingContext)}")
        List<CampaignJobInstance> jobInstances = campaignJobInstanceService.findByStatus(CampaignJobInstanceStatusEnum.SAVING)
        jobInstances.each { jobInstance ->
            // 抽取到service层,加上事务
            campaignJobInstanceService.saveUser(jobInstance)
        }
    }

}

class SendJob implements DataflowJob<CampaignUserLog> {

    ...
    @Override
    // 外层仅查询,去除事务
    void processData(ShardingContext shardingContext, List<CampaignUserLog> logs) {
        ...
        logs.groupBy { it.jobInstanceId }.each {
            ...
            // 抽取到service层,加上事务
            CampaignJobInstance jobInstance = findByInstanceId(it.key)
            campaignJobInstanceService.checkJobInstance(jobInstance)
        }
    }
    ...
}
上一篇:微信小程序商品展示页面(仿咸鱼)


下一篇:MySQL死锁问题实例分析及解决方法