报错日志
org.springframework.dao.DeadlockLoserDataAccessException:
### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may exist in com/atome/ops/mapper/CampaignJobInstanceMapper.java (best guess)
### The error may involve com.atome.ops.mapper.CampaignJobInstanceMapper.update-Inline
### The error occurred while setting parameters
### SQL: update `campaign_job_instance` SET `total_count`=?, `status`=?, `fetch_start_time`=?, `fetch_end_time`=?, `send_start_time`=?, `send_end_time`=?, `update_timestamp`=? where id=?
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
主键更新死锁,why?
死锁日志明细
## 执行以下语句查看最近一次死锁日志
show engine innodb status;
...
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-02-07 21:00:00 0x7f3972be1700
*** (1) TRANSACTION:
TRANSACTION 401368548, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 3
MySQL thread id 24621163, OS thread handle 139885409634048, query id 1074586194 172.21.32.245 root Updating
update `campaign_job_instance`
SET `total_count`=6,
`status`='SENDING',
`fetch_start_time`=1644238560063,
`fetch_end_time`=1644238560115,
`send_start_time`=1644238800550,
`send_end_time`=null,
`update_timestamp`=1644238800550
where id=5193
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10003 page no 263 n bits 104 index PRIMARY of table `user_label`.`campaign_job_instance` trx id 401368548 lock_mode X locks rec but not gap waiting
Record lock, heap no 35 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
0: len 4; hex 00001449; asc I;;
1: len 6; hex 000017ec6739; asc g9;;
2: len 7; hex 390002800f1934; asc 9 4;;
3: len 28; hex 6a6f622d363166653865333464636531613630303031346636333166; asc job-61fe8e34dce1a600014f631f;;
4: len 24; hex 363230313136653033333863656530303031326637346337; asc 620116e0338cee00012f74c7;;
5: len 30; hex 3631666538653334646365316136303030313466363331642d3230323230; asc 61fe8e34dce1a600014f631d-20220; (total 37 bytes);
6: len 3; hex 534d53; asc SMS;;
7: len 24; hex 363166653865333464636531613630303031346636333164; asc 61fe8e34dce1a600014f631d;;
8: len 30; hex 66353464623661372d336263662d346361312d613630302d643337616234; asc f54db6a7-3bcf-4ca1-a600-d37ab4; (total 36 bytes);
9: len 4; hex 53454e54; asc SENT;;
10: len 4; hex 80000006; asc ;;
11: len 8; hex 8000017ed4415b3f; asc ~ A[?;;
12: len 8; hex 8000017ed4415b73; asc ~ A[s;;
13: len 8; hex 8000017ed44505ab; asc ~ E ;;
14: len 8; hex 8000017ed44506a1; asc ~ E ;;
15: len 8; hex 8000017ed4415b3f; asc ~ A[?;;
16: len 8; hex 8000017ed44506a1; asc ~ E ;;
*** (2) TRANSACTION:
TRANSACTION 401368889, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 24621104, OS thread handle 139884714923776, query id 1074586215 172.21.32.245 root Updating
update `campaign_job_instance`
SET `total_count`=6,
`status`='SENT',
`fetch_start_time`=1644238560026,
`fetch_end_time`=1644238560120,
`send_start_time`=1644238800280,
`send_end_time`=1644238800554,
`update_timestamp`=1644238800554
where id=5192
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 10003 page no 263 n bits 104 index PRIMARY of table `user_label`.`campaign_job_instance` trx id 401368889 lock_mode X locks rec but not gap
Record lock, heap no 35 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
0: len 4; hex 00001449; asc I;;
1: len 6; hex 000017ec6739; asc g9;;
2: len 7; hex 390002800f1934; asc 9 4;;
3: len 28; hex 6a6f622d363166653865333464636531613630303031346636333166; asc job-61fe8e34dce1a600014f631f;;
4: len 24; hex 363230313136653033333863656530303031326637346337; asc 620116e0338cee00012f74c7;;
5: len 30; hex 3631666538653334646365316136303030313466363331642d3230323230; asc 61fe8e34dce1a600014f631d-20220; (total 37 bytes);
6: len 3; hex 534d53; asc SMS;;
7: len 24; hex 363166653865333464636531613630303031346636333164; asc 61fe8e34dce1a600014f631d;;
8: len 30; hex 66353464623661372d336263662d346361312d613630302d643337616234; asc f54db6a7-3bcf-4ca1-a600-d37ab4; (total 36 bytes);
9: len 4; hex 53454e54; asc SENT;;
10: len 4; hex 80000006; asc ;;
11: len 8; hex 8000017ed4415b3f; asc ~ A[?;;
12: len 8; hex 8000017ed4415b73; asc ~ A[s;;
13: len 8; hex 8000017ed44505ab; asc ~ E ;;
14: len 8; hex 8000017ed44506a1; asc ~ E ;;
15: len 8; hex 8000017ed4415b3f; asc ~ A[?;;
16: len 8; hex 8000017ed44506a1; asc ~ E ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10003 page no 263 n bits 104 index PRIMARY of table `user_label`.`campaign_job_instance` trx id 401368889 lock_mode X locks rec but not gap waiting
Record lock, heap no 23 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
0: len 4; hex 00001448; asc H;;
1: len 6; hex 000017ec65e4; asc e ;;
2: len 7; hex 0e000002a31da0; asc ;;
3: len 28; hex 6a6f622d363230303832346464636531613630303031346636333261; asc job-6200824ddce1a600014f632a;;
4: len 24; hex 363230313136653033333863656530303031326637346334; asc 620116e0338cee00012f74c4;;
5: len 30; hex 3632303038323464646365316136303030313466363332382d3230323230; asc 6200824ddce1a600014f6328-20220; (total 37 bytes);
6: len 3; hex 534d53; asc SMS;;
7: len 24; hex 363230303832346464636531613630303031346636333238; asc 6200824ddce1a600014f6328;;
8: len 30; hex 33333839633535662d663262612d346630392d393432632d353839633865; asc 3389c55f-f2ba-4f09-942c-589c8e; (total 36 bytes);
9: len 7; hex 53454e44494e47; asc SENDING;;
10: len 4; hex 80000006; asc ;;
11: len 8; hex 8000017ed4415b1a; asc ~ A[ ;;
12: len 8; hex 8000017ed4415b78; asc ~ A[x;;
13: len 8; hex 8000017ed4450699; asc ~ E ;;
14: SQL NULL;
15: len 8; hex 8000017ed4415b1a; asc ~ A[ ;;
16: len 8; hex 8000017ed4450699; asc ~ E ;;
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
两条不同主键更新的语句相互等待,why?
从上述日志可以看出以下几点:
- 事务1锁定4行数据,一行数据等待主键行锁(因超时从而回滚),其他数据行未显示明细(肯定有事务2中等待锁的数据行)
- 事务2锁定2行数据,一行已经获取锁(事务1中等待锁的记录),一行等待获取锁
一个误区:锁定的数据行并不一定是由锁冲突的sql语句引起的,应该关注同一事务中是否有其它语句加锁
代码分析
class SaveUserJob implements SimpleJob {
@Autowired
@Transactional
CampaignJobInstanceService campaignJobInstanceService
@Override
void execute(ShardingContext shardingContext) {
log.info("save user:${JsonUtil.toString(shardingContext)}")
List<CampaignJobInstance> jobInstances = campaignJobInstanceService.findByStatus(CampaignJobInstanceStatusEnum.SAVING)
jobInstances.each { jobInstance ->
...
updateById(jobInstance)
...
}
}
}
class SendJob implements DataflowJob<CampaignUserLog> {
...
@Override
@Transactional
void processData(ShardingContext shardingContext, List<CampaignUserLog> logs) {
connectorExecuteService.execute(logs.findAll { it.status == CampaignUserJobStatusEnum.SENDING })
logs.groupBy { it.jobInstanceId }.each {
CampaignJobInstance jobInstance = findByInstanceId(it.key)
...
updateById(jobInstance)
}
}
...
}
SaveUserJob、SendJob中均涉及到遍历主键更新campaign_job_instance,且均开启事务,两个事务中更新顺序的不通便会造成相互持有对方需要的锁,从而死锁
为什么之前自测时没有触发?
圈选的用户量过少,只有1个,save user的瞬间还没来得及更新状态就发送完成了
把SendJob的频次由2分钟改为了5分钟,会累积多个jobInstance到一个事务中
解决方案
减小锁粒度
class SaveUserJob implements SimpleJob {
@Autowired
CampaignJobInstanceService campaignJobInstanceService
@Override
// 外层仅查询,去除事务
void execute(ShardingContext shardingContext) {
log.info("save user:${JsonUtil.toString(shardingContext)}")
List<CampaignJobInstance> jobInstances = campaignJobInstanceService.findByStatus(CampaignJobInstanceStatusEnum.SAVING)
jobInstances.each { jobInstance ->
// 抽取到service层,加上事务
campaignJobInstanceService.saveUser(jobInstance)
}
}
}
class SendJob implements DataflowJob<CampaignUserLog> {
...
@Override
// 外层仅查询,去除事务
void processData(ShardingContext shardingContext, List<CampaignUserLog> logs) {
...
logs.groupBy { it.jobInstanceId }.each {
...
// 抽取到service层,加上事务
CampaignJobInstance jobInstance = findByInstanceId(it.key)
campaignJobInstanceService.checkJobInstance(jobInstance)
}
}
...
}