原创作品,允许转载,转载时请务必以超链接形式标明文章 原始出处 、作者信息和本声明。否则将追究法律责任。http://navyaijm.blog.51cto.com/4647068/1241728
收到短信报警,两台数据库都报slave同步失败了,先说明一下环境,架构:lvs+keepalived+amoeba+mysql,主主复制,单台写入,
主1:192.168.0.223(写)
主2:192.168.0.230
好吧,先show slave status \G看一下同步失败的具体报错吧
登录主2库查看:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
|
mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Master_Host: 192.168.0.223 Master_User: slave Master_Port: 13204 Connect_Retry: 60 Master_Log_File: mysql-bin.000009 Read_Master_Log_Pos: 50419 Relay_Log_File: mysqld-relay-bin.000014 Relay_Log_Pos: 34626 Relay_Master_Log_File: mysql-bin.000009 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: mysql,information_schema,performance_schema, test ,mysql,information_schema,performance_schema, test
Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1062 Last_Error: Error 'Duplicate entry ' 1329544 ' for key ' PRIMARY '' on query. Default database: 'data' . Query: 'insert into kn_chongzhi(orderid,aa,buyNum,state, type ,create_time,fac,cc,flag)
values(20130702173025036581,15935779926,1,0, 'SJ' ,1372757425, '30.27' , '30' ,100)'
Skip_Counter: 0 Exec_Master_Log_Pos: 34480 Relay_Log_Space: 51171 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1062 Last_SQL_Error: Error 'Duplicate entry ' 1329544 ' for key ' PRIMARY '' on query. Default database: 'data' . Query: 'insert into kn_chongzhi(orderid,aa,buyNum,state, type ,create_time,fac,cc,flag)
values(20130702173025036581,15935779926,1,0, 'SJ' ,1372757425, '30.27' , '30' ,100)'
Replicate_Ignore_Server_Ids: Master_Server_Id: 2 1 row in set (0.00 sec)
|
尼玛,苦逼的又是主键冲突,先查看一下这张表的结构:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
mysql> desc kn_chongzhi; +-------------+-----------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+-----------------+------+-----+---------+----------------+ | id | int(10) | NO | PRI | NULL | auto_increment |
| aa | varchar(32) | NO | MUL | NULL | | | bizOfferId | varchar(32) | NO | | NULL | | | number | varchar(20) | NO | MUL | NULL | | | cc | float(10,2) | NO | | NULL | | | fac | float(10,2) | YES | | 0.00 | | | buyNum | int(10) | NO | | NULL | | | state | tinyint(4) | NO | | 0 | | | type | enum( 'SJ' , 'QB' ) | NO | | SJ | |
| create_time | int(11) | NO | | NULL | | | update_time | int(11) | NO | | NULL | | | flag | int(10) | NO | | 0 | | +-------------+-----------------+------+-----+---------+----------------+ 12 rows in set (0.00 sec)
|
想必大家已经知道问题是这么产生的了,这里我再大体的说一下,可能有些人还不明白哈,回头看前面的架构,引起 这个问题的原因是主1的网络抖动,导致amoeba把写切到了主2,主1的网络好了,写又切回了主1,由于主键ID是自曾的,所以就出现了这个问题,我举个例子:
开始是写主1的,已经写6条数据(id=1、2、3、4、5、6),突然主1网络抖动,开始在主2写了三条(id=7、8、9),主1的网络又恢复了,写又在主1上了(id=7、8、9、10、。。。。),这时,主1要把id=7、8、9、10.。。。。的数据复制给主2,主2 要把id=7、8、9三条数据复制给主1,这不就傻逼了吗?
处理的过程:
1、在两个库上stop slave;
2、在主2上执行select * from kn_chongzhi where id>=1329544\G (查看在主2上写了几条数据)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
|
mysql> select * from kn_chongzhi where id >=1329544\G
*************************** 3661. row *************************** id : 1329545
aa: 20130702213504529562 bizOfferId: DK201307021139565210 number: 13991056094 cc: 30.00 fac: 30.22 buyNum: 1 state: 2 type : SJ
create_time: 1372772104 update_time: 1372772474 flag: 100 *************************** 3662. row *************************** id : 1329546
aa: 20130702213506629648 bizOfferId: DK201307021139588209 number: 15511391791 cc: 30.00 fac: 30.17 buyNum: 1 state: 0 type : SJ
create_time: 1372772106 update_time: 0 flag: 100 *************************** 3663. row *************************** id : 1329547
aa: 20130702213516595293 bizOfferId: DK201307021139758209 number: 13615611693 cc: 100.00 fac: 99.85 buyNum: 1 state: 2 type : SJ
create_time: 1372772116 update_time: 1372772315 flag: 101 |
3、在主2上delete from kn_chongzhi where id>=1329544; 并设置自曾ID从1329545开始
1
2
3
4
5
|
mysql> delete from kn_chongzhi where id >=1329544;
Query OK, 0 rows affected (0.00 sec) mysql> alter table kn_chongzhi auto_increment=1329545; Query OK, 0 rows affected (0.15 sec) Records: 0 Duplicates: 0 Warnings: 0 |
4、主2上slave start,show slave status \G,发现主2同步主1已经ok了;
5、在主2上show master status \G,获取binlog文件名和Position点,在主1上重新change master
6、把上面三条数据保存好,发给程序猿手到录入主1,
PS:当然,如果我按一下设置,肯定不会出现这个问题,如果业务有要求,ID必须连续,那就不能设置这两个参数了:
1
2
3
4
5
6
|
主1: auto-increment-increment=2 auto-increment-offset=1 主2: auto-increment-increment=2 auto-increment-offset=2 |
本文出自 “屌丝运维男” 博客,请务必保留此出处http://navyaijm.blog.51cto.com/4647068/1241728