MySQL复制日常维护与管理

一、复制一些常见设置

1、mysql复制启动时参数:

mysql启动时的参数包括:master_host,master_port,master_user,master_password,master_log_file,master_log_pos。这几个参数需要在从库上配置,用来记录需要复制的主数据库的地址、端口、和用户等。mysq复制还包括以下几个常用的复制参数:log-slave_updates、master-connect-retry、read-only等。

log-slave_updates:这个参数用来控制从库上的更新操作是否写入binlog文件,默认是关闭的。但是如果需要配置级联复制时,从库上的更新操作就要写入binlog,以备用来同步下一级的slave。这个启动参数需要和--log-bin一起使用。

master-connect-retry:这个参数用来设置和主库的连接丢失时重试的时间间隔,默认是60s。

read-only:该参数用来设置从库只能接受超级用户的更新操作,从而限制应用程序错误的对从库的更新操作。

$bin/mysqld_safe --read-only &启动从库后,普通用户的更新操作将会报错。

2、制定复制的数据库或表:

可以使用replicate-do-db、replicate-do-table、replicate-ignore-db、replicate-ignore-table或replicate-wild-do-table来指定主数据库复制到从数据库的数据库或表。有时用户只需要将关键表或需要查询的表复制的从库上,这样就可以通过这几个参数来进行相应控制。

1)主库表及数据信息:

mysql> select * from t1;

+------+

| id   |

+------+

|    1 |

|    2 |

|    3 |

|    4 |

|    5 |

|    8 |

|    9 |

+------+

7 rows in set (0.00 sec)

mysql> select * from t3;

+------+

| id   |

+------+

|    1 |

|    2 |

|    3 |

|    4 |

|    5 |

+------+

5 rows in set (0.00 sec)

2)以以下方式打开从库,只复制t3表数据库:

[root@chavinking mysql]# bin/mysqld_safe --replicate-do-table=test.t3 &

[1] 22538

[root@chavinking mysql]# 161202 19:43:49 mysqld_safe Logging to '/usr/local/mysql/data/chavinking.err'.

161202 19:43:49 mysqld_safe Starting mysqld daemon with databases from /usr/local/mysql/data

mysql> start slave;

Query OK, 0 rows affected, 1 warning (0.00 sec)

3)主库进行dml操作:

mysql> delete from t1 where id <>1;

Query OK, 6 rows affected (30.17 sec)

mysql> delete from t3 where id <>1;

Query OK, 4 rows affected (0.02 sec)

4)从库查看数据复制情况:

mysql> select * from test.t1;

+------+

| id   |

+------+

|    1 |

|    2 |

|    3 |

|    4 |

|    5 |

|    8 |

|    9 |

+------+

7 rows in set (0.01 sec)

mysql> select * from test.t3;

+------+

| id   |

+------+

|    1 |

+------+

1 row in set (0.00 sec)

发现从库只复制了主库针对t3表的dml操作。

3、slave-skip-errors

在复制过程中,由于各种原因可能会导致从库执行binlog中sql出错的情况,默认情况下,从库将会停止复制进程,不在进行同步,等待认为介入处理。这个过程不能及时发现将会对备份产生重大影响,此参数的作用就是用来定义复制过程中从库可以自动跳过错误号。这个参数可以定义多个错误号或者通过定义成all跳过全部错误,具体语法如下:

--slave-skip-errors=[err_code1,err_code2,......]|all

二、日常管理维护

1、查看从库运行状态:通过show slave status命令可以查看当前从库运行状态。

mysql> show slave status \G;

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.80.133

Master_User: rep1

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: dbking-bin.000004

Read_Master_Log_Pos: 2363

Relay_Log_File: chavinking-relay-bin.000015

Relay_Log_Pos: 706

Relay_Master_Log_File: dbking-bin.000004

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table: test.t3

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 2363

Relay_Log_Space: 884

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 1

Master_UUID: 9b92b2a8-b7e0-11e6-81e4-000c29fa5a95

Master_Info_File: /usr/local/software/mysql-5.6.24-linux-glibc2.5-x86_64/data/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:

Executed_Gtid_Set:

Auto_Position: 0

1 row in set (0.00 sec)

解释:在上面输出的信息中,我们最关心的就是Slave_IO_Running、Slave_SQL_Running这两项是否是yes,只要其中一项是no则表示复制出现问题,具体原因通过last_errno查看。

Slave_IO_Running:即io进程状态,此进程负责从库从主库读取binlog并写入relay log文件。

Slave_SQL_Running:即sql进程,负责读取relay log中event,并在从库执行。

2、主从库同步维护:

1)锁定主库,禁止任何dml操作,并取得主库当前坐标:

mysql> flush tables with read lock;

Query OK, 0 rows affected (0.00 sec)

mysql> show master status;

+-------------------+----------+--------------+------------------+-------------------+

| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |

+-------------------+----------+--------------+------------------+-------------------+

| dbking-bin.000005 |      120 |              |                  |                   |

+-------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

2)执行以下语句,其中master_pos_wait的参数来自于步骤1值,这个select语句将阻塞直到从库达到指定的偏移量后,返回0,返回-1表示超时退出,查询返回0,主从库同步。

mysql> select master_pos_wait('dbking-bin.000005','120');

+--------------------------------------------+

| master_pos_wait('dbking-bin.000005','120') |

+--------------------------------------------+

|                                          0 |

+--------------------------------------------+

1 row in set (0.10 sec)

3)解锁主库:

mysql> unlock tables;

Query OK, 0 rows affected (0.00 sec)

3、从库复制出错处理:

某些情况会出现从库更新失败情况,这种情况我们首先需要确定是否是主从库表结构不一致造成的,如果确定是表结构不一致造成的,那么同步表结构后重新执行start slave语句。如果不是表结构不同造成的,则需要确定手动更新是否安全,然后忽视来自主库的更新失败语句。跳过主库跟新语句为set global sql_slave_skip_counter=n,其中n等于1或者2。如果来自主库的更新语句不使用auto_increment或last_insert_id(),n值应为1,否则为2。原因是使用auto_increment或者last_insert_id()的语句需要从二进制文件去两个事件。

1)在从库停止复制进程并且跳过2个语句:

mysql> select * from t1;

Empty set (0.00 sec)

mysql> stop slave;

Query OK, 0 rows affected (0.04 sec)

mysql> set global sql_slave_skip_counter=2;

Query OK, 0 rows affected (0.02 sec)

2)在主库插入3条记录:

mysql> insert into t1 values(1);

Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values(2);

Query OK, 1 row affected (0.02 sec)

mysql> insert into t1 values(3);

Query OK, 1 row affected (0.00 sec)

mysql> select * from t1;

+------+

| id   |

+------+

|    1 |

|    2 |

|    3 |

+------+

3 rows in set (0.00 sec)

3)在从库启动复制,查看复制结果:

mysql> start slave;

Query OK, 0 rows affected (0.09 sec)

mysql> select * from t1;

+------+

| id   |

+------+

|    3 |

+------+

2 rows in set (0.00 sec)

4、log event entry exceeded max_allowed_packet的处理

如果应用中使用大的blog列或者字符串,那么从库上恢复时,可能出现“log event entry exceeded max_allowed_packet”错误,这是因为含有大文本的记录无法通过网络进行传输导致。解决办法是从库上增加max_allowed_packet参数的大小,这个参数的默认值是1m,可以按照实际的需要进行修改。

5、在大多数情况下,一般使用单主复制(一台主库对多台从库)。但是在某些特殊情况下,可能会采用多住复制(多台主库对应一台从库)。这时,如果主库的表采用自增长变量,那么复制到从库的同一张表可能会引起主键冲突,因为系统参数auto_increment_increment和auto_increment_offset的默认值是1,这样多台主库的自增长变量迟早会发生冲突。比如,两个master可以按照如下设置:

master1:auto_increment_increment=2,auto_increment_offset=1(1,3,5,7....)

master1:auto_increment_increment=2,auto_increment_offset=0(0,2,3,6,....)

6、查看从库的复制进度:

查看复制进度可以通过show processlist下的sql_slave_running进程的time取得。以下是一个测试用例:

1)在主库t1表添加时间戳列,然后插入1条数据:

mysql> alter table t1 add column createtime datetime;

Query OK, 0 rows affected (0.38 sec)

Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into t1 values(5,now());

Query OK, 1 row affected (0.04 sec)

mysql> select * from t1;

+------+---------------------+

| id   | createtime          |

+------+---------------------+

|    1 | NULL                |

|    2 | NULL                |

|    3 | NULL                |

|    5 | 2016-12-02 22:16:53 |

+------+---------------------+

4 rows in set (0.00 sec)

2)为了方便模拟时间,这里把从库的复制的io进程(slave_io_running)停下来,使得从库暂时不写中继日志,也就是最后执行sql就是当前中继日志中最后一个sql。

mysql> stop slave io_thread;

Query OK, 0 rows affected (0.04 sec)

3)一段时间后查询从库的复制情况:

mysql> select * from t1;

+------+---------------------+

| id   | createtime          |

+------+---------------------+

|    2 | NULL                |

|    3 | NULL                |

|    5 | 2016-12-02 22:16:53 |

+------+---------------------+

3 rows in set (0.00 sec)

mysql> select now();

+---------------------+

| now()               |

+---------------------+

| 2016-12-02 22:23:48 |

+---------------------+

1 row in set (0.00 sec)

4)这是查看sql线程时间:

mysql> show processlist \G;

*************************** 1. row ***************************

Id: 3

User: root

Host: localhost

db: test

Command: Query

Time: 0

State: init

Info: show processlist

*************************** 2. row ***************************

Id: 5

User: system user

Host:

db: NULL

Command: Connect

Time: 491

State: Slave has read all relay log; waiting for the slave I/O thread to update it

Info: NULL

2 rows in set (0.00 sec)

由于msyql复制的机制是执行主库传输过来的二进制日志文件,二进制日志中的每个语句通过设置时间戳来保证执行时间的顺序和正确性,所以每个语句执行之前都会首先设置时间戳,而通过这个查询进程time就可以知道最后设置的时间戳和当前时间的差距。

7、提高复制性能

msyql主库是多线程并发写入,而从库仅仅只有一个sql线程应用日志,那么就容易出现从库住不上主库的情况,可以通过show slave status的secounds_behind_master粗略显示从库落后主库的时间。

方案1:通过拆分减少一个从库上需要数据同步的表来解决。首先考虑配置一主多从的架构,然后再不同的从库上,通过设置不同的replicate-do-db、replicate-do-table、replicate-ignore-db、replicate-ignore-table或replicate-wild-do-table参数,使得不同的从库复制不同的库/表,减少从库io压力。架构如图:

MySQL复制日常维护与管理

MySQL复制日常维护与管理

方案2:msyql5.6提供基于schema的多线程复制,允许从库并行更新。通过设置参数slave_parallel_workers为2,让mysql从库启动两个sql线程,例如:

mysql> show variables like '%slave_parallel_worker%';

+------------------------+-------+

| Variable_name          | Value |

+------------------------+-------+

| slave_parallel_workers | 0     |

+------------------------+-------+

1 row in set (0.00 sec)

mysql> select version();

+-----------+

| version() |

+-----------+

| 5.6.24    |

+-----------+

1 row in set (0.00 sec)

mysql> set global slave_parallel_workers=2;

Query OK, 0 rows affected (0.02 sec)

mysql> show variables like '%slave_parallel_worker%';

+------------------------+-------+

| Variable_name          | Value |

+------------------------+-------+

| slave_parallel_workers | 2     |

+------------------------+-------+

1 row in set (0.00 sec)

mysql> stop slave;

Query OK, 0 rows affected (0.02 sec)

mysql> start slave;

Query OK, 0 rows affected, 1 warning (0.14 sec)

mysql> show processlist \G;

*************************** 1. row ***************************

Id: 3

User: root

Host: localhost

db: test

Command: Query

Time: 0

State: init

Info: show processlist

*************************** 2. row ***************************

Id: 7

User: system user

Host:

db: NULL

Command: Connect

Time: 3

State: Waiting for master to send event

Info: NULL

*************************** 3. row ***************************

Id: 8

User: system user

Host:

db: NULL

Command: Connect

Time: 3

State: Slave has read all relay log; waiting for the slave I/O thread to update it

Info: NULL

*************************** 4. row ***************************

Id: 9

User: system user

Host:

db: NULL

Command: Connect

Time: 3

State: Waiting for an event from Coordinator

Info: NULL

*************************** 5. row ***************************

Id: 10

User: system user

Host:

db: NULL

Command: Connect

Time: 3

State: Waiting for an event from Coordinator

Info: NULL

5 rows in set (0.00 sec)

三、切换主从库

在主库出现故障情况下,需要切换从库为主库对外提供服务,以保证最大限度的可用性。mysql复制架构下切换主从库步骤如下:

1)首先确保所有的数据库都已经执行了relay log中的全部跟新,在每个数据库上,执行stop slave io_thread,然后检查show processlist的输出,直到看到has read all relay log,表示更新执行完毕。

mysql> stop slave io_thread;

Query OK, 0 rows affected (0.06 sec)

mysql> show processlist \G;

*************************** 1. row ***************************

Id: 3

User: root

Host: localhost

db: test

Command: Query

Time: 0

State: init

Info: show processlist

*************************** 2. row ***************************

Id: 8

User: system user

Host:

db: NULL

Command: Connect

Time: 567

State: Slave has read all relay log; waiting for the slave I/O thread to update it

Info: NULL

*************************** 3. row ***************************

Id: 9

User: system user

Host:

db: NULL

Command: Connect

Time: 567

State: Waiting for an event from Coordinator

Info: NULL

*************************** 4. row ***************************

Id: 10

User: system user

Host:

db: NULL

Command: Connect

Time: 567

State: Waiting for an event from Coordinator

Info: NULL

4 rows in set (0.00 sec)

2)然后再从库上执行stop slave,然后执行reset master切换成主库:

mysql> stop slave;

Query OK, 0 rows affected (0.04 sec)

--以binlog方式重启从库

mysql> reset master;

Query OK, 0 rows affected (0.02 sec)

3)主库修复好后,通过如下命令制定原从库为主库:

mysql> change master to

-> master_host='192.168.80.136',

-> master_user='rep1',

-> master_password='dbking',

-> master_port=3306,

-> master_log_file='dbking-bin.000001',

-> master_log_pos=120;

Query OK, 0 rows affected, 2 warnings (0.06 sec)

mysql> start slave;

Query OK, 0 rows affected (0.05 sec)

mysql> show processlist \G;

*************************** 1. row ***************************

Id: 3

User: root

Host: localhost

db: NULL

Command: Query

Time: 0

State: init

Info: show processlist

*************************** 2. row ***************************

Id: 5

User: system user

Host:

db: NULL

Command: Connect

Time: 11

State: Waiting for master to send event

Info: NULL

*************************** 3. row ***************************

Id: 6

User: system user

Host:

db: NULL

Command: Connect

Time: 6

State: Slave has read all relay log; waiting for the slave I/O thread to update it

Info: NULL

3 rows in set (0.00 sec)

mysql> show slave status \G;

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.80.136

Master_User: rep1

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: dbking-bin.000001

Read_Master_Log_Pos: 120

Relay_Log_File: chavinking-relay-bin.000002

Relay_Log_Pos: 284

Relay_Master_Log_File: dbking-bin.000001

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 120

Relay_Log_Space: 462

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 222

Master_UUID: 9b92b2a8-b7e0-11e6-81e4-000c29fa5a96

Master_Info_File: /usr/local/software/mysql-5.6.24-linux-glibc2.5-x86_64/data/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:

Executed_Gtid_Set:

Auto_Position: 0

1 row in set (0.00 sec)

4)通知应用程序更改数据源链接新主库。

5)删除新主库上的master.info relay-log.info文件,否则下次启动新主库还会以从库身份启动:

[root@chavinking data]# rm master.info relay-log.info -f

上一篇:Spring AOP Capabilities and Goal


下一篇:批量生成protoBuf到cs文件