proxysql运维实践

前置基础参考:https://www.cnblogs.com/gered/p/15868767.html#autoid-4-0-0

【1】环境

当前架构

proxysql运维实践

  proxysql运维实践

  proxysql运维实践

 

  proxysql运维实践

【2】场景

(1)1主2从,主库挂掉

10秒就超时(这个阈值是连接超时)反馈出来了,但如果我们的DML、select 等其他操作 超过10秒是没有关系的;

  proxysql运维实践

主库挂了,非 mysql_query_rules 定位到从组的请求,都会连不上,我们查看一下相关日志

select * from runtime_mysql_servers;
select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_ping_log where ping_error is not null limit 10;
select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_connect_log where connect_error is not null order by time_start_us desc limit 10;
select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_read_only_log where error is not null limit 10;

结果:

admin@mysqldb 17:02:48 [(none)]>select * from runtime_mysql_servers;
+--------------+----------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname       | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | 192.168.148.39 | 3306 | 0         | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 192.168.148.30 | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 192.168.148.27 | 3306 | 0         | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+----------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

admin@mysqldb 17:02:52 [(none)]>select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_ping_log where ping_error is not null limit 10;
+---------------------+----------------+------+------------------+----------------------+---------------------------------------------------------+
| datetime            | hostname       | port | time_start_us    | ping_success_time_us | ping_error                                              |
+---------------------+----------------+------+------------------+----------------------+---------------------------------------------------------+
| 2022-02-16 08:56:43 | 192.168.148.39 | 3306 | 1645001803735858 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:45 | 192.168.148.39 | 3306 | 1645001805750545 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:47 | 192.168.148.39 | 3306 | 1645001807770656 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:49 | 192.168.148.39 | 3306 | 1645001809742158 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:51 | 192.168.148.39 | 3306 | 1645001811771683 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:53 | 192.168.148.39 | 3306 | 1645001813752847 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:55 | 192.168.148.39 | 3306 | 1645001815746628 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:57 | 192.168.148.39 | 3306 | 1645001817742752 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:59 | 192.168.148.39 | 3306 | 1645001819728439 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:57:01 | 192.168.148.39 | 3306 | 1645001821729269 | 0                    | Can't connect to MySQL server on '192.168.148.39' (115) |
+---------------------+----------------+------+------------------+----------------------+---------------------------------------------------------+
10 rows in set (0.00 sec)

admin@mysqldb 17:02:52 [(none)]>select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_connect_log where connect_error is not null order by time_start_us desc limit 10;
+---------------------+----------------+------+------------------+-------------------------+---------------------------------------------------------+
| datetime            | hostname       | port | time_start_us    | connect_success_time_us | connect_error                                           |
+---------------------+----------------+------+------------------+-------------------------+---------------------------------------------------------+
| 2022-02-16 08:56:47 | 192.168.148.39 | 3306 | 1645001807770108 | 0                       | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:45 | 192.168.148.39 | 3306 | 1645001805716648 | 0                       | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:43 | 192.168.148.39 | 3306 | 1645001803713712 | 0                       | Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:41 | 192.168.148.39 | 3306 | 1645001801713565 | 0                       | Can't connect to MySQL server on '192.168.148.39' (115) |
+---------------------+----------------+------+------------------+-------------------------+---------------------------------------------------------+
4 rows in set (0.00 sec)

admin@mysqldb 17:02:52 [(none)]>select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_read_only_log where error is not null limit 10;
+---------------------+----------------+------+------------------+-----------------+-----------+---------------------------------------------------------------------------------------------+
| datetime            | hostname       | port | time_start_us    | success_time_us | read_only | error                                                                                       |
+---------------------+----------------+------+------------------+-----------------+-----------+---------------------------------------------------------------------------------------------+
| 2022-02-16 08:56:42 | 192.168.148.39 | 3306 | 1645001802730203 | 0               | NULL      | Lost connection to MySQL server during query                                                |
| 2022-02-16 08:56:44 | 192.168.148.39 | 3306 | 1645001804750337 | 0               | NULL      | timeout on creating new connection: Can't connect to MySQL server on '192.168.148.39' (115) |
| 2022-02-16 08:56:46 | 192.168.148.39 | 3306 | 1645001806742312 | 0               | NULL      | timeout on creating new connection: Can't connect to MySQL server on '192.168.148.39' (115) |
+---------------------+----------------+------+------------------+-----------------+-----------+---------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

然后我们看一下最新的

select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_ping_log order by time_start_us desc limit 10;
select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_connect_log order by time_start_us  desc limit 10;
select from_unixtime(time_start_us/1000/1000) as `datetime`,* from mysql_server_read_only_log order by time_start_us desc limit 10;

从下图我们可以看出, ping 还是一直不通,而 connect 、read_only 中甚至直接剔除对 主库 192.168.148.39 的连接尝试了

  proxysql运维实践

 

结合上面的所有,我们发现 ping_log 报错了 300次(2s一次的频率),而 connect_log(2秒一次的频率)只报了4次;

如下图,我们发现我们选中的参数,次数是3,所以就 connect 重试3次后,所以 connect_log 中只有4次记录是正常的;

  proxysql运维实践

(2)1主2从,主库挂掉后重新启动恢复

恢复主库后:可无缝连上,现有连接无需重连,无需断开原长连接

  proxysql运维实践

恢复后,我们看看 mysql server,hostgrou_id

我们可以看到,status 还是 shunned 状态

  proxysql运维实践

 

 我们执行一个路由到主库的相关操作,它状态就变回来了;

  proxysql运维实践

 

(2)1主2从=》主库挂掉,从变主库=》1主1从

关闭主库,从库1: 192.168.148.27 执行如下代码,变成 proxysql中的主库

stop slave;
set global read_only=0;

如下图:

为什么主库变成下图这样,即在10 写组,又在20 读组?因为如下参数

mysql-monitor_writer_is_also_reader = true(当原read_only=1 的server read_only=0时,迁移一份到 writer 组的同时,也保留它的 reader 角色)

这个是为了避免故障转移后只有1个实例的时候,可以同时兼顾读写,从而业务连续;

 

  proxysql运维实践

 

我们再查询看看:是可以都读负载均衡分发的;写也是写到 我们的新主库 192.168.148.27 中去了;

注意:proxysql 连接在任意时候都没断开过,所以主从切换等情况不影响它的长连接,也不需要重连等等;

  proxysql运维实践

 

上一篇:Flink问题:解决 bin/config.sh: line 32: syntax error near unexpected token


下一篇:【linux系统】命令学习(六)awk sed grep 与管道的使用