Common TiDB Problems


Note:

  1. Never kill Pump or Drainer directly, under any circumstances.
  2. Once replication fails, Drainer switches to the paused state.
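Instead of killing them, both components can be stopped gracefully through PD with the binlogctl tool shipped with tidb-binlog. A sketch; the PD URL is an assumption for this cluster, and the node IDs are the ones that appear in the logs later in this document:

```shell
# Pause pump/drainer via PD instead of killing the process.
# The PD URL and node IDs are assumptions; substitute your own.
bin/binlogctl -pd-urls=http://192.168.180.47:2379 -cmd pause-pump    -node-id dev10:8250
bin/binlogctl -pd-urls=http://192.168.180.47:2379 -cmd pause-drainer -node-id dev10:8249

# List the registered nodes and their states:
bin/binlogctl -pd-urls=http://192.168.180.47:2379 -cmd pumps
bin/binlogctl -pd-urls=http://192.168.180.47:2379 -cmd drainers
```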
In TiDB 3.0.2, replicating to MariaDB can fail with a binlog sync error when a value is too long for the downstream. Resolve it as follows:
  1. Fix the error on the TiDB side.
    For example, drop the over-long column and recreate it with a length the downstream MySQL/MariaDB accepts. Do not forget to back up the data before dropping a table.

  2. Check drainer.log to find the time at which the failure occurred.

[tidb@dev11 ~]$ tail -1000f /home/tidb/deploy/log/drainer.log
......
[2019/08/22 11:27:21.586 +08:00] [INFO] [syncer.go:254] ["write save point"] [ts=410634011353284609]
[2019/08/22 11:27:26.397 +08:00] [INFO] [syncer.go:254] ["write save point"] [ts=410634012205252609]
[2019/08/22 11:27:30.608 +08:00] [ERROR] [executor.go:85] ["exec fail"] [query="DELETE FROM `dev2_pfizer_activiti`.`ACT_RU_JOB` WHERE `ID_` = ? LIMIT 1"] [args="[\"ZTA4MjMyZDAtYzNmYS0xMWU5LWJkZjEtMDY4YWEwOWI2NzQy\"]"] [error="Error 1213: Deadlock found when trying to get lock; try restarting transaction"]
......
# The failure occurred at [2019/08/22 11:27:30.608 +08:00]
  3. Using the failure time [2019/08/22 11:27:30.608 +08:00], search pump.log for the next MaxCommitTS value recorded after that time.
[tidb@dev11 ~]$ vim /home/tidb/deploy/log/pump.log

# Search in vim
/[2019/08/22 11:27:3

# The closest MaxCommitTS after that time is [MaxCommitTS=410634015285444609]
[2019/08/22 11:27:36.411 +08:00] [INFO] [storage.go:381] [DBStats] [DBStats="{\"WriteDelayCount\":0,\"WriteDelayDuration\":0,\"WritePaused\":false,\"AliveSnapshots\":0,\"AliveIterators\":0,\"IOWrite\":5790744,\"IORead\":0,\"BlockCacheSize\":0,\"OpenedTablesCount\":0,\"LevelSizes\":null,\"LevelTablesCounts\":null,\"LevelRead\":null,\"LevelWrite\":null,\"LevelDurations\":null}"]
[2019/08/22 11:27:36.445 +08:00] [INFO] [server.go:522] ["server info tick"] [writeBinlogCount=8932] [alivePullerCount=3] [MaxCommitTS=410634015285444609]
[2019/08/22 11:27:46.411 +08:00] [INFO] [storage.go:381] [DBStats] [DBStats="{\"WriteDelayCount\":0,\"WriteDelayDuration\":0,\"WritePaused\":false,\"AliveSnapshots\":0,\"AliveIterators\":0,\"IOWrite\":5791236,\"IORead\":0,\"BlockCacheSize\":0,\"OpenedTablesCount\":0,\"LevelSizes\":null,\"LevelTablesCounts\":null,\"LevelRead\":null,\"LevelWrite\":null,\"LevelDurations\":null}"]
[2019/08/22 11:27:46.445 +08:00] [INFO] [server.go:522] ["server info tick"] [writeBinlogCount=8932] [alivePullerCount=3] [MaxCommitTS=410634017644740612]
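The vim lookup above can also be scripted. A minimal sketch; it relies on pump.log's fixed `[YYYY/MM/DD HH:MM:SS.mmm +08:00]` line prefix, so a plain string comparison on the timestamp is enough:

```shell
# next_max_commit_ts TIME — read pump.log on stdin and print the first
# MaxCommitTS recorded at or after TIME ("YYYY/MM/DD HH:MM:SS.mmm").
next_max_commit_ts() {
    awk -v t="$1" '
        index($0, "MaxCommitTS") && substr($0, 2, 23) >= t {
            match($0, /MaxCommitTS=[0-9]+/)
            print substr($0, RSTART + 12, RLENGTH - 12)
            exit
        }'
}

# Example, with the failure time and log path from this walkthrough:
#   next_max_commit_ts "2019/08/22 11:27:30.608" < /home/tidb/deploy/log/pump.log
```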

  4. Update the commitTS stored in the checkpoint table of the downstream database.
MariaDB [(none)]> show databases;
+-----------------------------+
| Database                    |
+-----------------------------+
| information_schema          |
| mysql                       |
| performance_schema          |
| tidb_binlog                 |
| tidb_loader                 |
+-----------------------------+
22 rows in set (0.00 sec)

MariaDB [(none)]>
MariaDB [(none)]> use tidb_binlog;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [tidb_binlog]>
MariaDB [tidb_binlog]>
MariaDB [tidb_binlog]> show tables;
+-----------------------+
| Tables_in_tidb_binlog |
+-----------------------+
| checkpoint            |
+-----------------------+
1 row in set (0.00 sec)

MariaDB [tidb_binlog]> select * from checkpoint;
+---------------------+---------------------------------------------+
| clusterID           | checkPoint                                  |
+---------------------+---------------------------------------------+
| 6697082055159617667 | {"commitTS":410636899695198209,"ts-map":{}} |
+---------------------+---------------------------------------------+
1 row in set (0.00 sec)

MariaDB [tidb_binlog]>
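The session above only inspects the checkpoint; the actual repair overwrites the stored commitTS with the MaxCommitTS recovered from pump.log. A minimal sketch that just generates the statement (the JSON shape matches the checkpoint row shown above; stop drainer before applying it):

```shell
# gen_checkpoint_fix CLUSTER_ID NEW_TS — print the SQL that points the
# drainer checkpoint at a recovered MaxCommitTS.
gen_checkpoint_fix() {
    printf 'UPDATE checkpoint SET checkPoint = '\''{"commitTS":%s,"ts-map":{}}'\'' WHERE clusterID = %s;\n' "$2" "$1"
}

# Values from this example; pipe the output into the downstream MariaDB, e.g.:
#   gen_checkpoint_fix 6697082055159617667 410634015285444609 | mysql -u root -p tidb_binlog
```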

  5. Start drainer again.
[tidb@test1 tidb-ansible]$ ansible-playbook start_drainer.yml

Congrats! All goes well. :-)
[tidb@test1 tidb-ansible]$



Note:

If a mishandled Pump operation leaves TiDB clients unable to connect, rerun the following to roll the tidb-servers without downtime:
ansible-playbook rolling_update.yml --tags=tidb

Note:

Never kill Pump or Drainer directly, under any circumstances.

Note:

After a failure, drainer automatically switches to the paused state. To bring it back online, simply run it again:
nohup ./bin/drainer -config config/drainer.toml &
Startup failure after redeploying a container: caused by enabling binlog before Pump was started
[tidb@test1 tidb-ansible]$ ansible-playbook start.yml
fatal: [192.168.180.46]: FAILED! => {"changed": false, "elapsed": 300, "msg": "the TiDB port 4000 is not up"}
PLAY [grafana_servers] **************************************************************************************************************************************************************************
        to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/start.retry

PLAY RECAP **************************************************************************************************************************************************************************************
192.168.180.46             : ok=27   changed=0    unreachable=0    failed=1
192.168.180.47             : ok=10   changed=0    unreachable=0    failed=1
192.168.180.48             : ok=13   changed=0    unreachable=0    failed=1
localhost                  : ok=7    changed=4    unreachable=0    failed=0


ERROR MESSAGE SUMMARY ***************************************************************************************************************************************************************************
[192.168.180.47]: Ansible FAILED! => playbook: start.yml; TASK: wait until the PD port is up; message: {"changed": false, "elapsed": 300, "msg": "the PD port 2379 is not up"}

[192.168.180.48]: Ansible FAILED! => playbook: start.yml; TASK: wait until the TiKV port is up; message: {"changed": false, "elapsed": 300, "msg": "the TiKV port 20160 is not up"}

[192.168.180.46]: Ansible FAILED! => playbook: start.yml; TASK: wait until the TiDB port is up; message: {"changed": false, "elapsed": 300, "msg": "the TiDB port 4000 is not up"}



TiDB log error: listener stopped, waiting for manual kill
[ERROR] [server.go:364] ["listener stopped, waiting for manual kill."]  # Binlog requires Pump to be up first; this error appears because Pump went offline.
[stack="github.com/pingcap/tidb/server.(*Server).Run\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/server/server.go:364\nmain.runServer\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/tidb-server/main.go:568\nmain.main\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/tidb-server/main.go:174\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"]
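As the comment above says, recovery is a matter of ordering: bring Pump up before (re)starting tidb-server with binlog enabled. With the tidb-ansible layout used elsewhere in this document, that might look like the following; whether your playbook version supports a `pump` tag is an assumption to verify:

```shell
# Bring pump up first, then roll the tidb-servers that failed to start.
# --tags=pump is an assumption; confirm your tidb-ansible version supports it.
ansible-playbook start.yml --tags=pump
ansible-playbook rolling_update.yml --tags=tidb
```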
Drainer log error: got signal to exit
[2019/08/20 20:56:27.850 +08:00] [INFO] [syncer.go:254] ["write save point"] [ts=410597664302825480]
[2019/08/20 20:56:32.301 +08:00] [INFO] [syncer.go:254] ["write save point"] [ts=410597665574223873]
[2019/08/20 20:56:36.537 +08:00] [INFO] [syncer.go:254] ["write save point"] [ts=410597666688335873]
[2019/08/20 20:56:40.817 +08:00] [INFO] [syncer.go:254] ["write save point"] [ts=410597667802447873]
[2019/08/20 20:56:45.456 +08:00] [INFO] [syncer.go:254] ["write save point"] [ts=410597669047631873]
[2019/08/20 20:56:49.539 +08:00] [INFO] [main.go:63] ["got signal to exit."] [signal=hangup] # Started with nohup, but the client session was disconnected without a clean exit, so the system sent SIGHUP and killed the drainer process.
[2019/08/20 20:56:49.539 +08:00] [INFO] [server.go:406] ["begin to close drainer server"]
[2019/08/20 20:56:49.539 +08:00] [ERROR] [pump.go:147] ["pump receive binlog failed"] [id=dev10:8250] [error=EOF]
[2019/08/20 20:56:49.543 +08:00] [INFO] [server.go:371] ["has already update status"] [id=dev10:8249]
[2019/08/20 20:56:49.543 +08:00] [INFO] [server.go:410] ["commit status done"]
[2019/08/20 20:56:49.543 +08:00] [INFO] [collector.go:130] ["publishBinlogs quit"]
[2019/08/20 20:56:49.544 +08:00] [INFO] [util.go:66] [Exit] [name=heartbeat]
[2019/08/20 20:56:49.544 +08:00] [INFO] [pump.go:72] ["pump is closing"] [id=dev10:8250]
[2019/08/20 20:56:49.544 +08:00] [INFO] [syncer.go:254] ["write save point"] [ts=410597669244239873]
[2019/08/20 20:56:49.544 +08:00] [INFO] [util.go:66] [Exit] [name=collect]
[2019/08/20 20:56:49.544 +08:00] [INFO] [load.go:434] ["Run()... in Loader quit"]
[2019/08/20 20:56:49.544 +08:00] [INFO] [mysql.go:114] ["Successes chan quit"]
[2019/08/20 20:56:49.633 +08:00] [INFO] [syncer.go:246] ["handleSuccess quit"]
[2019/08/20 20:56:49.633 +08:00] [INFO] [util.go:66] [Exit] [name=syncer]
[2019/08/20 20:56:49.633 +08:00] [INFO] [main.go:73] ["drainer exit"]
Pump log error: got signal to exit
[2019/08/20 20:56:05.316 +08:00] [INFO] [server.go:522] ["server info tick"] [writeBinlogCount=10016] [alivePullerCount=1] [MaxCommitTS=410597658037583873]
[2019/08/20 20:56:15.309 +08:00] [INFO] [storage.go:381] [DBStats] [DBStats="{\"WriteDelayCount\":0,\"WriteDelayDuration\":0,\"WritePaused\":false,\"AliveSnapshots\":0,\"AliveIterators\":0,\"IOWrite\":2225967,\"IORead\":0,\"BlockCacheSize\":0,\"OpenedTablesCount\":0,\"LevelSizes\":null,\"LevelTablesCounts\":null,\"LevelRead\":null,\"LevelWrite\":null,\"LevelDurations\":null}"]
[2019/08/20 20:56:15.317 +08:00] [INFO] [server.go:522] ["server info tick"] [writeBinlogCount=10016] [alivePullerCount=1] [MaxCommitTS=410597660396879873]
[2019/08/20 20:56:25.309 +08:00] [INFO] [storage.go:381] [DBStats] [DBStats="{\"WriteDelayCount\":0,\"WriteDelayDuration\":0,\"WritePaused\":false,\"AliveSnapshots\":0,\"AliveIterators\":0,\"IOWrite\":2226617,\"IORead\":0,\"BlockCacheSize\":0,\"OpenedTablesCount\":0,\"LevelSizes\":null,\"LevelTablesCounts\":null,\"LevelRead\":null,\"LevelWrite\":null,\"LevelDurations\":null}"]
[2019/08/20 20:56:25.316 +08:00] [INFO] [server.go:522] ["server info tick"] [writeBinlogCount=10018] [alivePullerCount=1] [MaxCommitTS=410597663542607875]
[2019/08/20 20:56:35.309 +08:00] [INFO] [storage.go:381] [DBStats] [DBStats="{\"WriteDelayCount\":0,\"WriteDelayDuration\":0,\"WritePaused\":false,\"AliveSnapshots\":0,\"AliveIterators\":0,\"IOWrite\":2229972,\"IORead\":0,\"BlockCacheSize\":0,\"OpenedTablesCount\":0,\"LevelSizes\":null,\"LevelTablesCounts\":null,\"LevelRead\":null,\"LevelWrite\":null,\"LevelDurations\":null}"]
[2019/08/20 20:56:35.317 +08:00] [INFO] [server.go:522] ["server info tick"] [writeBinlogCount=10054] [alivePullerCount=1] [MaxCommitTS=410597665587331079]
[2019/08/20 20:56:45.309 +08:00] [INFO] [storage.go:381] [DBStats] [DBStats="{\"WriteDelayCount\":0,\"WriteDelayDuration\":0,\"WritePaused\":false,\"AliveSnapshots\":0,\"AliveIterators\":0,\"IOWrite\":2230701,\"IORead\":0,\"BlockCacheSize\":0,\"OpenedTablesCount\":0,\"LevelSizes\":null,\"LevelTablesCounts\":null,\"LevelRead\":null,\"LevelWrite\":null,\"LevelDurations\":null}"]
[2019/08/20 20:56:45.317 +08:00] [INFO] [server.go:522] ["server info tick"] [writeBinlogCount=10058] [alivePullerCount=1] [MaxCommitTS=410597667815555073]
[2019/08/20 20:56:49.539 +08:00] [INFO] [main.go:65] ["got signal to exit."] [signal=hangup] # Started with nohup, but the client session was disconnected without a clean exit, so the system sent SIGHUP and killed the pump process.
[2019/08/20 20:56:49.539 +08:00] [INFO] [server.go:839] ["begin to close pump server"]
[2019/08/20 20:56:49.539 +08:00] [INFO] [server.go:496] ["detect drainer checkpoint routine exit"]
[2019/08/20 20:56:49.539 +08:00] [INFO] [server.go:537] ["gcBinlogFile exit"]
[2019/08/20 20:56:49.539 +08:00] [INFO] [server.go:519] ["printServerInfo exit"]
[2019/08/20 20:56:49.539 +08:00] [INFO] [server.go:472] ["genFakeBinlog exit"]
[2019/08/20 20:56:49.539 +08:00] [INFO] [server.go:848] ["background goroutins are stopped"]
[2019/08/20 20:56:49.539 +08:00] [INFO] [node.go:216] ["Heartbeat goroutine exited"]
[2019/08/20 20:56:49.543 +08:00] [INFO] [server.go:832] ["update state success"] [NodeID=dev10:8250] [state=paused]
[2019/08/20 20:56:49.543 +08:00] [INFO] [server.go:851] ["commit status done"]
[2019/08/20 20:56:49.544 +08:00] [INFO] [server.go:855] ["grpc is stopped"]
[2019/08/20 20:56:49.544 +08:00] [INFO] [chaser.go:76] ["Slow chaser quits"]
[2019/08/20 20:56:49.546 +08:00] [INFO] [server.go:860] ["storage is closed"]
[2019/08/20 20:56:49.546 +08:00] [INFO] [server.go:865] ["pump node is closed"]
[2019/08/20 20:56:49.546 +08:00] [INFO] [server.go:871] ["has closed pdCli"]
[2019/08/20 20:56:49.546 +08:00] [INFO] [server.go:876] ["has closed tiStore"]
[2019/08/20 20:56:49.546 +08:00] [INFO] [main.go:68] ["pump is closed"]
[2019/08/20 20:56:49.546 +08:00] [INFO] [main.go:80] ["pump exit"]
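Both of the exits above were triggered by SIGHUP from a dropped login session. One way to avoid this when starting by hand is to detach the process into its own session; a sketch (the output path is an assumption):

```shell
# Run drainer in its own session so a dropped SSH connection cannot
# deliver SIGHUP to it; output goes to a file instead of the terminal.
setsid nohup ./bin/drainer -config config/drainer.toml \
    >> drainer.out 2>&1 < /dev/null &
```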



Switching Pump/Drainer from a binary deployment to a config-file (tidb-ansible) deployment

Switch in the following order:

  1. Join the new pump/drainer to the cluster.
  2. Pause the binary-deployed pump/drainer.
  3. Take the binary-deployed pump/drainer offline (note: the state must be offline, otherwise the switch fails).
    If it is not, tail -1000 drainer.log shows errors like the following:
[tidb@dev11 log]$ pwd
/home/tidb/deploy/log
[tidb@dev11 log]$ tail -1000 drainer.log

[2019/08/21 08:46:07.005 +08:00] [INFO] [pump.go:133] ["pump create pull binlogs client"] [id=dev10:8250]
[2019/08/21 08:46:07.006 +08:00] [ERROR] [pump.go:217] ["pump create PullBinlogs client failed"] [id=dev10:8250] [error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.180.32:8250: connect: connection refused\""]
[2019/08/21 08:46:07.006 +08:00] [ERROR] [pump.go:135] ["pump create pull binlogs client failed"] [id=dev10:8250] [error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.180.32:8250: connect: connection refused\""] [errorVerbose="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.180.32:8250: connect: connection refused\"\ngithub.com/pingcap/errors.AddStack\n\t/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:15\ngithub.com/pingcap/tidb-binlog/drainer.(*Pump).createPullBinlogsClient\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:221\ngithub.com/pingcap/tidb-binlog/drainer.(*Pump).PullBinlog.func1\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/drainer/pump.go:134\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"]
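Steps 2 and 3 of the switch map onto binlogctl commands. A sketch reusing the pump node ID from these logs; the PD URL is an assumption, and the state must read offline before the ansible-deployed replacement takes over:

```shell
# Pause, then take offline, the binary-deployed pump, and verify its state.
# The PD URL is an assumption; substitute your own PD endpoint.
bin/binlogctl -pd-urls=http://192.168.180.47:2379 -cmd pause-pump   -node-id dev10:8250
bin/binlogctl -pd-urls=http://192.168.180.47:2379 -cmd offline-pump -node-id dev10:8250
bin/binlogctl -pd-urls=http://192.168.180.47:2379 -cmd pumps   # state must be "offline"
```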



Drainer cluster replication to MariaDB produces duplicate data
