FE startup failure
2024-10-08 09:24:57.669+08:00 INFO (stateChangeExecutor|87) [DatabaseTransactionMgr.replayUpsertTransactionState():1702] remove expired transaction: TransactionState. txn_id: 189324, label: delete_031c5090-7e2d-11ef-bdd8-000c29967e13, db id: 10004, table id list: 10044, callback id: -1, coordinator: FE: 192.168.1.49, transaction status: VISIBLE, error replicas num: 0, replica ids: , prepare time: 1727591726260, write end time: -1, allow commit time: -1, commit time: 1727591726313, finish time: 1727591726362, write cost: 53ms, publish total cost: 49ms, total cost: 102ms, reason: attachment: com.starrocks.transaction.InsertTxnCommitAttachment@43afccbe
2024-10-08 09:24:57.669+08:00 INFO (stateChangeExecutor|87) [TxnStateCallbackFactory.removeCallback():44] remove callback of txn state : 2230521. current callback size: 1
2024-10-08 09:24:57.669+08:00 INFO (stateChangeExecutor|87) [LoadMgr.replayEndLoadJob():300] LOAD_JOB=2230521, operation={LoadJobEndOperation{id=2230521, loadingStatus=EtlStatus{state=RUNNING, trackingUrl='', stats={}, counters={}, tableCounters={}, fileMap={}, progress=0, failMsg='', dppResult='null'}, progress=100, loadStartTimestamp=1727591726262, finishTimestamp=1727591726363, jobState=FINISHED, failMsg=null}}, msg={replay end load job}
2024-10-08 09:24:57.669+08:00 INFO (stateChangeExecutor|87) [LoadMgr.replayEndLoadJob():305] remove expired job: com.starrocks.load.loadv2.InsertLoadJob@2208f9
2024-10-08 09:24:57.673+08:00 INFO (stateChangeExecutor|87) [EditLog.loadJournal():232] Begin to unprotect create materialized view. db = ads_test create materialized view = 2230522 tableName = ads_vip_labels_d_f_mv_2
2024-10-08 09:24:57.673+08:00 INFO (stateChangeExecutor|87) [MaterializedView.setActive():492] set ads_vip_labels_d_f_mv_2 to active
2024-10-08 09:24:57.679+08:00 INFO (stateChangeExecutor|87) [CachingMvPlanContextBuilder.putAstIfAbsent():172] Add mv ads_vip_labels_d_f_mv_2 input ast cache
2024-10-08 09:24:57.680+08:00 INFO (stateChangeExecutor|87) [TaskManager.replayCreateTaskRun():701] replayCreateTaskRun:TaskRunStatus{queryId='41f168e8-7e2e-11ef-bdd8-000c29967e13', taskName='mv-2230522', createTime=1727592261171, finishTime=0, state=PENDING, progress=0%, dbName='ads_test', definition='insert overwrite `ads_vip_labels_d_f_mv_2` SELECT ...', postRun='ANALYZE SAMPLE TABLE ads_vip_labels_d_f_mv_2 WITH ASYNC MODE', user='dolphin', errorCode=0, errorMessage='null', expireTime=1727678661171, priority=80, mergeRedundant=false, extraMessage={"forceRefresh":false,"mvPartitionsToRefresh":[],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{}}}
2024-10-08 09:24:57.680+08:00 INFO (stateChangeExecutor|87) [TaskRun.initStatus():305] init task status, task:mv-2230522, query_id:41f168e8-7e2e-11ef-bdd8-000c29967e13, create_time:1727592261171
2024-10-08 09:24:57.680+08:00 INFO (stateChangeExecutor|87) [TaskManager.replayUpdateTaskRun():736] replayUpdateTaskRun:TaskRunStatus{queryId='41f168e8-7e2e-11ef-bdd8-000c29967e13', taskId='2230531', finishTime=0, fromStatus=PENDING, toStatus=RUNNING, errorCode=0, errorMessage='null', extraMessage={"forceRefresh":false,"mvPartitionsToRefresh":[],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{}}}
2024-10-08 09:24:57.680+08:00 INFO (stateChangeExecutor|87) [InsertOverwriteJobRunner.replayStateChange():167] replay state change:InsertOverwriteStateChangeInfo{jobId=2230532, fromState=OVERWRITE_PENDING, toState=OVERWRITE_RUNNING, sourcePartitionIds=[2230524], tmpPartitionIds=[2230533]}
2024-10-08 09:24:57.685+08:00 WARN (stateChangeExecutor|87) [GlobalStateMgr.replayJournalInner():2382] catch exception when replaying 7170064,
com.starrocks.journal.JournalInconsistentException: failed to load journal type 10242
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:1179) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2369) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:2318) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.transferToLeader(GlobalStateMgr.java:1312) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.access$100(GlobalStateMgr.java:346) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$1.transferToLeader(GlobalStateMgr.java:815) ~[starrocks-fe.jar:?]
at com.starrocks.ha.StateChangeExecutor.runOneCycle(StateChangeExecutor.java:103) ~[starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:107) ~[starrocks-fe.jar:?]
Caused by: java.lang.NullPointerException
at com.starrocks.server.LocalMetastore.replayAddPartition(LocalMetastore.java:1478) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayAddPartition(GlobalStateMgr.java:2562) ~[starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:289) ~[starrocks-fe.jar:?]
... 7 more
2024-10-08 09:24:57.690+08:00 WARN (stateChangeExecutor|87) [GlobalStateMgr.replayJournal():2320] got interrupt exception or inconsistent exception when replay journal 7170064, will exit,
com.starrocks.journal.JournalInconsistentException: failed to load journal type 10242
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:1179) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2369) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:2318) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.transferToLeader(GlobalStateMgr.java:1312) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.access$100(GlobalStateMgr.java:346) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr$1.transferToLeader(GlobalStateMgr.java:815) ~[starrocks-fe.jar:?]
at com.starrocks.ha.StateChangeExecutor.runOneCycle(StateChangeExecutor.java:103) ~[starrocks-fe.jar:?]
at com.starrocks.common.util.Daemon.run(Daemon.java:107) ~[starrocks-fe.jar:?]
Caused by: java.lang.NullPointerException
at com.starrocks.server.LocalMetastore.replayAddPartition(LocalMetastore.java:1478) ~[starrocks-fe.jar:?]
at com.starrocks.server.GlobalStateMgr.replayAddPartition(GlobalStateMgr.java:2562) ~[starrocks-fe.jar:?]
at com.starrocks.persist.EditLog.loadJournal(EditLog.java:289) ~[starrocks-fe.jar:?]
... 7 more
2024-10-10 09:20:51.108+08:00 WARN (meta_recovery|25) [MetaRecoveryDaemon.recover():106] cannot find quorum version for tablet: 2229309, ignore partition: 2229308-ads_vip_labels_d_f_mv_2 table: 2229306-ads_vip_labels_d_f_mv_2 db: 475686-ads_test
2024-10-10 09:20:51.108+08:00 WARN (meta_recovery|25) [MetaRecoveryDaemon.recover():106] cannot find quorum version for tablet: 2229311, ignore partition: 2229308-ads_vip_labels_d_f_mv_2 table: 2229306-ads_vip_labels_d_f_mv_2 db: 475686-ads_test
2024-10-10 09:20:51.108+08:00 WARN (meta_recovery|25) [MetaRecoveryDaemon.recover():106] cannot find quorum version for tablet: 2229313, ignore partition: 2229308-ads_vip_labels_d_f_mv_2 table: 2229306-ads_vip_labels_d_f_mv_2 db: 475686-ads_test
2024-10-10 09:20:51.114+08:00 WARN (meta_recovery|25) [MetaRecoveryDaemon.recover():106] cannot find quorum version for tablet: 2229278, ignore partition: 2229277-tmp_test_20240927_70_vip_mv table: 2229275-tmp_test_20240927_70_vip_mv db: 475686-ads_test
2024-10-10 09:20:51.114+08:00 WARN (meta_recovery|25) [MetaRecoveryDaemon.recover():106] cannot find quorum version for tablet: 2229280, ignore partition: 2229277-tmp_test_20240927_70_vip_mv table: 2229275-tmp_test_20240927_70_vip_mv db: 475686-ads_test
2024-10-10 09:20:51.114+08:00 WARN (meta_recovery|25) [MetaRecoveryDaemon.recover():106] cannot find quorum version for tablet: 2229282, ignore partition: 2229277-tmp_test_20240927_70_vip_mv table: 2229275-tmp_test_20240927_70_vip_mv db: 475686-ads_test
2024-10-10 09:21:00.647+08:00 INFO (ReportHandler|208) [ReportHandler.tabletReport():410] backend[10006] reports 37269 tablet(s). report version: 17283536780505
2024-10-10 09:21:00.738+08:00 INFO (colocate group clone checker|140) [ColocateTableBalancer.matchGroups():903] finished to match colocate group. cost: 0 ms, in lock time: 0 ms
2024-10-10 09:21:00.743+08:00 INFO (ReportHandler|208) [TabletInvertedIndex.tabletReport():306] finished to do tablet diff with backend[10006]. sync: 0. metaDel: 21. foundValid: 37269. foundInvalid: 0. migration: 0. found invalid transactions 0. found republish transactions 0 cost: 73 ms
2024-10-10 09:21:00.743+08:00 WARN (ReportHandler|208) [ReportHandler.deleteFromMeta():835] disk of path hash 2229306 dose not exist, delete tablet 10006 on backend -1 from meta
2024-10-10 09:21:00.743+08:00 WARN (ReportHandler|208) [ReportHandler.deleteFromMeta():835] disk of path hash 2229306 dose not exist, delete tablet 10006 on backend -1 from meta
2024-10-10 09:21:00.743+08:00 WARN (ReportHandler|208) [ReportHandler.deleteFromMeta():835] disk of path hash 2229306 dose not exist, delete tablet 10006 on backend -1 from meta
2024-10-10 09:21:00.743+08:00 WARN (ReportHandler|208) [ReportHandler.deleteFromMeta():835] disk of path hash 2229275 dose not exist, delete tablet 10006 on backend -1 from meta
2024-10-10 09:21:00.743+08:00 WARN (ReportHandler|208) [ReportHandler.deleteFromMeta():835] disk of path hash 2229275 dose not exist, delete tablet 10006 on backend -1 from meta
2024-10-10 09:21:00.743+08:00 WARN (ReportHandler|208) [ReportHandler.deleteFromMeta():835] disk of path hash 2229275 dose not exist, delete tablet 10006 on backend -1 from meta
2024-10-10 09:21:00.743+08:00 INFO (ReportHandler|208) [ReportHandler.deleteFromMeta():944] delete 0 replica(s) from globalStateMgr in db[475686]
2024-10-10 09:21:01.750+08:00 INFO (tablet checker|47) [TabletChecker.doCheck():426] finished to check tablets. isUrgent: true, unhealthy/total/added/in_sched/not_ready: 0/0/0/0/0, cost: 0 ms, in lock time: 0 ms, wait time: 0ms
2024-10-10 09:21:01.787+08:00 INFO (tablet checker|47) [TabletChecker.doCheck():426] finished to check tablets. isUrgent: false, unhealthy/total/added/in_sched/not_ready: 6/37275/0/0/0, cost: 37 ms, in lock time: 36 ms, wait time: 0ms
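Metadata inconsistency
Replaying journal 7170064 fails on journal type 10242 (OP_ADD_PARTITIONS_V2 in OperationType), with a NullPointerException thrown in LocalMetastore.replayAddPartition (LocalMetastore.java:1478). The FE then exits, so it cannot start.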
// com.starrocks.persist.OperationType
public class OperationType {
    ...
    // new operator for partition 10241 ~ 10260
    public static final short OP_ADD_PARTITION_V2 = 10241;
    public static final short OP_ADD_PARTITIONS_V2 = 10242;
    @IgnorableOnReplayFailed
    public static final short OP_MODIFY_PARTITION_V2 = 10243;
    public static final short OP_ADD_SUB_PARTITIONS_V2 = 10244;
    ...
}
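The failing journal type from the stack trace can be matched against the constants above. In the excerpt, OP_ADD_PARTITIONS_V2 does not carry @IgnorableOnReplayFailed, unlike OP_MODIFY_PARTITION_V2. The sketch below is a minimal, self-contained lookup over a local copy of only the four constants shown; `OperationTypeExcerpt`, `JournalTypeLookup`, and the locally declared `IgnorableOnReplayFailed` annotation are hypothetical stand-ins, not StarRocks classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical local stand-in for StarRocks' @IgnorableOnReplayFailed, so the sketch compiles alone.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IgnorableOnReplayFailed {
}

// Local copy of only the four constants from the excerpt, not the real OperationType.
class OperationTypeExcerpt {
    public static final short OP_ADD_PARTITION_V2 = 10241;
    public static final short OP_ADD_PARTITIONS_V2 = 10242;
    @IgnorableOnReplayFailed
    public static final short OP_MODIFY_PARTITION_V2 = 10243;
    public static final short OP_ADD_SUB_PARTITIONS_V2 = 10244;
}

class JournalTypeLookup {
    // Finds the static short constant whose value equals the journal type code, or null.
    private static Field find(int code) {
        for (Field f : OperationTypeExcerpt.class.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) || f.getType() != short.class) {
                continue;
            }
            try {
                if (f.getShort(null) == code) {
                    return f;
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return null;
    }

    // Constant name for the code, or null if the code is not in the excerpt.
    static String nameOf(int code) {
        Field f = find(code);
        return f == null ? null : f.getName();
    }

    // True only if the matching constant is annotated with @IgnorableOnReplayFailed.
    static boolean isIgnorable(int code) {
        Field f = find(code);
        return f != null && f.isAnnotationPresent(IgnorableOnReplayFailed.class);
    }

    public static void main(String[] args) {
        int failed = 10242; // from "failed to load journal type 10242"
        System.out.println(nameOf(failed) + " ignorable=" + isIgnorable(failed));
    }
}
```

Running `main` prints `OP_ADD_PARTITIONS_V2 ignorable=false`.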
// com.starrocks.server.GlobalStateMgr
public void replayAddPartition(PartitionPersistInfoV2 info) throws DdlException {
    localMetastore.replayAddPartition(info);
}
// com.starrocks.server.LocalMetastore (the NPE is reported at LocalMetastore.java:1478)
public void replayAddPartition(PartitionPersistInfoV2 info) throws DdlException {
    Database db = this.getDb(info.getDbId());
    db.writeLock();
    try {
        OlapTable olapTable = (OlapTable) db.getTable(info.getTableId());
        Partition partition = info.getPartition();
        // Likely NPE site: if no table matches info.getTableId(), olapTable is null here
        PartitionInfo partitionInfo = olapTable.getPartitionInfo();
        if (info.isTempPartition()) {
            olapTable.addTempPartition(partition);
        } else {
            olapTable.addPartition(partition);
        }
        // ... (remainder of the method not shown)
    } finally {
        db.writeUnlock();
    }
}
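The excerpt does not show which reference is null at line 1478. One plausible reading, assumed here, is that the table id in the OP_ADD_PARTITIONS_V2 entry has no matching table in the replayed metadata, so `olapTable` is null. The hypothetical, heavily simplified model below (`SimpleDb`, `SimpleTable` and `ReplaySketch` are not StarRocks classes) reproduces that failure mode next to a null-checked variant, only to show where the NPE comes from; it is not a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins; NOT the StarRocks Database/OlapTable classes.
class SimpleTable {
    final Map<Long, String> partitions = new HashMap<>();
}

class SimpleDb {
    final Map<Long, SimpleTable> tables = new HashMap<>();

    // Returns null when no table has this id, like a lookup miss during replay.
    SimpleTable getTable(long tableId) {
        return tables.get(tableId);
    }
}

class ReplaySketch {
    // Same shape as the excerpt: the looked-up table is dereferenced without a null check.
    static void replayAddPartitionUnchecked(SimpleDb db, long tableId, long partitionId, String name) {
        SimpleTable table = db.getTable(tableId);
        table.partitions.put(partitionId, name); // throws NullPointerException if table is null
    }

    // Null-checked variant: reports the missing table and returns false instead of throwing.
    static boolean replayAddPartitionChecked(SimpleDb db, long tableId, long partitionId, String name) {
        SimpleTable table = db.getTable(tableId);
        if (table == null) {
            System.err.println("table " + tableId + " not found, cannot add partition " + partitionId);
            return false;
        }
        table.partitions.put(partitionId, name);
        return true;
    }

    public static void main(String[] args) {
        SimpleDb db = new SimpleDb(); // hypothetical: table 1 was never registered
        System.out.println(replayAddPartitionChecked(db, 1L, 10L, "p1")); // false
        try {
            replayAddPartitionUnchecked(db, 1L, 10L, "p1");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the FE log");
        }
    }
}
```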
Metadata recovery
https://forum.mirrorship.cn/t/topic/12543/8
Recover the metadata
In fe.conf, set metadata_enable_recovery_mode = true
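Because the flag has to be added for recovery and removed again afterwards, the edit can be scripted. A minimal sketch, assuming fe.conf is a plain `key = value` file with `#` comments; `FeConfRecoveryMode` is a hypothetical helper name. It drops any existing uncommented `metadata_enable_recovery_mode` entry and, when enabling, appends `metadata_enable_recovery_mode = true`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class FeConfRecoveryMode {
    static final String KEY = "metadata_enable_recovery_mode";

    // Returns a copy of the fe.conf lines with the recovery-mode entry set (enable = true)
    // or removed (enable = false). Comments and all other entries are kept unchanged.
    static List<String> setRecoveryMode(List<String> lines, boolean enable) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            boolean isEntryForKey = !trimmed.startsWith("#")
                    && trimmed.split("=", 2)[0].trim().equals(KEY);
            if (!isEntryForKey) {
                out.add(line); // keep everything except existing entries for the key
            }
        }
        if (enable) {
            out.add(KEY + " = true");
        }
        return out;
    }

    // Hypothetical usage: <path to fe.conf> <true|false>
    public static void main(String[] args) throws IOException {
        Path conf = Paths.get(args[0]);
        boolean enable = Boolean.parseBoolean(args[1]);
        Files.write(conf, setRecoveryMode(Files.readAllLines(conf), enable));
    }
}
```

Rewriting the whole entry rather than flipping its value keeps the file free of a stale `metadata_enable_recovery_mode = false` line once recovery is finished.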
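Check the metadata recovery progress:
SHOW PROC '/meta_recovery';
This statement lists the partitions that could not be recovered. Follow the suggestions it returns to recover them. If it returns nothing, the recovery succeeded.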
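The WARN lines logged by MetaRecoveryDaemon in recovery mode (for example `cannot find quorum version for tablet: 2229309, ignore partition: 2229308-ads_vip_labels_d_f_mv_2 table: 2229306-ads_vip_labels_d_f_mv_2 db: 475686-ads_test`) also name the tables whose partitions were skipped. A minimal sketch, assuming the `id-name` format seen in the log above stays the same, that collects the distinct `db.table` pairs from such lines; `MetaRecoveryLogParser` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaRecoveryLogParser {
    // Matches "ignore partition: <id>-<name> table: <id>-<name> db: <id>-<name>".
    private static final Pattern IGNORED = Pattern.compile(
            "ignore partition: (\\d+)-(\\S+) table: (\\d+)-(\\S+) db: (\\d+)-(\\S+)");

    // Returns each affected table once, as "db.table", in first-seen order.
    static Set<String> affectedTables(List<String> logLines) {
        Set<String> tables = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = IGNORED.matcher(line);
            if (m.find()) {
                tables.add(m.group(6) + "." + m.group(4));
            }
        }
        return tables;
    }

    // Hypothetical usage: <path to an FE log file>
    public static void main(String[] args) throws IOException {
        for (String table : affectedTables(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(table);
        }
    }
}
```

For the log in this note it yields `ads_test.ads_vip_labels_d_f_mv_2` and `ads_test.tmp_test_20240927_70_vip_mv`.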
Drop the corresponding materialized view:
DROP MATERIALIZED VIEW ads_test.ads_vip_labels_d_f_mv_2;
Remove metadata_enable_recovery_mode = true from fe.conf, then restart the FE.
- Check the materialized view
- Check the task status:
select * from information_schema.tasks order by CREATE_TIME