Kafka version: 1.1.1
Each partition maps to one directory on disk, and its data is stored as segment files, 1 GB each by default (log.segment.bytes).
The partition directory and the segment files inside it:
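For example, a partition directory might look like this (an illustrative listing for a hypothetical topic test, partition 0; the offsets are made up to match the example in the code below):

/data/kafka-logs/test-0/
    00000000000000000000.log
    00000000000000000000.index
    00000000000000000000.timeindex
    00000000000030898257.log
    00000000000030898257.index
    00000000000030898257.timeindex
    leader-epoch-checkpoint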
How are segment files named?
The logic Kafka uses to roll a segment lives in kafka.log.Log#roll:
/**
 * Roll the log over to a new active segment starting with the current logEndOffset.
 * This will trim the index to the exact size of the number of entries it currently contains.
 *
 * @return The newly rolled segment
 */
def roll(expectedNextOffset: Option[Long] = None): LogSegment = {
  maybeHandleIOException(s"Error while rolling log segment for $topicPartition in dir ${dir.getParent}") {
    val start = time.hiResClockMs()
    lock synchronized {
      checkIfMemoryMappedBufferClosed()
      val newOffset = math.max(expectedNextOffset.getOrElse(0L), logEndOffset)
      // e.g. the file 00000000000030898257.log
      val logFile = Log.logFile(dir, newOffset)

      if (segments.containsKey(newOffset)) {
        // segment with the same base offset already exists and loaded
        if (activeSegment.baseOffset == newOffset && activeSegment.size == 0) {
          // We have seen this happen (see KAFKA-6388) after shouldRoll() returns true for an
          // active segment of size zero because of one of the indexes is "full" (due to _maxEntries == 0).
          warn(s"Trying to roll a new log segment with start offset $newOffset " +
            s"=max(provided offset = $expectedNextOffset, LEO = $logEndOffset) while it already " +
            s"exists and is active with size 0. Size of time index: ${activeSegment.timeIndex.entries}," +
            s" size of offset index: ${activeSegment.offsetIndex.entries}.")
          deleteSegment(activeSegment)
        } else {
          throw new KafkaException(s"Trying to roll a new log segment for topic partition $topicPartition with start offset $newOffset" +
            s" =max(provided offset = $expectedNextOffset, LEO = $logEndOffset) while it already exists. Existing " +
            s"segment is ${segments.get(newOffset)}.")
        }
      } else if (!segments.isEmpty && newOffset < activeSegment.baseOffset) {
        throw new KafkaException(
          s"Trying to roll a new log segment for topic partition $topicPartition with " +
          s"start offset $newOffset =max(provided offset = $expectedNextOffset, LEO = $logEndOffset) lower than start offset of the active segment $activeSegment")
      } else {
        val offsetIdxFile = offsetIndexFile(dir, newOffset)
        val timeIdxFile = timeIndexFile(dir, newOffset)
        val txnIdxFile = transactionIndexFile(dir, newOffset)

        for (file <- List(logFile, offsetIdxFile, timeIdxFile, txnIdxFile) if file.exists) {
          warn(s"Newly rolled segment file ${file.getAbsolutePath} already exists; deleting it first")
          Files.delete(file.toPath)
        }

        Option(segments.lastEntry).foreach(_.getValue.onBecomeInactiveSegment())
      }

      // take a snapshot of the producer state to facilitate recovery. It is useful to have the snapshot
      // offset align with the new segment offset since this ensures we can recover the segment by beginning
      // with the corresponding snapshot file and scanning the segment data. Because the segment base offset
      // may actually be ahead of the current producer state end offset (which corresponds to the log end offset),
      // we manually override the state offset here prior to taking the snapshot.
      producerStateManager.updateMapEndOffset(newOffset)
      producerStateManager.takeSnapshot()

      val segment = LogSegment.open(dir,
        baseOffset = newOffset,
        config,
        time = time,
        fileAlreadyExists = false,
        initFileSize = initFileSize,
        preallocate = config.preallocate)
      addSegment(segment)

      // We need to update the segment base offset and append position data of the metadata when log rolls.
      // The next offset should not change.
      updateLogEndOffset(nextOffsetMetadata.messageOffset)

      // schedule an asynchronous flush of the old segment
      scheduler.schedule("flush-log", () => flush(newOffset), delay = 0L)

      info(s"Rolled new log segment at offset $newOffset in ${time.hiResClockMs() - start} ms.")

      segment
    }
  }
}
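For context: as far as I can tell from the 1.1.x source, roll() is normally reached via Log#maybeRoll, which rolls when the active segment has no room for the incoming batch (log.segment.bytes), has been open longer than the configured roll time (log.roll.ms / log.roll.hours), has a full offset or time index, or can no longer encode the next offset as a relative offset.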
As the code shows, the new segment takes the current logEndOffset as its base offset, i.e. a segment file is named after the offset of the first message it contains.
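Concretely, the base offset is zero-padded to 20 digits to form the file name (in the Kafka source this is done in Log.filenamePrefixFromOffset). A minimal sketch of the naming scheme; segmentName is my own helper name, not broker code:

// Sketch: build a segment file name from its base offset,
// zero-padded to 20 digits the way Kafka does.
def segmentName(baseOffset: Long): String =
  f"$baseOffset%020d.log"

// segmentName(30898257) == "00000000000030898257.log"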
Alongside each .log file there is an .index file with the same base name. It stores offset → position mappings; each entry is two 4-byte ints (relative offset and file position), 8 bytes in total.
kafka.log.OffsetIndex#append
/**
 * Append an entry for the given offset/location pair to the index. This entry must have a larger offset than all subsequent entries.
 */
def append(offset: Long, position: Int) {
  inLock(lock) {
    require(!isFull, "Attempt to append to a full index (size = " + _entries + ").")
    if (_entries == 0 || offset > _lastOffset) {
      trace(s"Adding index entry $offset => $position to ${file.getAbsolutePath}")
      // relative offset (offset - baseOffset), stored as a 4-byte int
      mmap.putInt((offset - baseOffset).toInt)
      // physical position of the message within the log file
      mmap.putInt(position)
      _entries += 1
      _lastOffset = offset
      require(_entries * entrySize == mmap.position(), entries + " entries but file position in index is " + mmap.position() + ".")
    } else {
      throw new InvalidOffsetException(s"Attempt to append an offset ($offset) to position $entries no larger than" +
        s" the last offset appended (${_lastOffset}) to ${file.getAbsolutePath}.")
    }
  }
}
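Note that this index is sparse: an entry is only appended after roughly index.interval.bytes (default 4096) bytes of messages, so a lookup finds the last entry whose offset is <= the target and then scans the log forward from that position. A simplified sketch of decoding and binary-searching such an 8-byte-entry index (my own illustration of the idea behind OffsetIndex#lookup, not the broker code):

import java.nio.ByteBuffer

// Each entry: 4-byte relative offset + 4-byte file position.
final case class IndexEntry(relativeOffset: Int, position: Int)

// Find the last entry with offset <= target, or None if the
// target precedes the first indexed offset.
def lookup(buf: ByteBuffer, entries: Int, baseOffset: Long, target: Long): Option[IndexEntry] = {
  val rel = (target - baseOffset).toInt
  def entry(i: Int) = IndexEntry(buf.getInt(i * 8), buf.getInt(i * 8 + 4))
  var lo = 0; var hi = entries - 1; var found = -1
  while (lo <= hi) {
    val mid = (lo + hi) >>> 1
    if (entry(mid).relativeOffset <= rel) { found = mid; lo = mid + 1 }
    else hi = mid - 1
  }
  if (found >= 0) Some(entry(found)) else None
}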
Diagram borrowed from the RocketMQ design docs:
http://rocketmq.cloud/zh-cn/docs/design-store.html
RocketMQ stores data differently from Kafka, splitting it into a commitlog and consumequeues:
Messages from all topics are written into the same commitlog; it is segmented into 1 GB files by default, each named after its starting physical offset.
The index information lives under consumequeue/{topic}/{queueId}. Each entry is a fixed 20 bytes: an 8-byte physical offset into the commitlog, a 4-byte message length, and an 8-byte tag hashcode.
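To make the 20-byte layout concrete, here is a sketch of encoding and decoding one such entry (field order as described above; ConsumeQueueEntry is my own name, not a RocketMQ class):

import java.nio.ByteBuffer

// One consumequeue entry: 8 + 4 + 8 = 20 bytes.
final case class ConsumeQueueEntry(commitlogOffset: Long, size: Int, tagsHash: Long)

def encode(e: ConsumeQueueEntry): Array[Byte] = {
  val buf = ByteBuffer.allocate(20)
  buf.putLong(e.commitlogOffset) // 8-byte physical offset into the commitlog
  buf.putInt(e.size)             // 4-byte message length
  buf.putLong(e.tagsHash)        // 8-byte tag hashcode (used for tag filtering)
  buf.array()
}

def decode(bytes: Array[Byte]): ConsumeQueueEntry = {
  val buf = ByteBuffer.wrap(bytes)
  ConsumeQueueEntry(buf.getLong(), buf.getInt(), buf.getLong())
}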
I don't know yet whether this index is sparse or has one entry per message. And can a single message end up split across two commitlog files? To be investigated later.