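Redis version: Redis 4.0.1
Redis is a key-value database server that keeps its data in memory, so if it restarts without having persisted anything, the data is lost. That is why it needs the persistence strategies RDB and AOF.
Annotated Redis source code referenced in this post: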
https://gitee.com/lidishan/redis-source-code-analysis/blob/master/src/aof.c
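Differences between AOF and RDB
- AOF records the database state by saving the write commands Redis has executed; RDB saves the key-value data directly
- AOF is plain text (commands in the Redis protocol format); RDB is binary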
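To make the "plain text" point concrete, here is a minimal sketch of how one write command looks once it is encoded in the Redis protocol (RESP) format that the AOF file stores. The function name encode_command is hypothetical and is not part of Redis; it also assumes NUL-terminated arguments, whereas Redis itself is binary safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Redis function): encode argv as a RESP array
 * of bulk strings. Returns the number of bytes written, or -1 if the
 * output buffer is too small. Assumes NUL-terminated arguments. */
int encode_command(char *out, size_t outlen, int argc, const char **argv) {
    assert(out != NULL && argv != NULL && argc > 0);
    /* Array header: "*<argc>\r\n" */
    int n = snprintf(out, outlen, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= outlen) return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < argc; i++) {
        /* Each argument: "$<length>\r\n<bytes>\r\n" */
        n = snprintf(out + used, outlen - used, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= outlen - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Under these assumptions, encoding SET key value yields the text *3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n, which stays readable in any text editor.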
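The AOF implementation has three parts: command append, file write, and file sync
Commands destined for the AOF file are first appended to the aof_buf field of struct redisServer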
struct redisServer {
    ....
    // Buffer for AOF data, stored as an SDS
    sds aof_buf; /* AOF buffer, written before entering the event loop */
    ....
}
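The append step can be sketched with a small growable buffer. This is only a simplified stand-in for an SDS string; the names demo_buf, demo_buf_append and demo_buf_clear are invented for illustration and do not exist in Redis.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an SDS string: a length-tracked, growable buffer. */
typedef struct {
    char *data;
    size_t len;  /* bytes currently stored */
    size_t cap;  /* bytes allocated */
} demo_buf;

/* Append n bytes, doubling the capacity when needed.
 * Returns 0 on success, -1 on allocation failure. */
int demo_buf_append(demo_buf *b, const void *src, size_t n) {
    assert(b != NULL && src != NULL);
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;
        while (newcap < b->len + n) newcap *= 2;
        char *p = realloc(b->data, newcap);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Forget the contents once they have been written out; keep the allocation. */
void demo_buf_clear(demo_buf *b) {
    assert(b != NULL);
    b->len = 0;
}
```

A zero-initialized demo_buf is a valid empty buffer, so a caller appends each encoded command after it runs and clears the buffer after a flush.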
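The three AOF steps are:
1 After a command executes, it is first appended to aof_buf
2 The event loop then calls flushAppendOnlyFile (from beforeSleep before each iteration, and from serverCron when an earlier flush was postponed) to write the contents of aof_buf out for persistence
3 flushAppendOnlyFile syncs the file according to one of three policies (appendfsync):
- appendfsync=always: sync the AOF immediately // slow, safe, loses at most the data of the current event loop iteration
- appendfsync=everysec: sync once more than one second has passed since the last sync // fairly slow, fairly safe, loses about 1 second of data
- appendfsync=no: never sync explicitly, the operating system decides when to flush // fast, unsafe, can lose a lot of data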
#define AOF_FSYNC_NO 0
#define AOF_FSYNC_ALWAYS 1
#define AOF_FSYNC_EVERYSEC 2
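As a rough model of how the three policies differ, the sketch below maps a policy to the sync action taken after a write. It follows the branch structure of the try_fsync part of flushAppendOnlyFile, but the enum sync_action and the function choose_sync_action are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <time.h>

#define AOF_FSYNC_NO 0
#define AOF_FSYNC_ALWAYS 1
#define AOF_FSYNC_EVERYSEC 2

/* Hypothetical result type: what to do after the buffer has been written. */
typedef enum { SYNC_NONE, SYNC_NOW, SYNC_BACKGROUND } sync_action;

/* Simplified policy decision (illustrative, not the Redis implementation). */
sync_action choose_sync_action(int policy, time_t now, time_t last_fsync,
                               int sync_in_progress) {
    assert(policy >= AOF_FSYNC_NO && policy <= AOF_FSYNC_EVERYSEC);
    /* always: fsync synchronously after every write */
    if (policy == AOF_FSYNC_ALWAYS) return SYNC_NOW;
    /* everysec: hand off to a background fsync at most once per second,
     * and only if the previous one has finished */
    if (policy == AOF_FSYNC_EVERYSEC && now > last_fsync && !sync_in_progress)
        return SYNC_BACKGROUND;
    /* no: leave flushing to the operating system */
    return SYNC_NONE;
}
```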
-- The flushAppendOnlyFile function that serverCron calls is analyzed below
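/**
 * Flush aof_buf to the AOF file for persistence.
 *
 * First decide whether only aof_fd needs to be synced to disk or whether aof_buf must be written to aof_fd, then sync according to the fsync policy.
 */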
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;

    // If aof_buf is empty, every command executed so far has already been written to the fd
    if (sdslen(server.aof_buf) == 0) {
        /* Check if we need to do fsync even the aof buffer is empty,
         * because previously in AOF_FSYNC_EVERYSEC mode, fsync is
         * called only when aof buffer is not empty, so if users
         * stop write commands before fsync called in one second,
         * the data in page cache cannot be flushed in time. */
        // If the policy is everysec
        // && the fsynced offset differs from the current AOF size
        // && the server time is later than the last fsync
        // && no background fsync is already in progress
        if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
            server.aof_fsync_offset != server.aof_current_size &&
            server.unixtime > server.aof_last_fsync &&
            !(sync_in_progress = aofFsyncInProgress())) {
            goto try_fsync;
        } else {
            return;
        }
    }
    // Reaching this point means aof_buf is not empty and must be flushed
    // (commands are appended to aof_buf first, the buffer is then written to aof_fd, and finally synced according to the fsync policy)
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        // Under AOF_FSYNC_EVERYSEC, check whether a background fsync is currently running
        sync_in_progress = aofFsyncInProgress();

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                return;
            }
            /* Otherwise fall trough, and go write since we can't wait
             * over two seconds. */
            server.aof_delayed_fsync++;
            serverLog(LL_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }
    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike */
    // If aof_flush_sleep is set and the buffer is not empty, sleep for that many microseconds before writing
    if (server.aof_flush_sleep && sdslen(server.aof_buf)) {
        usleep(server.aof_flush_sleep);
    }

    latencyStartMonitor(latency);
    // Write the contents of aof_buf to aof_fd
    nwritten = aofWrite(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    latencyEndMonitor(latency);
    /* ....... latency monitoring, field updates, and error logging omitted here ....... */
try_fsync:
    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background. */
    // Skip the fsync if no-appendfsync-on-rewrite is enabled and a child process (such as an AOF rewrite) is running
    if (server.aof_no_fsync_on_rewrite && hasActiveChildProcess())
        return;

    /* Perform the fsync if needed. */
    // Policy is always: sync immediately
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* redis_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        latencyStartMonitor(latency);
        /* Let's try to get this data on the disk. To guarantee data safe when
         * the AOF fsync policy is 'always', we should exit if failed to fsync
         * AOF (see comment next to the exit(1) after write error above). */
        // Try to flush the data to disk; under AOF_FSYNC_ALWAYS a failed fsync makes the server exit
        if (redis_fsync(server.aof_fd) == -1) {
            serverLog(LL_WARNING,"Can't persist AOF for fsync error when the "
              "AOF fsync policy is 'always': %s. Exiting...", strerror(errno));
            exit(1);
        }
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
        // Update the bookkeeping fields after a successful AOF_FSYNC_ALWAYS sync
        server.aof_fsync_offset = server.aof_current_size; // fsynced offset = current AOF size
        server.aof_last_fsync = server.unixtime; // time of the last fsync
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        // Policy is everysec (AOF_FSYNC_EVERYSEC)
        if (!sync_in_progress) { // no background fsync is running
            aof_background_fsync(server.aof_fd); // submit a job that fsyncs the AOF asynchronously
            server.aof_fsync_offset = server.aof_current_size; // record the fsynced offset
        }
        server.aof_last_fsync = server.unixtime;
    }
}
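The postponement logic in the middle of the function is easy to misread, so here is a small, hedged model of it as a pure function. The names flush_decision and decide_flush are hypothetical; postponed_start plays the role of server.aof_flush_postponed_start, and the caller is assumed to reset it to 0 after a successful write.

```c
#include <assert.h>
#include <time.h>

typedef enum { FLUSH_POSTPONE, FLUSH_WRITE } flush_decision;

/* Simplified model of the everysec postponement check (illustrative only).
 * *postponed_start == 0 means no flush is currently being postponed. */
flush_decision decide_flush(int sync_in_progress, int force, time_t now,
                            time_t *postponed_start) {
    assert(postponed_start != NULL);
    if (sync_in_progress && !force) {
        if (*postponed_start == 0) {
            /* First postponement: remember when the waiting started. */
            *postponed_start = now;
            return FLUSH_POSTPONE;
        }
        /* Already waiting, but for less than two seconds: keep waiting. */
        if (now - *postponed_start < 2) return FLUSH_POSTPONE;
        /* Waited two seconds or more: write anyway. */
    }
    return FLUSH_WRITE;
}
```

In this model the write is delayed only while a background fsync is running, and never for two seconds or longer.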
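AOF rewrite
The AOF flow described above is: a command executes -> it is appended to aof_buf -> flushAppendOnlyFile (called from the event loop, e.g. serverCron) writes aof_buf to aof_fd, which is then flushed to disk according to one of the three policies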
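The difference between "written to the fd" and "flushed to disk" can be illustrated with plain POSIX calls: write(2) hands the bytes to the kernel page cache, and fsync(2) forces them onto the device. The helpers write_and_sync and append_command below are hypothetical and much simpler than the error handling Redis performs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer to fd, retrying on short
 * writes and EINTR, then optionally fsync. Returns 0 on success, -1 on error. */
int write_and_sync(int fd, const char *buf, size_t len, int do_fsync) {
    assert(buf != NULL);
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n == -1) {
            if (errno == EINTR) continue;  /* interrupted: try again */
            return -1;
        }
        off += (size_t)n;
    }
    /* Without fsync the data may still sit only in the page cache. */
    if (do_fsync && fsync(fd) == -1) return -1;
    return 0;
}

/* Hypothetical helper: append one encoded command to the file at path. */
int append_command(const char *path, const char *cmd, int do_fsync) {
    assert(path != NULL && cmd != NULL);
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd == -1) return -1;
    int rc = write_and_sync(fd, cmd, strlen(cmd), do_fsync);
    if (close(fd) == -1) rc = -1;
    return rc;
}
```

Passing do_fsync=1 on every call corresponds roughly to appendfsync=always, and do_fsync=0 to appendfsync=no.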
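That looks fine so far. But once a large number of commands have been executed, recovering from the AOF means replaying them one by one, which is slow, so Redis applies the following optimization
- The rewrite builds a new file from the current database contents rather than by reading the old AOF, so many commands on the same key collapse into one batch command (a key with too many elements is split across several batch commands; the default threshold is 64 elements per command, AOF_REWRITE_ITEMS_PER_CMD)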
-- Several single-element commands become one multi-element command:
   RPUSH "A"
   RPUSH "B"   =>   RPUSH "A" "B" "C"
   RPUSH "C"
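The batching rule can be sketched as follows. The function emit_rpush_batches and the constant ITEMS_PER_CMD are hypothetical names; the output is printed in human-readable form rather than in the protocol format a real AOF uses, and the key name is supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>

#define ITEMS_PER_CMD 64  /* mirrors the default threshold of 64 */

/* Hypothetical helper: print the RPUSH commands that rebuild a list of n
 * items, with at most ITEMS_PER_CMD items per command.
 * Returns the number of commands emitted. */
int emit_rpush_batches(FILE *out, const char *key, const char **items, int n) {
    assert(out != NULL && key != NULL && n >= 0);
    int commands = 0;
    for (int i = 0; i < n; i++) {
        if (i % ITEMS_PER_CMD == 0) {
            /* Start a new command every ITEMS_PER_CMD items. */
            if (i > 0) fputc('\n', out);
            fprintf(out, "RPUSH %s", key);
            commands++;
        }
        fprintf(out, " \"%s\"", items[i]);
    }
    if (n > 0) fputc('\n', out);
    return commands;
}
```

With this rule a list of 130 items is rebuilt by three commands (64 + 64 + 2 items), however many individual RPUSH calls originally created it.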
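New commands keep arriving while the AOF rewrite is running. How are the rewrite and the new commands coordinated?
- During a rewrite there is an aof_rewrite_buf (the AOF rewrite buffer) in addition to the usual aof_buf (the AOF buffer), and every write command is appended to both of them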
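As shown above, this keeps the original AOF unaffected. Once the new AOF file has been written, the commands collected in aof_rewrite_buf are appended to it, the new file is renamed over the old AOF file, and the old AOF file is deleted.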
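The double-buffer idea can be sketched with two fixed-size buffers. Everything here (demo_server, feed_command, the 256-byte capacity) is a hypothetical simplification for illustration; Redis uses growable buffers and its own field names.

```c
#include <assert.h>
#include <string.h>

#define DEMO_CAP 256  /* illustrative fixed capacity */

/* Hypothetical, heavily simplified server state. */
typedef struct {
    char aof_buf[DEMO_CAP];      size_t aof_len;      /* regular AOF buffer */
    char rewrite_buf[DEMO_CAP];  size_t rewrite_len;  /* AOF rewrite buffer */
    int rewrite_in_progress;     /* nonzero while a rewrite is running */
} demo_server;

static int append_bytes(char *dst, size_t *len, const char *src, size_t n) {
    if (*len + n > DEMO_CAP) return -1;  /* buffer full */
    memcpy(dst + *len, src, n);
    *len += n;
    return 0;
}

/* Record a write command: always in aof_buf, and additionally in the
 * rewrite buffer while a rewrite is running. Returns 0 on success. */
int feed_command(demo_server *s, const char *cmd) {
    assert(s != NULL && cmd != NULL);
    size_t n = strlen(cmd);
    if (append_bytes(s->aof_buf, &s->aof_len, cmd, n) == -1) return -1;
    if (s->rewrite_in_progress &&
        append_bytes(s->rewrite_buf, &s->rewrite_len, cmd, n) == -1) return -1;
    return 0;
}
```

Because the old AOF keeps receiving every command through aof_buf, it stays complete even if the rewrite fails, and the rewrite buffer holds exactly the commands the new file is missing.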