为什么pg_basebackup或pg_start_backup好像hang住确没有开始拷贝文件 - checkpoint 的几种调度(checkpoint_completion_target)

标签

PostgreSQL , checkpoint , 调度 , lazy , immediate , pg_start_backup , pg_basebackup


背景

PostgreSQL支持在线全量备份与增量归档备份。在线全量备份实际上就是拷贝文件,增量备份则分为两种,一种是基于BLOCK lsn变化的BLOCK即增量备份,另一种是基于WAL的持续归档文件备份。

全量备份通常使用pg_basebackup客户端实现,或者使用SQL函数pg_start_backup()+COPY文件、打快照的方式实现。

全量备份开启前,需要对数据库做一次checkpoint,并强制开启full page write,确保partial block在后续可以通过wal进行恢复。备份结束时通过pg_stop_backup告知,关闭full page write(如果参数开启了FPW则不受影响)。

有时你可能会发现使用pg_basebackup或pg_start_backup接口时,好像hang住确没有开始拷贝文件。实际上是在做checkpoint,但是为什么这个checkpoint比较慢,但是直接SQL执行checkopint命令确很快呢?

原因是checkpoint分为调度和非调度模式。

调度模式的checkpoint和checkpoint_completion_target以及配置的max_wal_size区间大小有关,checkpoint_completion_target和max_wal_size越大,表示这个checkpoint将在这么大的区间内调度完成,所以总耗时会非常长,好处是减少CHECKPOINT带来的大量刷脏和FSYNC,从而减少抖动。

坏处就是你会发现checkpoint很漫长。

非调度模式的checkpoint,就是尽快完成检查点,会全速刷脏,不进行调度。好处是快,坏处是,如果脏页特别多,可能会有大量IO影响其他会话性能。

postgres=# show max_wal_size ;  
 max_wal_size   
--------------  
 128GB  
(1 row)  
  
postgres=# show min_wal_size;  
 min_wal_size   
--------------  
 32GB  
(1 row)  
             
postgres=# show checkpoint_completion_target ;  
 checkpoint_completion_target   
------------------------------  
 0.1  
(1 row)  

代码中可以看到,checkpoint有如下flag来控制检查点行为。

 * RequestCheckpoint  
 *              Called in backend processes to request a checkpoint  
 *  
 * flags is a bitwise OR of the following:  
 *      CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.  
 *      CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.  
 *      CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,  
 *              ignoring checkpoint_completion_target parameter.  
 *      CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred  
 *              since the last one (implied by CHECKPOINT_IS_SHUTDOWN or  
 *              CHECKPOINT_END_OF_RECOVERY).  
 *      CHECKPOINT_WAIT: wait for completion before returning (otherwise,  
 *              just signal checkpointer to do it, and return).  
 *      CHECKPOINT_CAUSE_XLOG: checkpoint is requested due to xlog filling.  
 *              (This affects logging, and in particular enables CheckPointWarning.)  
 */  
void  
RequestCheckpoint(int flags)  

start backup如何控制是使用快速checkpoint(非调度模式)、或者调度模式的checkpoint呢?

XLogRecPtr  
do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,  
                                   StringInfo labelfile, DIR *tblspcdir, List **tablespaces,  
                                   StringInfo tblspcmapfile, bool infotbssize,  
                                   bool needtblspcmapfile)  
{  
  
  
  
                         * Since the fact that we are executing do_pg_start_backup()  
                         * during recovery means that checkpointer is running, we can use  
                         * RequestCheckpoint() to establish a restartpoint.  
                         *  
                         * We use CHECKPOINT_IMMEDIATE only if requested by user (via  
                         * passing fast = true).  Otherwise this can take awhile.  
                         */  
                        RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |  
                                                          (fast ? CHECKPOINT_IMMEDIATE : 0));  

1、pg_basebackup客户端命令,通过-c参数控制(fast表示使用非调度模式checkpoint)

  -c, --checkpoint=fast|spread  
                         set fast or spread checkpointing  

2、pg_start_backup SQL函数,通过参数fast控制

postgres=# \df pg_start_backup  
                                                        List of functions  
   Schema   |      Name       | Result data type |                          Argument data types                           | Type   
------------+-----------------+------------------+------------------------------------------------------------------------+------  
 pg_catalog | pg_start_backup | pg_lsn           | label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true | func  
(1 row)  

小结

如果你需要快速的开始备份,可以使用fast(非调度模式)参数。

 

 

免费领取阿里云RDS PostgreSQL实例、ECS虚拟机

为什么pg_basebackup或pg_start_backup好像hang住确没有开始拷贝文件 - checkpoint 的几种调度(checkpoint_completion_target)

上一篇:GDI+中发生一般性错误 以及发布时候需要配置的文件


下一篇:ezjailserver备份和恢复方法