阿里云RDS for PostgreSQL用户如何定制数据库参数

背景

为了满足大多数用户的需求,阿里云提供的RDS PG,数据库参数是根据通用性场景设置的。

如果用户不是通用的场景,或者用户有自定义参数的需求怎么办呢?

由于数据库的参数是分级的,层级越高优先级越高,用户可以在高层级设置参数的值,以此来覆盖RDS PG设置的一些参数,达到修改参数值的目的。

PostgreSQL参数分级

  1. 环境变量
  2. 配置文件(postgresql.conf)
  3. postgres 命令行启动参数
  4. 用户、数据库级别参数(alter role set, alter database set 设置的参数)
  5. 会话级参数,影响当前会话
  6. 事务级参数,影响当前事务
  7. 函数级参数 (function option) (SET configuration_parameter { TO value | = value | FROM CURRENT })
  8. 函数内设置的参数 (函数内调用 set 设置的参数)

以上数字越大,优先级越高。

RDS PG用户如何定制参数

前面讲解了参数的分级,接下来教大家如何修改RDS PG的参数值。

为了解决用户自定义参数的问题,RDS SUPERUSER开放了修改用户与数据库级别参数的权限,从而来调整自己定制的参数。

例1
用户可以设置角色级别的 random_page_cost 参数,调整索引 page页的成本。

以rds_superuser角色登陆RDS PG数据库 , 然后执行  

alter role all set random_page_cost=1.3;  

如果反馈需要超级用户才能修改,可能是你的版本没有开放这个参数。 
那么就只能改用户自己的,如果有多个用户需要修改多次。  

connect to role1 
aler role role1 set random_page_cost=1.3;   

connect to role2 
aler role role2 set random_page_cost=1.3;   

这样设置后,新发起的用户会话,会使用这个新的参数值。

重新连接,即可享受 random_page_cost=1.3

例2
调整表级垃圾回收和计算统计信息的阈值(假设表被频繁的更新,插入,想快速的生成统计信息,可以把阈值调小).

以表的owner连接数据库  

alter table test set (autovacuum_analyze_scale_factor = 0.02);  
alter table test set (autovacuum_vacuum_scale_factor = 0.01);  

https://www.postgresql.org/docs/9.5/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS

关于垃圾回收,除了设置以上两个阈值,还有3个非常关键的参数

postgres=# show autovacuum_max_workers ;
 autovacuum_max_workers 
------------------------
 5
(1 row)
同时可以起多少个垃圾回收的进程
  
postgres=# show autovacuum_naptime ;
 autovacuum_naptime 
--------------------
 1min
(1 row)
autovacuum守护进程要做一件事情,轮询每个数据库,并查找出需要被autovacuum垃圾回收的表。  
这个参数决定了轮询完所有的数据库需要多久,也就是说,值越小,轮询越快。  
  
postgres=# show autovacuum_work_mem ;
 autovacuum_work_mem 
---------------------
 -1
(1 row)
每个垃圾回收的进程可以使用多少内存

这三个参数应该如何设置

postgres=# show autovacuum_max_workers ;
如果你的业务会涉及频繁的更新或删除非常多的表,在有足够的CPU核与IOPS能力的情况下这个值越大越好。  
  
postgres=# show autovacuum_naptime ;
如果你的业务,在非常多的数据库中,涉及到非常多的表的频繁更新或删除。  
在有足够的CPU核与IOPS能力的情况下这个值越大越好,这个值越小越好。  
  
postgres=# show autovacuum_work_mem ;
系统必须预留给autovacuum worker足够的内存,大小为 autovacuum_work_mem * autovacuum_work_mem 

如果垃圾回收进程影响了数据库业务的SQL怎么办?
PostgreSQL有简单的autovacuum调度,当垃圾回收消耗了一定的成本后,会SLEEP一段时间,给业务留出系统资源,涉及几个参数。

1. autovacuum_vacuum_cost_limit
当所有autovacuum worker进程的cost和超出这个COST,worker开始休息,COST计算方法由以下参数决定  
vacuum_cost_page_hit (integer)
The estimated cost for vacuuming a buffer found in the shared buffer cache. It represents the cost to lock the buffer pool, lookup the shared hash table and scan the content of the page. The default value is one.

vacuum_cost_page_miss (integer)
The estimated cost for vacuuming a buffer that has to be read from disk. This represents the effort to lock the buffer pool, lookup the shared hash table, read the desired block in from the disk and scan its content. The default value is 10.

vacuum_cost_page_dirty (integer)
The estimated cost charged when vacuum modifies a block that was previously clean. It represents the extra I/O required to flush the dirty block out to disk again. The default value is 20.

2. autovacuum_vacuum_cost_delay
休息多久

如果用户的SQL已经出现了倾斜,或者用户想绑定执行计划怎么办?
可以参考这篇帖子,使用plan hint插件来绑定执行计划
https://yq.aliyun.com/articles/17212

注意,RDS PG并不是开放所有的参数给用户设置,同时因受到PG内核的限制,有些参数只能启动时设置,例如共享内存段的大小,不过大多数和性能或者使用环境相关的参数,都是可以设置的,如果您有新的需求,也欢迎回复或提工单告知。

通过这个源码文件,可以了解参数的一些其他细节
src/include/utils/guc.h

参数允许在什么时候被设置

/*
 * Certain options can only be set at certain times. The rules are
 * like this:
 *
 * INTERNAL options cannot be set by the user at all, but only through
 * internal processes ("server_version" is an example).  These are GUC
 * variables only so they can be shown by SHOW, etc.
 *
 * POSTMASTER options can only be set when the postmaster starts,
 * either from the configuration file or the command line.
 *
 * SIGHUP options can only be set at postmaster startup or by changing
 * the configuration file and sending the HUP signal to the postmaster
 * or a backend process. (Notice that the signal receipt will not be
 * evaluated immediately. The postmaster and the backend check it at a
 * certain point in their main loop. It's safer to wait than to read a
 * file asynchronously.)
 *
 * BACKEND and SU_BACKEND options can only be set at postmaster startup,
 * from the configuration file, or by client request in the connection
 * startup packet (e.g., from libpq's PGOPTIONS variable).  SU_BACKEND
 * options can be set from the startup packet only when the user is a
 * superuser.  Furthermore, an already-started backend will ignore changes
 * to such an option in the configuration file.  The idea is that these
 * options are fixed for a given backend once it's started, but they can
 * vary across backends.
 *
 * SUSET options can be set at postmaster startup, with the SIGHUP
 * mechanism, or from the startup packet or SQL if you're a superuser.
 *
 * USERSET options can be set by anyone any time.
 */
typedef enum
{
        PGC_INTERNAL,  // 只能看的参数
        PGC_POSTMASTER,  // 只能在数据库启动是设置
        PGC_SIGHUP,  //  可以通过修改配置文件,然后给postmaster发SIGHUP信号更新参数值
        PGC_SU_BACKEND,  // 只能在数据库启动是设置,或者在客户端连接数据库时设置,通过libpq提供的PGOPTIONS接口。(只允许超级用户连接时设置)
        PGC_BACKEND,  //  同上,但是允许普通用户设置
        PGC_SUSET,   //  同上,同时允许在会话中设置,必须是超级用户
        PGC_USERSET  // 同上,同时允许在会话中设置,允许普通用户设置
} GucContext;

参数或变量的作用域

/*
 * The following type records the source of the current setting.  A
 * new setting can only take effect if the previous setting had the
 * same or lower level.  (E.g, changing the config file doesn't
 * override the postmaster command line.)  Tracking the source allows us
 * to process sources in any convenient order without affecting results.
 * Sources <= PGC_S_OVERRIDE will set the default used by RESET, as well
 * as the current value.  Note that source == PGC_S_OVERRIDE should be
 * used when setting a PGC_INTERNAL option.
 *
 * PGC_S_INTERACTIVE isn't actually a source value, but is the
 * dividing line between "interactive" and "non-interactive" sources for
 * error reporting purposes.
 *
 * PGC_S_TEST is used when testing values to be used later ("doit" will always
 * be false, so this never gets stored as the actual source of any value).
 * For example, ALTER DATABASE/ROLE tests proposed per-database or per-user
 * defaults this way, and CREATE FUNCTION tests proposed function SET clauses
 * this way.  This is an interactive case, but it needs its own source value
 * because some assign hooks need to make different validity checks in this
 * case.  In particular, references to nonexistent database objects generally
 * shouldn't throw hard errors in this case, at most NOTICEs, since the
 * objects might exist by the time the setting is used for real.
 *
 * NB: see GucSource_Names in guc.c if you change this.
 */
typedef enum
{
        PGC_S_DEFAULT,                          /* hard-wired default ("boot_val") */
        PGC_S_DYNAMIC_DEFAULT,          /* default computed during initialization */
        PGC_S_ENV_VAR,                          /* postmaster environment variable */
        PGC_S_FILE,                                     /* postgresql.conf */
        PGC_S_ARGV,                                     /* postmaster command line */
        PGC_S_GLOBAL,                           /* global in-database setting */
        PGC_S_DATABASE,                         /* per-database setting */
        PGC_S_USER,                                     /* per-user setting */
        PGC_S_DATABASE_USER,            /* per-user-and-database setting */
        PGC_S_CLIENT,                           /* from client connection request */
        PGC_S_OVERRIDE,                         /* special case to forcibly set default */
        PGC_S_INTERACTIVE,                      /* dividing line for error reporting */
        PGC_S_TEST,                                     /* test per-database or per-user setting */
        PGC_S_SESSION                           /* SET command */
} GucSource;

祝大家玩得开心,欢迎随时来 阿里云促膝长谈 业务需求 ,恭候光临。

阿里云的小伙伴们加油,努力做 最贴地气的云数据库

上一篇:如何基于Flink将流式数据实时写入AnalyticDB for PostgreSQL


下一篇:NLP Evaluation Metric 分类问题