在了解之前,大家了解BGWriter的作用吗?相信一定有人会说,是写脏buffer到磁盘,其实,这是不准确的,实际上是把脏buffer写到OS cache,然后由系统从OS cache flush到磁盘。
PG是高度依赖系统缓存的,如下图所示:
无论何时发出查询操作,我们的数据实际上是先载入到OS cache,然后再到shared buffer中,同样的,当需要把脏数据刷到磁盘时,也是先刷到OS cache,然后再调用 fsync()从OS cache刷新到磁盘。
这可能导致空间浪费,因为可能在shared_buffer中和OS cache中有同样的数据,但是OS cache是LRU淘汰算法,shared_buffer是clock sweep淘汰算法,如果数据库受到冲击,永远不会到OS cache中取数据,因为OS cache很容易被删除。
可以设置bgwriter_flush_after参数,默认512KB
后台写进程在超过这个设置的时候,强制将数据写到底层存储,这样就限制了内核缓存页的脏数据,减少了检查点末尾调用fsync或者OS 在后台大批量回写数据时卡住的可能性。
不仅仅bgwriter,在pg中检查点进程和后台进程也会从shared_buffer中写数据到OS cache,我们也可以分别通过checkpoint_flush_after和backend_flush_afte这两个参数影响操作系统的 fsync。
shared_buffer值一般设置为内存的1/4,当然每个数据库环境不一样,可以根据自己的环境动态调整。
另外如果需要查看shared_buffer中的内容,可以安装插件pg_buffercache
SELECT c.relname
, pg_size_pretty(count(*) * 8192) as buffered
, round(100.0 * count(*) / ( SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer,1) AS buffers_percent
, round(100.0 * count(*) * 8192 / pg_relation_size(c.oid),1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE pg_relation_size(c.oid) > 0
GROUP BY c.oid, c.relname
ORDER BY 3 DESC
LIMIT 10;
如果要查看OS cache中的数据,需要安装pgfincore插件
As root user:
export PATH=/usr/local/pgsql/bin:$PATH //Set the path to point pg_config.
tar -xvf pgfincore-v1.1.1.tar.gz
cd pgfincore-1.1.1
make clean
make
make install
Now connect to PG and run below command
postgres=# CREATE EXTENSION pgfincore;
可通过以下语句查看
select c.relname,pg_size_pretty(count(*) * 8192) as pg_buffered,
round(100.0 * count(*) /
(select setting
from pg_settings
where name='shared_buffers')::integer,1)
as pgbuffer_percent,
round(100.0*count(*)*8192 / pg_table_size(c.oid),1) as percent_of_relation,
( select round( sum(pages_mem) * 4 /1024,0 )
from pgfincore(c.relname::text) )
as os_cache_MB ,
round(100 * (
select sum(pages_mem)*4096
from pgfincore(c.relname::text) )/ pg_table_size(c.oid),1)
as os_cache_percent_of_relation,
pg_size_pretty(pg_table_size(c.oid)) as rel_size
from pg_class c
inner join pg_buffercache b on b.relfilenode=c.relfilenode
inner join pg_database d on (b.reldatabase=d.oid and d.datname=current_database()
and c.relnamespace=(select oid from pg_namespace where nspname='public'))
group by c.oid,c.relname
order by 3 desc limit 30;
为什么亚马逊的postgresql RDS设置了75%,因为没有使用双缓存,没有使用OS cache,所以就 会加大shared buffer来提高性能。
参考:
https://postgreshelp.com/postgresql_shared_buffers/