HugePages (Large Memory Pages)
HugePages is the large-page memory management mechanism in Linux. Compared with the traditional 4KB page size, HugePages manages large amounts of memory (8GB and above) far more efficiently. This article describes what HugePages are and some of their characteristics.
1. Why HugePages were introduced
Accessing data directly in physical memory is much faster than reading it from disk, but physical memory is finite, which leads to the concepts of physical memory and virtual memory. Virtual memory is a strategy to compensate for limited physical memory: a region of disk space is used as logical memory. On Windows this disk area is called virtual memory; on Linux it is called swap space.
To manage this memory (physical plus virtual), most operating systems use segmentation or paging. Segmentation is a coarse-grained approach, while paging is fine-grained and avoids wasting memory. Correspondingly there are physical addresses and virtual addresses, and the CPU must translate a virtual address into a physical address before it can actually access memory. To speed up this translation, the CPU caches the most recently used virtual-to-physical address mappings in a mapping table that it maintains. The more mappings that table can hold, the faster memory access tends to be.
Linux manages memory with paging. To keep physical memory fully utilized, the kernel uses an LRU policy to swap infrequently used pages out to swap space while keeping frequently used data in physical memory. By default a Linux page is 4KB, so on a machine with a large amount of physical memory the mapping table contains an enormous number of entries, which hurts CPU lookup efficiency. Since the amount of memory is fixed, the only way to reduce the number of entries is to increase the page size, and that is exactly what HugePages do: instead of the traditional small pages, large pages of 2MB, 4MB, 16MB and so on are used, which dramatically reduces the number of mapping entries. On systems with a large amount of physical memory (more than 8GB), HugePages should be used regardless of whether the operating system is 32-bit or 64-bit.
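To make the reduction concrete, here is a small sanity check in bash (the numbers are illustrative, and the 64-bytes-per-entry figure is the same estimate used in section 6 below):
$ MEM_KB=$((16 * 1024 * 1024))        # 16GB of mapped memory, expressed in KB
$ echo "4KB pages: $((MEM_KB / 4)), 2MB pages: $((MEM_KB / 2048))"
4KB pages: 4194304, 2MB pages: 8192
$ echo "page table with 4KB pages: $((MEM_KB / 4 * 64 / 1024 / 1024)) MB"
page table with 4KB pages: 256 MB
$ echo "page table with 2MB pages: $((MEM_KB / 2048 * 64 / 1024)) KB"
page table with 2MB pages: 512 KB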
2. HugePages terminology
Page Table:
A page table is the data structure of a virtual memory system in an operating system to store the mapping between virtual addresses and physical addresses. This means that on a virtual memory system, the memory is accessed by first accessing a page table and then accessing the actual memory location implicitly.
As noted above, a page table is the data structure used by the memory management system to store the mapping between virtual and physical addresses. Every memory access therefore first goes through the page table and is then, based on the mapping found there, implicitly redirected to the physical address where the data actually lives.
TLB:
A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. This is a fixed size buffer being used to do virtual address translation faster.
A fixed-size cache inside the CPU that holds part of the page table and is used to translate virtual addresses to physical addresses faster.
hugetlb:
This is an entry in the TLB that points to a HugePage (a large/big page larger than regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The 'hugetlb" term is also (and mostly) used synonymously with a HugePage (See Note 261889.1). In this document the term "HugePage" is going to be used but keep in mind that mostly "hugetlb" refers to the same concept.
A hugetlb is an entry in the TLB that points to a HugePage (a page larger than the regular 4KB, with a predefined size). HugePages are implemented via hugetlb entries; in other words, a HugePage is handled by a "hugetlb page entry". The term hugetlb is also commonly used as a synonym for HugePage.
hugetlbfs:
This is a new in-memory filesystem like tmpfs and is presented by 2.6 kernel. Pages allocated on hugetlbfs type filesystem are allocated in HugePages.
An in-memory filesystem similar to tmpfs, introduced with the 2.6 kernel. Pages allocated on a hugetlbfs-type filesystem are allocated as HugePages.
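As an illustration (the mount point here is an arbitrary example, not taken from the original article), a hugetlbfs filesystem can be mounted and verified like this; note that, as the misconceptions below point out, an Oracle SGA can use HugePages without hugetlbfs being mounted at all:
# mkdir -p /mnt/hugepages
# mount -t hugetlbfs nodev /mnt/hugepages
# mount | grep hugetlbfs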
3. Common misconceptions
WRONG: HugePages is a method to be able to use large SGA on 32-bit VLM systems
RIGHT: HugePages is a method to have larger pages where it is useful for working with very large memory. It is both useful in 32- and 64-bit configurations
WRONG: HugePages cannot be used without USE_INDIRECT_DATA_BUFFERS
RIGHT: HugePages can be used without indirect buffers. 64-bit systems does not need to use indirect buffers to have a large buffer cache for the RDBMS instance and HugePages can be used there too.
WRONG: hugetlbfs means hugetlb
RIGHT: hugetlbfs is a filesystem type **BUT** hugetlb is the mechanism employed in the back where hugetlb can be employed WITHOUT hugetlbfs
WRONG: hugetlbfs means hugepages
RIGHT: hugetlbfs is a filesystem type **BUT** HugePages is the mechanism employed in the back (synonymously with hugetlb) where HugePages can be employed WITHOUT hugetlbfs.
4. Regular pages vs. HugePages
a. Regular pages
Consider two different processes accessing memory. Each process first goes through its own local page table; the entries in the local page table in turn reference the system-wide page table (the structure whose entries the TLB described above caches), and the system-wide entries finally point to actual physical addresses. Here the physical page size is 4KB. Note that process 1 and process 2 both reference page 2 in the system-wide table, that is, the same physical page; this is exactly what happens with Oracle SGA shared memory.
b. HugePages
With HugePages, both the local page table and the system page table carry a huge-page attribute, so any given page table entry may reference either a regular page or a huge page. Process 1 and process 2 again share one of the huge pages (HPage2). In this example the regular page size is 4KB and the huge page size is 4MB.
5. HugePage size
The size of a huge page depends on the kernel version in use and on the hardware platform.
It can be checked with: $ grep Hugepagesize /proc/meminfo
The table below lists the typical huge page sizes on different platforms.
HW Platform Source Code Tree Kernel 2.4 Kernel 2.6
----------------- --------------------- ------------ -------------
Linux x86 (IA32) i386 4 MB 4 MB *
Linux x86-64 (AMD64, EM64T) x86_64 2 MB 2 MB
Linux Itanium (IA64) ia64 256 MB 256 MB
IBM Power Based Linux (PPC64) ppc64/powerpc N/A ** 16 MB
IBM zSeries Based Linux s390 N/A 1 MB
IBM S/390 Based Linux s390 N/A N/A
6. Advantages of HugePages
For systems with a large amount of memory and a large SGA, using HugePages can significantly improve Oracle database performance.
a. Not swappable
HugePages are not swappable. Therefore there is no page-in/page-out mechanism overhead. HugePages are universally regarded as pinned.
In other words, huge pages never have to be swapped in or out because of memory pressure.
b. Relief of TLB pressure
HugePages use fewer pages to cover the same physical address space, so the amount of "bookkeeping" (virtual-to-physical mappings) shrinks and fewer TLB entries are needed.
With HugePages, each TLB entry covers a larger part of the address space, so there are fewer TLB misses before the entire SGA (or most of it) is mapped.
Fewer TLB entries for the SGA also leaves more entries available for other parts of the address space.
In short, TLB pressure is relieved: for the same amount of memory there are far fewer virtual pages to track, each cached mapping covers more address space, and the CPU's effective reach through the TLB increases.
c. Decreased page table overhead
Each page table entry can be as large as 64 bytes and if we are trying to handle 50GB of RAM, the pagetable will be approximately 800MB in size which is practically will not fit in 880MB size lowmem (in 2.4 kernels - the page table is not necessarily in lowmem in 2.6 kernels) considering the other uses of lowmem. When 95% of memory is accessed via 256MB hugepages, this can work with a page table of approximately 40MB in total. See also Document 361468.1.
Reduced page table overhead: with regular pages, each page table entry can take up to 64 bytes, so managing 50GB of RAM requires roughly 800MB of page table:
(50 * 1024 * 1024) KB / 4 KB * 64 bytes / 1024 / 1024 = 800 MB
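A quick sanity check of those figures in bash (a sketch only; the 64-byte entry size and the 95% / 256MB split are the assumptions stated in the quoted text above):
$ RAM_KB=$((50 * 1024 * 1024))                               # 50GB in KB
$ echo "$((RAM_KB / 4 * 64 / 1024 / 1024)) MB page table with 4KB pages"
800 MB page table with 4KB pages
$ HUGE_KB=$((RAM_KB * 95 / 100)); SMALL_KB=$((RAM_KB - HUGE_KB))
$ echo "$(( (HUGE_KB / (256 * 1024) * 64 + SMALL_KB / 4 * 64) / 1024 / 1024 )) MB with 95% in 256MB hugepages"
40 MB with 95% in 256MB hugepages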
d. Eliminated page table lookup overhead
Since the pages are not subject to replacement, page table lookups are not required. (No page table lookup overhead.)
e. Faster overall memory performance
On virtual memory systems each memory operation is actually two abstract memory operations. Since there are fewer pages to work on, the possible bottleneck on page table access is avoided. (Better overall memory performance.)
7. Risks of an incorrect HugePages configuration
When managing large memory (>8GB), a missing or incorrect HugePages configuration can lead to the following problems:
HugePages not used (HugePages_Total = HugePages_Free) at all wasting the amount configured for
Poor database performance
System running out of memory or excessive swapping
Some or any database instance cannot be started
Crucial system services failing (e.g.: CRS)
8. Configuration steps for 2.6 kernels
The kernel parameter used for HugePages is vm.nr_hugepages which is based on the number of the pages. SLES9, RHEL4 and Asianux 2.0 are examples of distributions with the 2.6 kernel. For the configuration, follow steps below:
a. Start instance(s)
b. Calculate nr_hugepages using script from Document 401749.1
c. Set kernel parameter:
# sysctl -w vm.nr_hugepages=
and make sure that the parameter is persistent to reboots. e.g. On SLES9:
# chkconfig boot.sysctl on
d. Check available hugepages:
$ grep Huge /proc/meminfo
e. Restart instances
f. Check available hugepages:
$ grep Huge /proc/meminfo
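On most 2.6-based distributions the value is also made persistent via /etc/sysctl.conf (the detailed walkthrough later in this article does the same); a sketch, assuming the script from Document 401749.1 recommended 1496 pages:
# echo "vm.nr_hugepages = 1496" >> /etc/sysctl.conf
# sysctl -p
$ grep HugePages_Total /proc/meminfo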
9. Points to note
a. HugePages are backed by shared memory that is allocated and reserved during system startup; these pages are never swapped out.
b. Because they are never swapped out, the memory reserved for HugePages cannot be used by anything else, so size the setting sensibly to avoid wasting memory.
c. On a server dedicated to Oracle, setting HugePages to the size of the SGA (the sum of the SGAs of all instances) is sufficient.
d. Whenever HugePages or physical memory change, a new instance is added to the server, or an SGA size changes, recalculate and reset the required number of HugePages.
e. Reference: HugePages on Linux: What It Is... and What It Is Not... [ID 361323.1]
f. For how to configure HugePages, see: Configuring HugePages on Linux
http://blog.csdn.net/leshami/article/details/8788825
HugePages replace the traditional 4KB memory pages with large pages. This reduces the number of virtual pages that must be managed, speeds up virtual-to-physical address translation, and eliminates page-in/page-out for that memory, improving overall memory performance. For servers with more than 8GB of RAM and a large Oracle SGA, configuring and using the HugePages feature is recommended. The following describes how to configure HugePages on x86_64 Linux.
For the HugePages feature itself, see the Linux HugePages description above.
1. Why configure HugePages?
a. Larger Page Size and Less # of Pages:
Default page size is 4K whereas the HugeTLB size is 2048K. That means the system would need to handle 512 times fewer pages.
b. No Page Table Lookups:
Since the HugePages are not subject to replacement (unlike regular pages), page table lookups are not required.
c. Better Overall Memory Performance:
On virtual memory systems (any modern OS) each memory operation is actually two abstract memory operations. With HugePages, since there are fewer pages to work on, the possible bottleneck on page table access is clearly avoided.
d. No Swapping:
We must avoid swapping on the Linux OS altogether (Document 1295478.1). HugePages are not swappable (whereas regular pages are). Therefore there is no page replacement mechanism overhead. HugePages are universally regarded as pinned.
e. No 'kswapd' Operations:
kswapd will get very busy if there is a very large area to be paged (i.e. 13 million page table entries for 50GB memory) and will use an incredible amount of CPU resource. When HugePages are used, kswapd is not involved in managing them. See also Document 361670.1
2. Configuring HugePages
All the steps needed to configure HugePages are listed below.
a. Check whether HugePages are already configured
In the output below the HugePages-related counters are all 0, which means HugePages are not configured yet; also note that Hugepagesize is 2MB.
$ grep Huge /proc/meminfo
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
b. Raise the user's memlock limit
This is done in the /etc/security/limits.conf configuration file.
The value is usually set slightly below the installed physical memory; for example, with 64GB of RAM you can use:
* soft memlock 60397977
* hard memlock 60397977
The unit above is KB. Setting the limit higher than needed does not degrade performance, but it must be at least slightly larger than the sum of all SGAs on the system.
Verify the setting with ulimit -l.
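For example (values are illustrative and taken from the 64GB example above), log in again as the database owner so the new limit is picked up, then check it:
$ su - oracle
$ ulimit -l
60397977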
c. Disable AMM (Oracle 11g)
If you are running Oracle 10g, skip this step.
On Oracle 11g, the AMM (Automatic Memory Management) feature is incompatible with HugePages and must be disabled:
ALTER SYSTEM RESET memory_target SCOPE=SPFILE;
ALTER SYSTEM RESET memory_max_target SCOPE=SPFILE;
ALTER SYSTEM SET sga_target=<n>G SCOPE=SPFILE;
ALTER SYSTEM SET pga_aggregate_target=<n>G SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;
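One hedged way to confirm that AMM is really off after the restart is to query the memory parameters from SQL*Plus; memory_target and memory_max_target should both report 0:
$ sqlplus -S / as sysdba <<'EOF'
SHOW PARAMETER memory_target
SHOW PARAMETER memory_max_target
SHOW PARAMETER sga_target
EOF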
d. Calculate the value of vm.nr_hugepages
Use the hugepages_settings.sh script provided by Oracle (Document 401749.1) to compute vm.nr_hugepages.
Before running the script, make sure all Oracle instances are up, including ASM where present.
$ ./hugepages_settings.sh
...
Recommended setting: vm.nr_hugepages = 1496
e. Edit /etc/sysctl.conf to set the vm.nr_hugepages parameter
Add the line vm.nr_hugepages = 1496 to /etc/sysctl.conf, then load the setting (as root):
# sysctl -p
f. Stop all instances and restart the server
The steps above can all be applied to the running system, but for the HugePages allocation to take effect reliably the server should be rebooted.
g. Verify the configuration
The HugePages counters change dynamically as instances on the server are started and stopped.
Normally HugePages_Free should be smaller than HugePages_Total, and HugePages_Rsvd should be non-zero while HugePages are in use.
$ grep Huge /proc/meminfo
HugePages_Total: 131
HugePages_Free: 20
HugePages_Rsvd: 20
Hugepagesize: 2048 kB
As shown below, once the only instance on this server is shut down, HugePages_Rsvd drops to zero and HugePages_Free equals HugePages_Total.
$ grep Huge /proc/meminfo
HugePages_Total: 131
HugePages_Free: 131
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
3. Notes on using HugePages
Reconfigure HugePages in the following three situations:
a. Physical memory is added or removed
b. An instance is added to or removed from the server
c. The SGA size of an instance is increased or decreased
If HugePages are not adjusted accordingly, the following problems may appear:
a. Poor database performance
b. The system running out of memory or swapping excessively
c. Database instances failing to start
d. Crucial system services failing
4. Common HugePages troubleshooting
Symptom A:
System is running out of memory or swapping
Possible Cause:
Not enough HugePages to cover the SGA(s); the area reserved for HugePages is therefore wasted while the SGAs are allocated through regular pages.
Troubleshooting Action:
Review your HugePages configuration to make sure that all SGA(s) are covered.
Symptom B:
Databases fail to start
Possible Cause:
memlock limits are not set properly
Troubleshooting Action:
Make sure the settings in limits.conf apply to database owner account.
Symptom C:
One of the databases fails to start while another is up
Possible Cause:
The SGA of the specific database could not find available HugePages and the remaining RAM is not enough.
Troubleshooting Action:
Make sure that the RAM and HugePages are enough to cover all your database SGAs
Symptom D:
Cluster Ready Services (CRS) fail to start
Possible Cause:
HugePages configured too large (maybe larger than installed RAM)
Troubleshooting Action:
Make sure the total SGA is less than the installed RAM and re-calculate HugePages.
Symptom E:
HugePages_Total = HugePages_Free
Possible Cause:
HugePages are not used at all. No database instances are up or using AMM.
Troubleshooting Action:
Disable AMM and make sure that the database instances are up.
Symptom F:
Database started successfully and the performance is slow
Possible Cause:
The SGA of the specific database could not find available HugePages and therefore the SGA is handled by regular pages, which leads to slow performance
Troubleshooting Action:
Make sure that there are enough HugePages to cover all your database SGAs
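A small helper along these lines can make that comparison easier; this is only a sketch that reports how much memory the configured HugePages can cover, to be compared manually against the sum of your SGA sizes:
#!/bin/bash
# Report the configured HugePages capacity in MB
HPG_SZ=$(grep Hugepagesize /proc/meminfo | awk '{print $2}')     # page size in KB
TOTAL=$(grep HugePages_Total /proc/meminfo | awk '{print $2}')
FREE=$(grep HugePages_Free /proc/meminfo | awk '{print $2}')
echo "HugePages capacity : $((TOTAL * HPG_SZ / 1024)) MB"
echo "Currently unused   : $((FREE * HPG_SZ / 1024)) MB"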
Reference: [ID 361468.1]
5. Script for computing the vm.nr_hugepages value
#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
#
# This script is provided by Doc ID 401749.1 from My Oracle Support
# http://support.oracle.com

# Welcome text
echo "
This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size,
   as the new SGA will not fit in the previous HugePages configuration,
   it had better disable the whole HugePages,
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m

Press Enter to proceed..."
read

# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`

# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`
if [ -z "$HPG_SZ" ];then
    echo "The hugepages may not be supported in the system where the script is being executed."
    exit 1
fi

# Initialize the counter
NUM_PG=0

# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | cut -c44-300 | awk '{print $1}' | grep "[0-9][0-9]*"`
do
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
    if [ $MIN_PG -gt 0 ]; then
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
    fi
done

RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q`

# An SGA less than 100MB does not make sense
# Bail out if that is the case
if [ $RES_BYTES -lt 100000000 ]; then
    echo "***********"
    echo "** ERROR **"
    echo "***********"
    echo "Sorry! There are not enough total of shared memory segments allocated for
HugePages configuration. HugePages can only be used for shared memory segments
that you can list by command:
    # ipcs -m
of a size that can match an Oracle Database SGA. Please make sure that:
 * Oracle Database instance is up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not configured"
    exit 1
fi

# Finish with results
case $KERN in
    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
    '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
        *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac

# End
HugePages on Linux: What It Is... and What It Is Not... (Doc ID 361323.1)
In this Document
  Purpose
  Scope
  Details
    Introduction
    Common Misconceptions
    Regular Pages and HugePages
    HugePages in 2.4 Kernels
    Some HugePages Facts/Features
    Advantages of HugePages Over Normal Sharing Or AMM (see below)
    The Size of a HugePage
    HugePages Reservation
    HugePages and Oracle 11g Automatic Memory Management (AMM)
    What if Not Enough HugePages Configured?
    What if Too Much HugePages Configured?
    Parameters/Setup
    Notes on HugePages in General
  References
APPLIES TO:
Oracle Database - Enterprise Edition
Linux OS - Version Enterprise Linux 3.0 to Oracle Linux 6.5 with Unbreakable Enterprise Kernel [3.8.13] [Release RHEL3 to OL6U5]
IBM S/390 Based Linux (31-bit)
IBM: Linux on POWER Big Endian Systems
Linux x86-64
Linux Itanium
Linux x86
IBM: Linux on System z
***Checked for relevance on 20-May-2013***
PURPOSE
This document describes the HugePages feature in the Linux kernel available for 32-bit and 64-bit architectures. There has been some confusion among the terms and uses related to HugePages. This document should clarify the misconceptions about the feature.
SCOPE
Information in this document is useful for Linux system administrators and Oracle database administrators working with system administrators.
This document covers information about HugePages concept that applies to very large memory (VLM) (>= 4GB) systems for 32-bit and 64-bit architectures including some configuration information and references.
DETAILS
Introduction
HugePages is a feature integrated into the Linux kernel with release 2.6. This feature basically provides the alternative to the 4K page size (16K for IA64) providing bigger pages.
Regarding the HugePages, there are some other similar terms that are being used like, hugetlb, hugetlbfs. Before proceeding into the details of HugePages, see the definitions below:
- Page Table: A page table is the data structure of a virtual memory system in an operating system to store the mapping between virtual addresses and physical addresses. This means that on a virtual memory system, the memory is accessed by first accessing a page table and then accessing the actual memory location implicitly.
- TLB: A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. This is a fixed size buffer being used to do virtual address translation faster.
- hugetlb: This is an entry in the TLB that points to a HugePage (a large/big page larger than regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The 'hugetlb" term is also (and mostly) used synonymously with a HugePage (See Note 261889.1). In this document the term "HugePage" is going to be used but keep in mind that mostly "hugetlb" refers to the same concept.
- hugetlbfs: This is a new in-memory filesystem like tmpfs and is presented by 2.6 kernel. Pages allocated on hugetlbfs type filesystem are allocated in HugePages.
Common Misconceptions
WRONG: HugePages is a method to be able to use large SGA on 32-bit VLM systems
RIGHT: HugePages is a method to have larger pages where it is useful for working with very large memory. It is both useful in 32- and 64-bit configurations
WRONG: HugePages cannot be used without USE_INDIRECT_DATA_BUFFERS
RIGHT: HugePages can be used without indirect buffers. 64-bit systems does not need to use indirect buffers to have a large buffer cache for the RDBMS instance and HugePages can be used there too.
WRONG: hugetlbfs means hugetlb
RIGHT: hugetlbfs is a filesystem type **BUT** hugetlb is the mechanism employed in the back where hugetlb can be employed WITHOUT hugetlbfs
WRONG: hugetlbfs means hugepages
RIGHT: hugetlbfs is a filesystem type **BUT** HugePages is the mechanism employed in the back (synonymously with hugetlb) where HugePages can be employed WITHOUT hugetlbfs.
Regular Pages and HugePages
This section aims to give a general picture about memory access in virtual memory systems and how pages are referenced.
When a single process works with a piece of memory, the pages that the process uses are referenced in a local page table for the specific process. The entries in this table also contain references to the System-Wide Page Table which actually has references to actual physical memory addresses. So theoretically a user mode process (i.e. Oracle processes) follows its local page table to reach the system page table and can then reference the actual physical memory. As you can see below, it is also possible (and very common for Oracle RDBMS due to SGA use) that two different O/S processes can point to the same entry in the system-wide page table.
When HugePages are in play, the usual page tables are employed. The very basic difference is that the entries in both the process page table and the system page table have attributes about huge pages. So any page in a page table can be a huge page or a regular page.
HugePages in 2.4 Kernels
The HugePages feature is backported to some 2.4 kernels. Kernel versions 2.4.21-* has this feature (See Note 311504.1 for the distributions with 2.4.21 kernels) but it is implemented in a different way. The feature is completely available. The difference from 2.6 implementation is the organization within the source code and the kernel parameters that are used for configuring HugePages. See Parameters/Setup section below.
Some HugePages Facts/Features
- HugePages can be allocated on-the-fly but they must be reserved during system startup. Otherwise the allocation might fail as the memory is already paged in 4K mostly.
- HugePage sizes vary from 2MB to 256MB based on kernel version and HW architecture (See related section below.)
- HugePages are not subject to reservation / release after the system startup unless there is system administrator intervention, basically changing the hugepages configuration (i.e. number of pages available or pool size)
Advantages of HugePages Over Normal Sharing Or AMM (see below)
- Not swappable: HugePages are not swappable. Therefore there is no page-in/page-out mechanism overhead.HugePages are universally regarded as pinned.
- Relief of TLB pressure:
- Hugepge uses fewer pages to cover the physical address space, so the size of “book keeping” (mapping from the virtual to the physical address) decreases, so it requiring fewer entries in the TLB
- TLB entries will cover a larger part of the address space when use HugePages, there will be fewer TLB misses before the entire or most of the SGA is mapped in the SGA
- Fewer TLB entries for the SGA also means more for other parts of the address space
- Decreased page table overhead: Each page table entry can be as large as 64 bytes and if we are trying to handle 50GB of RAM, the pagetable will be approximately 800MB in size which is practically will not fit in 880MB size lowmem (in 2.4 kernels - the page table is not necessarily in lowmem in 2.6 kernels) considering the other uses of lowmem. When 95% of memory is accessed via 256MB hugepages, this can work with a page table of approximately 40MB in total. See also Document 361468.1.
- Eliminated page table lookup overhead: Since the pages are not subject to replacement, page table lookups are not required.
- Faster overall memory performance: On virtual memory systems each memory operation is actually two abstract memory operations. Since there are fewer pages to work on, the possible bottleneck on page table access is clearly avoided.
The Size of a HugePage
The size of a single HugePage varies according to:
- Kernel version/linux distribution
- HW Platform
The actual size of the HugePage on a specific system can be checked by:
$ grep Hugepagesize /proc/meminfo
The table below shows the sizes of HugePages on different configurations. Note that these are general numbers taken from the most recent versions of the kernels. For a specific kernel source package, you can check for the HPAGE_SIZE macro value (based on HPAGE_SHIFT) for a different (more recent) kernel source tree.
HW Platform                     Source Code Tree   Kernel 2.4   Kernel 2.6 and later
-----------------------------   ----------------   ----------   --------------------
Linux x86 (IA32)                i386               4 MB         4 MB *
Linux x86-64 (AMD64, EM64T)     x86_64             2 MB         2 MB
Linux Itanium (IA64)            ia64               256 MB       256 MB
IBM Power Based Linux (PPC64)   ppc64/powerpc      N/A **       16 MB
IBM zSeries Based Linux         s390               N/A          1 MB
IBM S/390 Based Linux           s390               N/A          N/A
* Some older packaging for the 2.6.5 kernel on SLES8 (like 2.6.5-7.97) can have 2 MB Hugepagesize.
** Oracle RDBMS is also not certified in this configuration. See Document 341507.1
HugePages Reservation
The HugePages reservation feature is fully implemented in 2.6.17 kernel, and thus EL5 (based on 2.6.18) has this feature. The alloc_huge_page() is improved for this. (See kernel source mm/hugetlb.c)
From /usr/share/doc/kernel-doc-2.6.18/Documentation/vm/hugetlbpage.txt:
HugePages_Rsvd is short for "reserved," and is the number of hugepages for which a commitment to allocate from the pool has been made, but no allocation has yet been made. It's vaguely analogous to overcommit.
This feature in the Linux kernel enables the Oracle Database to be able to allocate hugepages for the sublevels of the SGA on-demand. The same behaviour is expected for various Oracle Database versions that are certified on EL5.
HugePages and Oracle 11g Automatic Memory Management (AMM)
The AMM and HugePages are not compatible. One needs to disable AMM on 11g to be able to use HugePages. See Document 749851.1 for further information.
What if Not Enough HugePages Configured?
Configuring your Linux OS for HugePages is a delicate process where if you do not configure properly, the system may experience serious problems. If you do not have enough HugePages configured you may encounter:
- HugePages not used (HugePages_Total = HugePages_Free) at all wasting the amount configured for
- Poor database performance
- System running out of memory or excessive swapping
- Some or any database instance cannot be started
- Crucial system services failing (e.g.: CRS)
To avoid / help with such situations Bug 10153816 was filed to introduce a database initialization parameter in 11.2.0.2 (use_large_pages) to help manage which SGAs will use huge pages and potentially give warnings or not start up at all if they cannot get those pages.
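For example (a sketch; the USE_LARGE_PAGES values referenced by Note 1392497.1 include TRUE, FALSE and ONLY), an 11.2.0.2 or later instance can be told to refuse to start unless its SGA fits entirely in HugePages; the instance must then be restarted for the setting to take effect:
$ sqlplus -S / as sysdba <<'EOF'
ALTER SYSTEM SET use_large_pages=only SCOPE=SPFILE;
EOF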
What if Too Much HugePages Configured?
It is of course technically possible to configure more than needed. When that is done, the unused part of HugePages allocation will not be available for other purposes on the system and memory shortage can be encountered. Please make sure to configure only for needed amount of hugepages.
Parameters/Setup
The following configurations are a minimal list of documents providing general guidelines to configure HugePages for more than one Oracle RDBMS instance:
- Document 317055.1 How to Configure RHEL 3.0 32-bit for Very Large Memory with ramfs and hugepages
- Document 317067.1 How to Configure Asianux 1.0 32-bit for Very Large Memory with ramfs and hugepages
- Document 317141.1 How to Configure RHEL 4 32-bit for Very Large Memory with ramfs and hugepages
- Document 317139.1 How to Configure SuSE SLES 9 32-bit for Very Large Memory with ramfs and hugepages
- Document 361468.1 HugePages on 64-bit Linux
For all configurations, be sure that the environment variable DISABLE_HUGETLBFS is unset. If it is set (specifically to 1) it will disable the use of HugePages by the Oracle database.
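A quick check from the shell of the user that starts the database (nothing should be printed; the unset is only needed if the variable is found):
$ env | grep DISABLE_HUGETLBFS
$ unset DISABLE_HUGETLBFS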
The performed configuration is basically based on the RAM installed and combined size of SGA of database instances you are running. Based on that when:
- Amount of RAM installed for the Linux OS changed
- New database instance(s) introduced
- SGA size / configuration changed for one or more database instances
you should revise your HugePages configuration to make it suitable to the new memory framework. If not you may experience one or more problems below on the system:
- Poor database performance
- System running out of memory or excessive swapping
- Database instances cannot be started
- Crucial system services failing
Kernel Version 2.4
The kernel parameter used for HugePages is vm.hugetlb_pool which is based on MB of memory. RHEL3, Asianux 1.0, SLES8 (Service Pack 3 and over) are examples of distributions with the 2.4 kernels with HugePages support. For the configuration, follow steps below:
1. Start database instance(s)
2. Calculate hugetlb_pool using script from Note 401749.1
3. Shutdown database instances
4. Set kernel parameter:
# sysctl -w vm.hugetlb_pool=
and make sure that the parameter is persistent to reboots. e.g. On Asianux 1.0 by editing /etc/sysctl.conf adding/modifying as below:
vm.hugetlb_pool=
5. Check available hugepages:
$ grep Huge /proc/meminfo
6. Restart database instances
7. Check available hugepages:
$ grep Huge /proc/meminfo
Notes:
- If the setting of hugetlb_pool is not effective, you will need to reboot the server to make HugePages allocation during system startup.
- The HugePages are allocated in a lazy fashion, so the "Hugepages_Free" count drops as the pages get touched and are backed by physical memory. The idea is that it's more efficient in the sense that you don't use memory you don't touch.
- If you had set the instance initialization parameter PRE_PAGE_SGA=TRUE (for suitable settings see Document 30793.1), all of the pages would be allocated from HugePages up front.
Kernel Version 2.6
The kernel parameter used for HugePages is vm.nr_hugepages which is based on the number of the pages. SLES9, RHEL4 and Asianux 2.0 are examples of distributions with the 2.6 kernel. For the configuration, follow steps below:
1. Start database instance(s)
2. Calculate nr_hugepages using script from Document 401749.1
3. Shutdown database instances
4. Set kernel parameter:
# sysctl -w vm.nr_hugepages=
and make sure that the parameter is persistent to reboots. e.g. On SLES9:
# chkconfig boot.sysctl on
5. Check available hugepages:
$ grep Huge /proc/meminfo
6. Restart database instances
7. Check available hugepages:
$ grep Huge /proc/meminfo
Notes:
- If the setting of nr_hugepages is not effective, you will need to reboot the server to make HugePages allocation during system startup.
- The HugePages are allocated in a lazy fashion, so the "Hugepages_Free" count drops as the pages get touched and are backed by physical memory. The idea is that it's more efficient in the sense that you don't use memory you don't touch.
- If you had set the instance initialization parameter PRE_PAGE_SGA=TRUE (for suitable settings see Document 30793.1), all of the pages would be allocated from HugePages up front.
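A sketch of enabling that behaviour (review Document 30793.1 first, as the note above advises; PRE_PAGE_SGA is a static parameter, so the instance must be restarted afterwards):
$ sqlplus -S / as sysdba <<'EOF'
ALTER SYSTEM SET pre_page_sga=true SCOPE=SPFILE;
EOF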
Notes on HugePages in General
- The userspace application that employs HugePages should be aware of permission implications. Permissions HugePages segments in memory can strictly impose certain requirements. e.g. Per Bug 6620371 on Linux x86-64 port of Oracle RDBMS until 11g was setting the shared memory flags to hugetlb, read and write by default. But that shall depend on the configuration environment and with Patch 6620371 on 10.2 and with 11g, the read and write permissions are set based on the internal context.
For RedHat 6, Oracle Linux 6, SLES 11 and UEK2 kernels, please see "ALERT: Disable Transparent HugePages on SLES11, RHEL6, Oracle Linux 6 and UEK2 Kernels (Doc ID 1557478.1)".
REFERENCES
NOTE:261889.1
- Bigpages vs. Hugetlb on RedHat Linux
NOTE:317141.1
- How to Configure RHEL/OL 4 32-bit for Very Large Memory with ramfs and HugePages
BUG:10153816
- WHEN USE_LARGE_PAGES=ONLY AND NO HUGEPAGES EXIST STARTUP FAILS NO DIAGNOSTIC
NOTE:1392497.1
- USE_LARGE_PAGES To Enable HugePages
NOTE:311504.1
- QREF: Linux Kernel Version Nomenclature
NOTE:317055.1
- How to Configure RHEL 3.0 32-bit for Very Large Memory and HugePages
NOTE:317067.1
- How to Configure Asianux 1.0 32-bit for Very Large Memory with ramfs and hugepages
NOTE:452326.1
- Linux Kernel Lowmem Pressure Issues and Related Kernel Structures
NOTE:317139.1
- How to Configure SuSE SLES 9 / 10 32-bit for Very Large Memory with ramfs and HugePages
NOTE:341507.1
- Oracle Database Server on Linux on IBM POWER
NOTE:1557478.1
- ALERT: Disable Transparent HugePages on SLES11, RHEL6, OL6 and UEK2 Kernels
NOTE:401749.1
- Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration
11gR2
G Very Large Memory and HugePages
This chapter guides Linux system administrators to configure very large memory configurations and HugePages on Linux systems.
This chapter contains the following sections:
Very Large Memory on Linux x86
Overview of HugePages
G.1 Very Large Memory on Linux x86
Very Large Memory (VLM) configurations allow a 32-bit Oracle Database to access more than 4GB RAM that is traditionally available to Linux applications. The Oracle VLM option for 32-bit creates a large database buffer cache using an in-memory file system (/dev/shm). Other parts of the SGA are allocated from regular memory. VLM configurations improve database performance by caching more database buffers in memory, which significantly reduces the disk I/O compared to configurations without VLM. This chapter shows how to increase the SGA memory using VLM on a 32-bit computer.
Note:
The contents documented in this section apply only to 32-bit Linux operating system. With a 64-bit architecture, VLM support is available natively. All 64-bit Linux operating systems use the physical memory directly, as the maximum available virtual address space is 16 EB (exabyte = 2^60 bytes.)
This section includes the following topics:
Implementing VLM on 32-bit Linux
Prerequisites for Implementing VLM
Methods To Increase SGA Limits
Configuring Very Large Memory for Oracle Database
Restrictions Involved in Implementing Very Large Memory
G.1.1 Implementing VLM on 32-bit Linux
With 32-bit architectures, VLM is accessed through a VLM window of a specific size. The VLM window is a data structure in the process address space that provides access to the whole virtual address space from a window of a specific size. On 32-bit Linux, you must set the parameter USE_INDIRECT_DATA_BUFFERS=TRUE, and mount a shmfs or tmpfs or ramfs type of in-memory filesystem over /dev/shm to increase the usable address space.
G.1.2 Prerequisites for Implementing VLM
The following are some of the prerequisites for implementing VLM on a 32-bit operating system:
The computer on which Oracle Database is installed must have more than 4GB of memory.
The computer must be configured to use a kernel with PAE support upon startup.
The USE_INDIRECT_DATA_BUFFERS=TRUE must be present in the initialization parameter file for the database instance that uses VLM support.
Initialization parameters DB_BLOCK_BUFFERS and DB_BLOCK_SIZE must be set to values you have chosen for the Oracle Database.
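A hedged pfile excerpt illustrating those prerequisites (the block size and buffer count are placeholders for illustration, not recommendations):
use_indirect_data_buffers=true
db_block_size=8192           # block size chosen for this database
db_block_buffers=1048576     # 1048576 buffers * 8KB = 8GB buffer cache backed by /dev/shm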
G.1.3 Methods To Increase SGA Limits
In a typical 32-bit Linux kernel, one can create an SGA of up to 2.4GB in size. Using a Linux Hugemem kernel enables the creation of an SGA of up to 3.2GB in size. To go beyond 3.2GB on a 32-bit kernel, you must use the VLM feature.
The following are the methods to increase SGA limits on a 32-bit computer:
Hugemem Kernel
Hugemem Kernel with Very Large Memory
G.1.3.1 Hugemem Kernel
Red Hat Enterprise Linux 4 and Oracle Linux 4 include a new kernel known as the Hugemem kernel. The Hugemem kernel feature is also called a 4GB-4GB Split Kernel as it supports a 4GB per process user space (versus 3GB for the other kernels), and a 4GB direct kernel space. Using this kernel enables RHEL 4/Oracle Linux 4 to run on systems with up to 64GB of main memory. The Hugemem kernel is required to use all the memory in system configurations containing more than 16GB of memory. The Hugemem kernel can run configurations with less memory.
A classic 32-bit 4GB virtual address space is split 3GB for user processes and 1GB for the kernel. The new scheme (4GB/4GB) permits 4GB of virtual address space for the kernel and almost 4GB for each user process. Due to this scheme with hugemem kernel, 3.2GB of SGA can be created without using the indirect data buffer method.
Note:
Red Hat Enterprise Linux 5 / Oracle Linux 5 and Red Hat Enterprise Linux 6 / Oracle Linux 6 on 32-bit do not have the hugemem kernel. They support only the 3GB user process / 1GB kernel split. They provide a PAE kernel that supports systems with more than 4GB of RAM, and reliably up to 16GB. Because of the 3GB/1GB split, the system may run out of lowmem if the system's load consumes a lot of lowmem. There is no equivalent of the hugemem kernel in Enterprise Linux 5, so the recommendation is either to use Enterprise Linux 4 with hugemem or to move to 64-bit.
The Hugemem kernel on large computers ensures better stability as compared to the performance overhead of address space switching.
Run the following command to determine if you are using the Hugemem kernel:
$ uname -r
2.6.9-5.0.3.ELhugemem
G.1.3.2 Hugemem Kernel with Very Large Memory
If you use only Hugemem kernels on 32-bit systems, then the SGA size can be increased but not significantly. Refer to section "Hugemem Kernel", for more information.
Note:
Red Hat Enterprise Linux 5/ Oracle Linux 5 and Red Hat Enterprise Linux 6/ Oracle Linux 6 does not support the hugemem kernel. It supports a PAE kernel that can be used to implement Very Large Memory (VLM) as long as the physical memory does not exceed 16GB.
This section shows how the SGA can be significantly increased by using Hugemem kernel with VLM on 32-bit systems.
The SGA can be increased to about 62GB (depending on block size) on a 32-bit system with 64GB RAM. A processor feature called Physical Address Extension (PAE) permits you to physically address 64GB of RAM. Since PAE does not enable a process or program to either address more than 4GB directly, or have a virtual address space larger than 4GB, a process cannot attach to shared memory directly. To address this issue, a shared memory filesystem (memory-based filesystem) must be created which can be as large as the maximum allowable virtual memory supported by the kernel. With a shared memory filesystem, processes can dynamically attach to regions of the filesystem, allowing applications like Oracle to have a virtually much larger shared memory on 32-bit operating systems. This is not an issue on 64-bit operating systems.
VLM moves the database buffer cache part of the SGA from the System V shared memory to the shared memory filesystem. It is still considered one large SGA but it consists now of two different operating system shared memory entities. VLM uses 512MB of the non-buffer cache SGA to manage VLM. This memory area is needed for mapping the indirect data buffers (shared memory filesystem buffers) into the process address space since a process cannot attach to more than 4GB directly on a 32-bit system.
Note:
USE_INDIRECT_DATA_BUFFERS = TRUE must be present in the initialization parameter file for the database instance that uses Very Large Memory support. If this parameter is not set, then Oracle Database 11g Release 2 (11.2) or later behaves in the same way as previous releases.
You must also manually set the initialization parameters DB_BLOCK_BUFFERS and SHARED_POOL_SIZE to values you have chosen for an Oracle Database. Automatic Memory Management (AMM) cannot be used. The initialization parameter DB_BLOCK_SIZE sets the block size and in combination with DB_BLOCK_BUFFERS determines the buffer cache size for an instance
For example, if the non-buffer cache SGA is 2.5GB, then you will only have 2GB of non-buffer cache SGA for shared pool, large pool, and redo log buffer since 512MB is used for managing VLM. It is not recommended to use VLM if buffer cache size is less than 512MB.
In RHEL 4/ Oracle Linux 4 there are two different memory file systems that can be used for VLM:
-
tmpfs or shmfs: mount a shmfs with a certain size to /dev/shm, and set the correct permissions. For tmpfs you do not need to specify a size. Tmpfs or shmfs allocated memory is pageable.
For example:
Example, mount shmfs:
# mount -t shm shmfs -o size=20g /dev/shm
Edit /etc/fstab: shmfs /dev/shm shm size=20g 0 0
OR
Example, mount tmpfs:
# mount -t tmpfs tmpfs /dev/shm
Edit /etc/fstab: none /dev/shm tmpfs defaults 0 0
-
ramfs: ramfs is similar to shmfs, except that pages are not pageable or swappable. This approach provides the commonly desired effect. ramfs is created by:
umount /dev/shm
mount -t ramfs ramfs /dev/shm
G.1.4 Configuring Very Large Memory for Oracle Database
Complete the following procedure to configure Very Large Memory on Red Hat Enterprise Linux 4/ Oracle Linux 4 using ramfs:
-
Log in as a root user:
sudo -s
Password:
-
Edit the /etc/rc.local file and add the following entries to it to configure the computer to mount ramfs over the /dev/shm directory, whenever you start the computer:
umount /dev/shm
mount -t ramfs ramfs /dev/shm
chown oracle:oinstall /dev/shm
In the preceding commands, oracle is the owner of Oracle software files and oinstall is the group for Oracle owner account. If the new configuration disables /etc/rc.local file or you start an instance of Oracle database using a Linux service script present under the /etc/init.d file, then you can add those entries in the service script too.
Note, this configuration will make ramfs ready even before your system autostarts crucial Oracle Database instances. The commands can also be included in your startup scripts. It is important that you test the commands extensively by repeated restart action, after you complete configuring the computer using the following steps:
Restart the server.
Log in as a root user.
-
Run the following command to check if the /dev/shm directory is mounted with the ramfs type:
# mount | grep shm
ramfs on /dev/shm type ramfs (rw)
-
Run the following command to check the permissions on the /dev/shm directory:
# ls -ld /dev/shm
drwxr-xr-x 3 oracle oinstall 0 Jan 13 12:12 /dev/shm
-
Edit the /etc/security/limits.conf file and add the following entries to it to increase the max locked memory limit:
* soft memlock 3145728
* hard memlock 3145728
-
Switch to the oracle user:
# su - oracle
Password:
-
Run the following command to check the max locked memory limit:
$ ulimit -l
3145728
-
Complete the following procedure to configure instance parameters for Very Large Memory:
Replace the DB_CACHE_SIZE, DB_xK_CACHE_SIZE, sga_target, and memory_target parameters with DB_BLOCK_BUFFERS parameter.
Add the USE_INDIRECT_DATA_BUFFERS=TRUE parameter.
Configure SGA size according to the SGA requirements.
Remove SGA_TARGET, MEMORY_TARGET, or MEMORY_MAX_TARGET parameters, if set.
Start the database instance.
-
Run the following commands to check the memory allocation:
$ ls -l /dev/shm
$ ipcs -m
See Also:
"Configuring HugePages on Linux" section for more information about HugePages.
G.1.5 Restrictions Involved in Implementing Very Large Memory
Following are the limitations of running a computer in the Very Large Memory mode:
You cannot use Automatic Memory Management (AMM) while implementing VLM using ramfs, because AMM works on dynamic SGA tuning. With AMM swapping is possible. For example, you can unmap the unused SGA space and map it to PGA. Dynamic SGA and multiple block size are not supported with Very Large Memory because ramfs is not swappable. To enable Very Large Memory, you must ensure that you set the value of MEMORY_TARGET to zero.
VLM can be implemented only if Database Buffer Cache size is greater than 512MB.
G.2 Overview of HugePages
HugePages is a feature integrated into the Linux kernel 2.6. Enabling HugePages makes it possible for the operating system to support memory pages greater than the default (usually 4KB). Using very large page sizes can improve system performance by reducing the amount of system resources required to access page table entries. HugePages is useful for both 32-bit and 64-bit configurations. HugePage sizes vary from 2MB to 256MB, depending on the kernel version and the hardware architecture. For Oracle Databases, using HugePages reduces the operating system maintenance of page states, and increases Translation Lookaside Buffer (TLB) hit ratio.
This section includes the following topics:
Tuning SGA With HugePages
Configuring HugePages on Linux
Restrictions for HugePages Configurations
G.2.1 Tuning SGA With HugePages
Without HugePages, the operating system keeps each 4KB of memory as a page, and when it is allocated to the SGA, then the lifecycle of that page (dirty, free, mapped to a process, and so on) is kept up to date by the operating system kernel.
With HugePages, the operating system page table (virtual memory to physical memory mapping) is smaller, since each page table entry is pointing to pages from 2MB to 256MB. Also, the kernel has fewer pages whose lifecyle must be monitored.
Note:
2MB size of HugePages is available with Linux x86-64, Linux x86, and IBM: Linux on System z.
The following are the advantages of using HugePages:
Increased performance through increased TLB hits.
Pages are locked in memory and are never swapped out which guarantees that shared memory like SGA remains in RAM.
Contiguous pages are preallocated and cannot be used for anything else but for System V shared memory (for example, SGA)
Less bookkeeping work for the kernel for that part of virtual memory due to larger page sizes
G.2.2 Configuring HugePages on Linux
Complete the following steps to configure HugePages on the computer:
-
Edit the memlock setting in the /etc/security/limits.conf file. The memlock setting is specified in KB and set slightly lesser than the installed RAM. For example, if you have 64GB RAM installed, add the following entries to increase the max locked memory limit:
* soft memlock 60397977
* hard memlock 60397977
You can also set the memlock value higher than your SGA requirements.
-
Login as the oracle user again and run the ulimit -l command to verify the new memlock setting:
$ ulimit -l
60397977
-
Run the following command to display the value of Hugepagesize variable:
$ grep Hugepagesize /proc/meminfo
-
Complete the following procedure to create a script that computes recommended values for hugepages configuration for the current shared memory segments:
Note:
Following is an example that may require modifications.
Create a text file named hugepages_settings.sh.
-
Add the following content in the file:
#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.

# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`

# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk {'print $2'}`

# Start from 1 pages to be on the safe side and guarantee 1 free HugePage
NUM_PG=1

# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk {'print $5'} | grep "[0-9][0-9]*"`
do
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
    if [ $MIN_PG -gt 0 ]; then
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
    fi
done

# Finish with results
case $KERN in
    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
    '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
        *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac

# End
-
Run the following command to change the permission of the file:
$ chmod +x hugepages_settings.sh
-
Run the hugepages_settings.sh script to compute the values for hugepages configuration:
$ ./hugepages_settings.sh
-
Set the following kernel parameter:
# sysctl -w vm.nr_hugepages=value_displayed_in_step_5
-
To make the value of the parameter available for every time you restart the computer, edit the /etc/sysctl.conf file and add the following entry:
vm.nr_hugepages=value_displayed_in_step_5
-
Restart the server.
Note:
To check the available hugepages, run the following command:
$ grep Huge /proc/meminfo
G.2.3 Restrictions for HugePages Configurations
Following are the limitations of using HugePages:
Automatic Memory Management (AMM) and HugePages are not compatible. When you use AMM, the entire SGA memory is allocated by creating files under /dev/shm. When Oracle Database allocates SGA with AMM, HugePages are not reserved. To use HugePages on Oracle Database 12c, You must disable AMM.
If you are using VLM in a 32-bit environment, then you cannot use HugePages for the Database Buffer cache. You can use HugePages for other parts of the SGA, such as shared_pool, large_pool, and so on. Memory allocation for VLM (buffer cache) is done using shared memory file systems (ramfs/tmpfs/shmfs). Memory file systems do not reserve or use HugePages.
HugePages are not subject to allocation or release after system startup, unless a system administrator changes the HugePages configuration, either by modifying the number of pages available, or by modifying the pool size. If the space required is not reserved in memory during system startup, then HugePages allocation fails.
12cR1
G HugePages
This chapter provides an overview of Hugepages and guides Linux system administrators to configure HugePages on Linux.
G.1 Overview of HugePages
HugePages is a feature integrated into the Linux kernel 2.6. Enabling HugePages makes it possible for the operating system to support memory pages greater than the default (usually 4 KB). Using very large page sizes can improve system performance by reducing the amount of system resources required to access page table entries. HugePages is useful for both 32-bit and 64-bit configurations. HugePage sizes vary from 2 MB to 256 MB, depending on the kernel version and the hardware architecture. For Oracle Databases, using HugePages reduces the operating system maintenance of page states, and increases Translation Lookaside Buffer (TLB) hit ratio.
Note:
Transparent Hugepages is currently not an alternative to manually configure HugePages.
This section includes the following topics:
Tuning SGA With HugePages
Configuring HugePages on Linux
Restrictions for HugePages Configurations
Disabling Transparent HugePages
G.1.1 Tuning SGA With HugePages
Without HugePages, the operating system keeps each 4 KB of memory as a page. When it allocates pages to the database System Global Area (SGA), the operating system kernel must continually update its page table with the page lifecycle (dirty, free, mapped to a process, and so on) for each 4 KB page allocated to the SGA.
With HugePages, the operating system page table (virtual memory to physical memory mapping) is smaller, because each page table entry is pointing to pages from 2 MB to 256 MB.
Also, the kernel has fewer pages whose lifecycle must be monitored. For example, if you use HugePages with 64-bit hardware, and you want to map 256 MB of memory, you may need one page table entry (PTE). If you do not use HugePages, and you want to map 256 MB of memory, then you must have 256 MB * 1024 KB/4 KB = 65536 PTEs.
HugePages provides the following advantages:
Increased performance through increased TLB hits
Pages are locked in memory and never swapped out, which provides RAM for shared memory structures such as SGA
Contiguous pages are preallocated and cannot be used for anything else but for System V shared memory (for example, SGA)
Less bookkeeping work for the kernel for that part of virtual memory because of larger page sizes
G.1.2 Configuring HugePages on Linux
Complete the following steps to configure HugePages on the computer:
-
Run the following command to determine if the kernel supports HugePages:
$ grep Huge /proc/meminfo
Some Linux systems do not support HugePages by default. For such systems, build the Linux kernel using the CONFIG_HUGETLBFS and CONFIG_HUGETLB_PAGE configuration options. CONFIG_HUGETLBFS is located under File Systems, and CONFIG_HUGETLB_PAGE is selected when you select CONFIG_HUGETLBFS.
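A hedged way to check whether the running kernel was built with these options (the config file location varies by distribution; on many systems a kernel with HugePages support shows both options set to y):
$ grep -i hugetlb /boot/config-$(uname -r)
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y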
-
Edit the memlock setting in the /etc/security/limits.conf file. The memlock setting is specified in KB, and the maximum locked memory limit should be set to at least 90 percent of the current RAM when HugePages memory is enabled and at least 3145728 KB (3 GB) when HugePages memory is disabled. For example, if you have 64 GB RAM installed, then add the following entries to increase the maximum locked-in-memory address space:
* soft memlock 60397977
* hard memlock 60397977
You can also set the memlock value higher than your SGA requirements.
-
Log in as oracle user again and run the ulimit -l command to verify the new memlock setting:
$ ulimit -l
60397977
-
Run the following command to display the value of Hugepagesize variable:
$ grep Hugepagesize /proc/meminfo
-
Complete the following procedure to create a script that computes recommended values for hugepages configuration for the current shared memory segments:
Create a text file named hugepages_settings.sh.
-
Add the following content in the file:
#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.

# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`

# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk {'print $2'}`

# Start from 1 pages to be on the safe side and guarantee 1 free HugePage
NUM_PG=1

# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk {'print $5'} | grep "[0-9][0-9]*"`
do
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
    if [ $MIN_PG -gt 0 ]; then
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
    fi
done

# Finish with results
case $KERN in
    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
    '2.6'|'3.8') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
        *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac

# End
-
Run the following command to change the permission of the file:
$ chmod +x hugepages_settings.sh
-
Run the hugepages_settings.sh script to compute the values for hugepages configuration:
$ ./hugepages_settings.sh
Note:
Before running this script, ensure that all the applications that use hugepages run.
-
Set the following kernel parameter, where value is the HugePages value that you determined in step 7:
# sysctl -w vm.nr_hugepages=value
-
To ensure that HugePages is allocated after system restarts, add the following entry to the /etc/sysctl.conf file, where value is the HugePages value that you determined in step 7:
vm.nr_hugepages=value
-
Run the following command to check the available hugepages:
$ grep Huge /proc/meminfo
Restart the instance.
-
Run the following command to check the available hugepages (1 or 2 pages free):
$ grep Huge /proc/meminfo
Note:
If you cannot set your HugePages allocation using nr_hugepages, then your available memory may be fragmented. Restart your server for the Hugepages allocation to take effect.
G.1.3 Restrictions for HugePages Configurations
HugePages has the following limitations:
You must unset both the MEMORY_TARGET and MEMORY_MAX_TARGET initialization parameters. For example, to unset the parameters for the database instance, use the command ALTER SYSTEM RESET.
Automatic Memory Management (AMM) and HugePages are not compatible. When you use AMM, the entire SGA memory is allocated by creating files under /dev/shm. When Oracle Database allocates SGA with AMM, HugePages are not reserved. To use HugePages on Oracle Database 12c, You must disable AMM.
If you are using VLM in a 32-bit environment, then you cannot use HugePages for the Database Buffer cache. You can use HugePages for other parts of the SGA, such as shared_pool, large_pool, and so on. Memory allocation for VLM (buffer cache) is done using shared memory file systems (ramfs/tmpfs/shmfs). Memory file systems do not reserve or use HugePages.
HugePages are not subject to allocation or release after system startup, unless a system administrator changes the HugePages configuration, either by modifying the number of pages available, or by modifying the pool size. If the space required is not reserved in memory during system startup, then HugePages allocation fails.
Ensure that HugePages is configured properly as the system may run out of memory if excess HugePages is not used by the application.
If there is insufficient HugePages when an instance starts and the initialization parameter use_large_pages is set to only, then the database fails to start and an alert log message provides the necessary information on Hugepages.
G.1.4 Disabling Transparent HugePages
Transparent HugePages memory is enabled by default with Red Hat Enterprise Linux 6, SUSE 11, and Oracle Linux 6 with earlier releases of Oracle Linux Unbreakable Enterprise Kernel 2 (UEK2) kernels. Transparent HugePages memory is disabled by default in later releases of UEK2 kernels.
Transparent HugePages can cause memory allocation delays at runtime. To avoid performance issues, Oracle recommends that you disable Transparent HugePages on all Oracle Database servers. Oracle recommends that you instead use standard HugePages for enhanced performance.
Transparent HugePages memory differs from standard HugePages memory because the kernel khugepaged thread allocates memory dynamically during runtime. Standard HugePages memory is pre-allocated at startup, and does not change during runtime.
To check if Transparent HugePages is enabled run one of the following commands as the root user:
Red Hat Enterprise Linux kernels:
# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
Other kernels:
# cat /sys/kernel/mm/transparent_hugepage/enabled
The following is a sample output that shows Transparent HugePages is being used as the [always] flag is enabled.
[always] never
Note:
If Transparent HugePages is removed from the kernel then the /sys/kernel/mm/transparent_hugepage or /sys/kernel/mm/redhat_transparent_hugepage files do not exist.
To disable Transparent HugePages perform the following steps:
-
Add the following entry to the kernel boot line in the /etc/grub.conf file:
transparent_hugepage=never
For example:
title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img
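Transparent HugePages can also be switched off for the running system (until the next reboot) by writing to the sysfs files shown earlier; a sketch for the non-Red-Hat kernel paths (use the redhat_transparent_hugepage path on Red Hat kernels):
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# cat /sys/kernel/mm/transparent_hugepage/enabled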