why we need partitions
The first and most demanding reason to use partitions in a database is to increase the performance of the database. This is achieved by partition-wise joins; if a user’s queries perform a lot of full-table scans, partitioning will help vastly, because partitions will limit the scope of this search. The second important reason to partition is ease of managing large tables.
Partitioning always helps manage large objects. Although it is applicable to objects of any size, the advantages are more apparent in large tables. When a user recreates an index on a nonpartitioned table, the only option is to build the entire index in one statement. On the other hand, if the table is partitioned, the user can rebuild partitions of the local indexes one at a time.
PostgreSQL supports table partitioning through table inheritance, which means every
partition will be created as a child table of a single parent table. Partitioning is performed in such a way that every child table inherits a single parent table. The parent table will be empty; it exists just to describe the whole dataset. Currently in PostgreSQL, partitioning can be implemented in range partitioning or list partitioning.
Range partitioning can be done, for example, by record number ranges (such as record 0 to 100 or 100 to 200) or even using date ranges (such as from 2014-11-01 to 2014-11-30).
List partitioning can be done, for example, by a list of cities (such as New York, Boston,
Chicago, and Houston) or list of departments (such as HR, finance, administration, and so on).
There are five simple steps used to create a partition in PostgreSQL, which are as follows:
1. Create the master table.
2. Create multiple child tables without having an overlapped table constraint.
3. Create indexes.
4. Create a trigger function to insert data into child tables.
5. Enable the constraint exclusion.
Range partition
The range partition is the partition in which we partition a table into ranges defined by a single column or multiple columns. When defining the ranges, the user will have to take care that ranges should be connected and not overlapping with each other; moreover, ranges must be defined using the < value operator. For instance, one can partition by date ranges or ranges of identifiers for specific business objects.
Creating the master table
[postgres@minion1 bin]$ ./psql
psql (9.4.5)
Type "help" for help.
postgres=# create database partition_db;
CREATE DATABASE
postgres=# \c partition_db
You are now connected to database "partition_db" as user "postgres".
partition_db=# create database warehouse_db;
CREATE DATABASE
partition_db=# \c warehouse_db
You are now connected to database "warehouse_db" as user "postgres".
warehouse_db=# CREATE TABLE sales_record
warehouse_db-# (
warehouse_db(# id NUMERIC PRIMARY KEY,
warehouse_db(# sales_amount NUMERIC,
warehouse_db(# sales_date DATE NOT NULL DEFAULT CURRENT_DATE
warehouse_db(# );
CREATE TABLE
Creating a range partition table
Each partition will have two months’ data.
Create the sales_record_m1_to_m2 child table as follows:
warehouse_db=# CREATE TABLE sales_record_m1_to_m2
warehouse_db-# (
warehouse_db(# PRIMARY KEY (id, sales_date),
warehouse_db(# CHECK (sales_date >= DATE '2015-01-01'
warehouse_db(# AND sales_date < DATE '2015-03-01')
warehouse_db(# )
warehouse_db-# INHERITS (sales_record);
CREATE TABLE
This child table will contain data of January and February.
Now, create the sales_record_m3_to_m4 child table as follows:
warehouse_db=# CREATE TABLE sales_record_m3_to_m4
warehouse_db-# (
warehouse_db(# PRIMARY KEY (id, sales_date),
warehouse_db(# CHECK (sales_date >= DATE '2015-03-01'
warehouse_db(# AND sales_date < DATE '2015-05-01')
warehouse_db(# )
warehouse_db-# INHERITS (sales_record);
CREATE TABLE
This child table will contain data of March and April.
Create the sales_record_m5_to_m6 child table as follows:
warehouse_db=# CREATE TABLE sales_record_m5_to_m6
warehouse_db-# (
warehouse_db(# PRIMARY KEY (id, sales_date),
warehouse_db(# CHECK (sales_date >= DATE '2015-05-01'
warehouse_db(# AND sales_date < DATE '2015-07-01')
warehouse_db(# )
warehouse_db-# INHERITS (sales_record);
CREATE TABLE
This child table will contain data of May and June.
Create the sales_record_m7_to_m8 child table as follows:
warehouse_db=# CREATE TABLE sales_record_m7_to_m8
warehouse_db-# (
warehouse_db(# PRIMARY KEY (id, sales_date),
warehouse_db(# CHECK (sales_date >= DATE '2015-07-01'
warehouse_db(# AND sales_date < DATE '2015-09-01')
warehouse_db(# )
warehouse_db-# INHERITS (sales_record);
CREATE TABLE
This child table will contain data of July and August.
Create the sales_record_m9_to_m10 child table as follows:
warehouse_db=# CREATE TABLE sales_record_m9_to_m10
warehouse_db-# (
warehouse_db(# PRIMARY KEY (id, sales_date),
warehouse_db(# CHECK (sales_date >= DATE '2015-09-01'
warehouse_db(# AND sales_date < DATE '2015-11-01')
warehouse_db(# )
warehouse_db-# INHERITS (sales_record);
CREATE TABLE
This child table will contain data of September and October.
Now, create the sales_record_m11_to_m12 child table as follows:
warehouse_db=# CREATE TABLE sales_record_m11_to_m12
warehouse_db-# (
warehouse_db(# PRIMARY KEY (id, sales_date),
warehouse_db(# CHECK (sales_date >= DATE '2015-11-01'
warehouse_db(# AND sales_date < DATE '2016-01-01')
warehouse_db(# )
warehouse_db-# INHERITS (sales_record);
CREATE TABLE
This child table will contain data of November and December.
Verifing that the tables are linked and the partition is successfully created using the following query:
warehouse_db=# \d+ sales_record
Table "public.sales_record"
Column | Type | Modifiers | Storage | Stats target | Description
--------------+---------+--------------------------------------+---------+--------------+-------------
id | numeric | not null | main | |
sales_amount | numeric | | main | |
sales_date | date | not null default ('now'::text)::date | plain | |
Indexes:
"sales_record_pkey" PRIMARY KEY, btree (id)
Child tables: sales_record_m11_to_m12,
sales_record_m1_to_m2,
sales_record_m3_to_m4,
sales_record_m5_to_m6,
sales_record_m7_to_m8,
sales_record_m9_to_m10
Creating an index on child tables
Now, we are going to create indexes on child tables to speed up the sales_day field usage, using almost all queries (INSERT, SELECT, and UPDATE) on the date field.
Create the m1_to_m2_sales_date index on the sales_record_m1_to_m2 child table as follows:
warehouse_db=# CREATE INDEX m1_to_m2_sales_date ON sales_record_m1_to_m2
warehouse_db-# (sales_date);
CREATE INDEX
Create the m3_to_m4_sales_date index on the sales_record_m3_to_m4 child table as
follows:
warehouse_db=# CREATE INDEX m3_to_m4_sales_date ON sales_record_m3_to_m4
warehouse_db-# (sales_date);
CREATE INDEX
Now, let’s create the m5_to_m6_sales_date index on the sales_record_m5_to_m6 child table:
warehouse_db=# CREATE INDEX m5_to_m6_sales_date ON sales_record_m5_to_m6
warehouse_db-# (sales_date);
CREATE INDEX
Create the m7_to_m8_sales_date index on the sales_record_m7_to_m8 child table as
follows:
warehouse_db=# CREATE INDEX m7_to_m8_sales_date ON sales_record_m7_to_m8
warehouse_db-# (sales_date);
CREATE INDEX
Then, create the m9_to_m10_sales_date index on the sales_record_m9_to_m10 child table as follows:
warehouse_db=# CREATE INDEX m9_to_m10_sales_date ON sales_record_m9_to_m10
warehouse_db-# (sales_date);
CREATE INDEX
Create the m11_to_m12_sales_date index on the sales_record_m11_to_m12 child table as follows:
warehouse_db=# CREATE INDEX m11_to_m12_sales_date ON
warehouse_db-# sales_record_m11_to_m12 (sales_date);
CREATE INDEX
Creating a trigger on the master table
Conditions must be exactly the same as what the child tables check.
creating a trigger function using the following syntax:
warehouse_db=# CREATE OR REPLACE FUNCTION sales_record_insert()
warehouse_db-# RETURNS TRIGGER AS $$
warehouse_db$# BEGIN
warehouse_db$# IF (NEW.sales_date >= DATE '2015-01-01' AND NEW.sales_date < DATE '2015-03-01') THEN
warehouse_db$# INSERT INTO sales_record_m1_to_m2 VALUES (NEW.*);
warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-03-01' AND NEW.sales_date < DATE '2015-05-01') THEN
warehouse_db$# INSERT INTO sales_record_m3_to_m4 VALUES (NEW.*);
warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-05-01' AND NEW.sales_date < DATE '2015-07-01') THEN
warehouse_db$# INSERT INTO sales_record_m5_to_m6 VALUES (NEW.*);
warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-07-01' AND NEW.sales_date < DATE '2015-09-01') THEN
warehouse_db$# INSERT INTO sales_record_m7_to_m8 VALUES (NEW.*);
warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-09-01' AND NEW.sales_date < DATE '2015-11-01') THEN
warehouse_db$# INSERT INTO sales_record_m9_to_m10 VALUES (NEW.*);
warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-11-01' AND NEW.sales_date < DATE '2016-01-01') THEN
warehouse_db$# INSERT INTO sales_record_m11_to_m12 VALUES (NEW.*);
warehouse_db$# ELSE
warehouse_db$# RAISE EXCEPTION 'Date is out of range. Something is wrong with sales_record_insert_trigger() function';
warehouse_db$# END IF;
warehouse_db$# RETURN NULL;
warehouse_db$# END;
warehouse_db$# $$LANGUAGE plpgsql;
CREATE FUNCTION
This function will simply populate the data in the respective child table on the basis of the sales_date condition.
Now, the supportive trigger will call the preceding trigger function automatically
whenever a user uses the INSERT, UPDATE, or DELETE operations on data in the master table. Creating the supportive trigger in the following manner:
warehouse_db=# CREATE TRIGGER sales_day_trigger
warehouse_db-# BEFORE INSERT ON sales_record
warehouse_db-# FOR EACH ROW
warehouse_db-# EXECUTE PROCEDURE sales_record_insert();
CREATE TRIGGER
Enabling the constraint exclusion
The final step in implementation of partitioning is enabling the constraint exclusion.Constraint exclusion is backed by the CHECK constraints that we have used in our child table’s CREATE syntax. Do remember that if the constraint exclusion is disabled, then our
queries will not use CHECK constraints, and as a result, every query scan will be done on all the child tables and thus will reduce performance; this is why constraint exclusion to is very important when using partitioned tables.
The steps to enable constraint exclusion are as follows:
1. Open the postgresql.conf file that is present in the data directory on your default installation path. In most cases, it is /usr/local/pgsql/data.
2. Set constraint exclusion on with the following row in postgresql.conf:
constraint_exclusion = on
Alternatively, you can set constraint exclusion to on using the following command on
psql:
warehouse_db=# SET constraint_exclusion = on;
Congratulations! Finally, the master table is available for the DML and DDL operations,and all the INSERT, SELECT and DELETE operations go to the child tables by date.
Performing DML operations on a partition table
When a user inserts a row in the master table, our trigger sales_day_trigger will be triggered, and it will call our sales_record_insert() trigger function, and based on sales_date, the insertion will be made to a specific child table. Now, let’s insert a few records into the sales_record table.We will then perform the SELECT statements to verify that records are properly populated in child tables. This can be done in the following manner:
warehouse_db=# INSERT INTO sales_record
(id, sales_amount, sales_date)
VALUES
(1, 500, TO_DATE('02/12/2015','MM/DD/YYYY'));
INSERT 0 0
warehouse_db=# INSERT INTO sales_record
warehouse_db-# (id, sales_amount, sales_date)
warehouse_db-# VALUES
warehouse_db-# (2, 1500, TO_DATE('03/10/2015','MM/DD/YYYY'));
INSERT 0 0
warehouse_db=# INSERT INTO sales_record
warehouse_db-# (id, sales_amount, sales_date)
warehouse_db-# VALUES
warehouse_db-# (3, 2500, TO_DATE('05/15/2015','MM/DD/YYYY'));
INSERT 0 0
warehouse_db=# INSERT INTO sales_record
warehouse_db-# (id, sales_amount, sales_date)
warehouse_db-# VALUES
warehouse_db-# (4, 2000, TO_DATE('07/25/2015','MM/DD/YYYY'));
INSERT 0 0
warehouse_db=# INSERT INTO sales_record
warehouse_db-# (id, sales_amount, sales_date)
warehouse_db-# VALUES
warehouse_db-# (5, 2200, TO_DATE('09/15/2015','MM/DD/YYYY'));
INSERT 0 0
warehouse_db=# INSERT INTO sales_record
warehouse_db-# (id, sales_amount, sales_date)
warehouse_db-# VALUES
warehouse_db-# (6, 1200, TO_DATE('11/15/2015','MM/DD/YYYY'));
INSERT 0 0
We have inserted six records, and we are now going to perform the SELECT queries on our child tables to verify that our child tables get the right data. Let’s do a SELECT operation on the sales_record_m3_to_m4 table in the following manner:
warehouse_db=# SELECT * FROM sales_record_m3_to_m4;
id | sales_amount | sales_date
----+--------------+------------
2 | 1500 | 2015-03-10
(1 row)
Let’s do a select operation on the sales_record_m9_to_m10 table in the following
manner:
warehouse_db=# SELECT * FROM sales_record_m9_to_m10;
id | sales_amount | sales_date
----+--------------+------------
5 | 2200 | 2015-09-15
(1 row)
As you can see in the preceding result, the sales_record_m9_to_m10 child table contains the record that has sales_date for September and October.
A SELECT operation on the parent/master table will grab data from all child tables. This can be seen using the following statement:
warehouse_db=# SELECT * FROM sales_record;
id | sales_amount | sales_date
----+--------------+------------
1 | 500 | 2015-02-12
2 | 1500 | 2015-03-10
3 | 2500 | 2015-05-15
4 | 2000 | 2015-07-25
5 | 2200 | 2015-09-15
6 | 1200 | 2015-11-15
(6 rows)
Handling the UPDATE and DELETE statements on a partition table
We don’t need any UPDATE or DELETE triggers, the INSERT trigger is self-sufficient to handle the UPDATE or DELETE statement as well. First, let’s perform an UPDATE action and verify the impact on the child table in the following manner:
warehouse_db=# UPDATE sales_record SET sales_date='2015-9-13' WHERE id = 5;
UPDATE 1
The preceding query will update the record of the date 2015-9-15 with id = 5.
Let’s perform a SELECT operation to verify the update of the record in the following
manner:
warehouse_db=# SELECT * FROM sales_record_m9_to_m10;
id | sales_amount | sales_date
----+--------------+------------
5 | 2200 | 2015-09-13
(1 row)
We can now verify that the record is updated and sales_date is changed to a new date,that is, 2015-09-13.
An update on a table that can cause row movement between children requires an UPDATE trigger.
Now, let’s perform a simple DELETE operation and verify the impact on partitioned tables:
warehouse_db=# DELETE FROM sales_record where sales_date = '2015-9-13';
DELETE 1
warehouse_db=# SELECT * FROM sales_record_m9_to_m10;
id | sales_amount | sales_date
----+--------------+------------
(0 rows)
Since the SELECT statement did not return any record, this confirms that we have successfully deleted the record with sales_date = '2015-9-13'.
List partition
List partition is very much similar to range partition. The table is partitioned by explicitly listing which key values appear in each partition. In list partition, each partition is defined and designated based on a column value in one set of value lists, instead of one set of adjoining ranges of values. This will be done by defining each partition by means of the values IN (value_list) syntax, where value_list is a comma-separated list of values.
We will create a master table that will have a sales record along with the city information. The list partition will use the city column as a base to create the child partitions. Let’s create a master table first in the following manner:
warehouse_db=# CREATE TABLE sales_record_listpart
warehouse_db-# (
warehouse_db(# id NUMERIC primary key,
warehouse_db(# sales_date date,
warehouse_db(# sales_amount NUMERIC,
warehouse_db(# city text
warehouse_db(# );
CREATE TABLE
Now, let’s create the child tables on the basis of the city list.
Create the sales_record_list1 table in the following manner:
warehouse_db=# CREATE TABLE sales_record_list1
warehouse_db-# (
warehouse_db(# PRIMARY KEY (id, city),
warehouse_db(# CHECK (city IN ('new york', 'sydney'))
warehouse_db(# )
warehouse_db-# INHERITS (sales_record_listpart);
CREATE TABLE
Now, create the sales_record_list2 table in the following manner:
warehouse_db=# CREATE TABLE sales_record_list2
warehouse_db-# (
warehouse_db(# PRIMARY KEY (id, city),
warehouse_db(# CHECK (city IN ('Islamabad', 'Boston', 'London'))
warehouse_db(# )
warehouse_db-# INHERITS (sales_record_listpart);
CREATE TABLE
Let’s create the index for the sales_record_list1 table:
warehouse_db=# CREATE INDEX list1_index ON sales_record_list1(city);
CREATE INDEX
Let’s create the index for the sales_record_list2 table:
warehouse_db=# CREATE INDEX list2_index ON sales_record_list2 (city);
CREATE INDEX
Now, create the trigger function in the following manner:
warehouse_db=# CREATE OR REPLACE FUNCTION sales_record_list_insert()
warehouse_db-# RETURNS TRIGGER AS $$
warehouse_db$# BEGIN
warehouse_db$# IF (NEW.city IN ('new york', 'sydney')) THEN
warehouse_db$# INSERT INTO sales_record_list1 VALUES (NEW.*);
warehouse_db$# ELSEIF (NEW.city IN ('Islamabad', 'Boston', 'London')) THEN
warehouse_db$# INSERT INTO sales_record_list2 VALUES (NEW.*);
warehouse_db$# ELSE
warehouse_db$# RAISE EXCEPTION 'CITY not present in this lists';
warehouse_db$# END IF;
warehouse_db$# RETURN NULL;
warehouse_db$# END;
warehouse_db$# $$LANGUAGE plpgsql;
CREATE FUNCTION
In the end, we need to create the supporting trigger in the following manner:
warehouse_db=# CREATE TRIGGER sales_day_trigger
warehouse_db-# BEFORE INSERT ON sales_record_listpart
warehouse_db-# FOR EACH ROW
warehouse_db-# EXECUTE PROCEDURE sales_record_list_insert();
CREATE TRIGGER
Verifing that the partition is linked with the master table using the following
command:
warehouse_db=# \d+ sales_record_listpart
Table "public.sales_record_listpart"
Column | Type | Modifiers | Storage | Stats target | Description
--------------+---------+-----------+----------+--------------+-------------
id | numeric | not null | main | |
sales_date | date | | plain | |
sales_amount | numeric | | main | |
city | text | | extended | |
Indexes:
"sales_record_listpart_pkey" PRIMARY KEY, btree (id)
Triggers:
sales_day_trigger BEFORE INSERT ON sales_record_listpart FOR EACH ROW EXECUTE PROCEDURE sales_record_list_insert()
Child tables: sales_record_list1,
sales_record_list2
Now, let’s do some quick inserts and verify that our list partition is also working how we expect it do so:
warehouse_db=# INSERT INTO sales_record_listpart
warehouse_db-# (id, sales_date, sales_amount, city)
warehouse_db-# VALUES
warehouse_db-# (1,'15-APR-2008',1200,'sydney');
INSERT 0 0
warehouse_db=# INSERT INTO sales_record_listpart
warehouse_db-# (id, sales_date, sales_amount, city)
warehouse_db-# VALUES
warehouse_db-# (2,'15-APR-2008',1500,'Boston');
INSERT 0 0
warehouse_db=# INSERT INTO sales_record_listpart
warehouse_db-# (id, sales_date, sales_amount, city)
warehouse_db-# VALUES
warehouse_db-# (3,'16-APR-2008',1800,'Islamabad');
INSERT 0 0
warehouse_db=# INSERT INTO sales_record_listpart
warehouse_db-# (id, sales_date, sales_amount, city)
warehouse_db-# VALUES
warehouse_db-# (4,'20-APR-2008',1300,'new york');
INSERT 0 0
When you perform the preceding INSERT statements, you will observe that the INSERT query returns the INSERT 0 0 message; this is because the record is inserted in the child tables instead of the master tables.
Perform SELECT on salse_record_list1 to verify that the record is inserted as expected in the following manner:
warehouse_db=# SELECT * FROM sales_record_list1;
id | sales_date | sales_amount | city
----+------------+--------------+----------
1 | 2008-04-15 | 1200 | sydney
4 | 2008-04-20 | 1300 | new york
(2 rows)
Perform SELECT on sales_record_list2 to verify that the record is inserted as expected in the following manner:
warehouse_db=# SELECT * FROM sales_record_list2;
id | sales_date | sales_amount | city
----+------------+--------------+-----------
2 | 2008-04-15 | 1500 | Boston
3 | 2008-04-16 | 1800 | Islamabad
(2 rows)
Perform SELECT on sales_record_listpart to verify that the record is inserted as expected in the following manner:
warehouse_db=# SELECT * FROM sales_record_listpart;
id | sales_date | sales_amount | city
----+------------+--------------+-----------
1 | 2008-04-15 | 1200 | sydney
4 | 2008-04-20 | 1300 | new york
2 | 2008-04-15 | 1500 | Boston
3 | 2008-04-16 | 1800 | Islamabad
(4 rows)