Greenplum6 数据库分布
- hash分布
- 随机分布
- 复制分布
基本语法介绍
- 获取语法
[gpadmin@mdw logs]$ psql
psql (9.4.24)
Type "help" for help.
postgres=# \h
Available help:
ABORT ALTER TEXT SEARCH TEMPLATE CREATE RESOURCE GROUP DROP FUNCTION LOAD
ALTER AGGREGATE ALTER TRIGGER CREATE RESOURCE QUEUE DROP GROUP LOCK
ALTER COLLATION ALTER TYPE CREATE ROLE DROP INDEX MOVE
ALTER CONVERSION ALTER USER CREATE RULE DROP LANGUAGE NOTIFY
ALTER DATABASE ALTER USER MAPPING CREATE SCHEMA DROP MATERIALIZED VIEW PREPARE
ALTER DEFAULT PRIVILEGES ALTER VIEW CREATE SEQUENCE DROP OPERATOR PREPARE TRANSACTION
ALTER DOMAIN ANALYZE CREATE SERVER DROP OPERATOR CLASS REASSIGN OWNED
ALTER EVENT TRIGGER BEGIN CREATE TABLE DROP OPERATOR FAMILY REFRESH MATERIALIZED VIEW
ALTER EXTENSION CHECKPOINT CREATE TABLE AS DROP OWNED REINDEX
ALTER EXTERNAL TABLE CLOSE CREATE TABLESPACE DROP PROTOCOL RELEASE SAVEPOINT
ALTER FOREIGN DATA WRAPPER CLUSTER CREATE TEXT SEARCH CONFIGURATION DROP RESOURCE GROUP RESET
ALTER FOREIGN TABLE COMMENT CREATE TEXT SEARCH DICTIONARY DROP RESOURCE QUEUE REVOKE
ALTER FUNCTION COMMIT CREATE TEXT SEARCH PARSER DROP ROLE ROLLBACK
ALTER GROUP COMMIT PREPARED CREATE TEXT SEARCH TEMPLATE DROP RULE ROLLBACK PREPARED
ALTER INDEX COPY CREATE TRIGGER DROP SCHEMA ROLLBACK TO SAVEPOINT
ALTER LANGUAGE CREATE AGGREGATE CREATE TYPE DROP SEQUENCE SAVEPOINT
ALTER LARGE OBJECT CREATE CAST CREATE USER DROP SERVER SECURITY LABEL
ALTER MATERIALIZED VIEW CREATE COLLATION CREATE USER MAPPING DROP TABLE SELECT
ALTER OPERATOR CREATE CONVERSION CREATE VIEW DROP TABLESPACE SELECT INTO
ALTER OPERATOR CLASS CREATE DATABASE DEALLOCATE DROP TEXT SEARCH CONFIGURATION SET
ALTER OPERATOR FAMILY CREATE DOMAIN DECLARE DROP TEXT SEARCH DICTIONARY SET CONSTRAINTS
ALTER PROTOCOL CREATE EVENT TRIGGER DELETE DROP TEXT SEARCH PARSER SET ROLE
ALTER RESOURCE GROUP CREATE EXTENSION DISCARD DROP TEXT SEARCH TEMPLATE SET SESSION AUTHORIZATION
ALTER RESOURCE QUEUE CREATE EXTERNAL TABLE DO DROP TRIGGER SET TRANSACTION
ALTER ROLE CREATE FOREIGN DATA WRAPPER DROP AGGREGATE DROP TYPE SHOW
ALTER RULE CREATE FOREIGN TABLE DROP CAST DROP USER START TRANSACTION
ALTER SCHEMA CREATE FUNCTION DROP COLLATION DROP USER MAPPING TABLE
ALTER SEQUENCE CREATE GROUP DROP CONVERSION DROP VIEW TRUNCATE
ALTER SERVER CREATE INDEX DROP DATABASE END UNLISTEN
ALTER SYSTEM CREATE LANGUAGE DROP DOMAIN EXECUTE UPDATE
ALTER TABLE CREATE MATERIALIZED VIEW DROP EVENT TRIGGER EXPLAIN VACUUM
ALTER TABLESPACE CREATE OPERATOR DROP EXTENSION FETCH VALUES
ALTER TEXT SEARCH CONFIGURATION CREATE OPERATOR CLASS DROP EXTERNAL TABLE GRANT WITH
ALTER TEXT SEARCH DICTIONARY CREATE OPERATOR FAMILY DROP FOREIGN DATA WRAPPER INSERT
ALTER TEXT SEARCH PARSER CREATE PROTOCOL DROP FOREIGN TABLE LISTEN
postgres=#
- 创建数据库
[gpadmin@mdw logs]$ createdb testDB -E utf-8 [gpadmin@mdw logs]$ psql -h 10.10.10.101 -p 5432 -d testDB -U gpadmin psql (9.4.24) Type "help" for help.
testDB=# \q
[gpadmin@mdw logs]$ export PGDATABASE=testDB
[gpadmin@mdw logs]$ psql
psql (9.4.24)
Type "help" for help.
testDB=#
3. 建表语句
- GreenPlum中创建表需要指定表的分布键。
- 如果表需要用某个字段分区,可以通过partition by 将表建成分区表。
- 可以使用like操作创建与like的表一样结构的表,功能类似create table t1 as select * from t2 limit 0。
- 可以使用inherits实现表的继承,具体实现参考postgresql文档。
--语法查询
\h create table
--创建表
create table test001(id int,name varchar(128)); --id 为分布键
create table test002(id int,name varchar(128)) distributed by (id); --同上
testDB=# create table test001(id int,name varchar(128));
NOTICE: Table doesn‘t have ‘DISTRIBUTED BY‘ clause -- Using column named ‘id‘ as the Greenplum Database data distribution key for this table.
HINT: The ‘DISTRIBUTED BY‘ clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
testDB=# create table test002(id int,name varchar(128)) distributed by (id);
CREATE TABLE
testDB=#
create table test003(id int,name varchar(128)) distributed by (id,name) --多个分布键
testDB=# create table test003(id int,name varchar(128)) distributed by (id,name);
CREATE TABLE
create table test004(id int,name varchar(128)) distributed randomly; --随机分布键
testDB=# create table test004(id int,name varchar(128)) distributed randomly;
CREATE TABLE
create table test005(id int primary key ,name varchar(128));
create table test006(id int unique ,name varchar(128));
testDB=# create table test005(id int primary key ,name varchar(128));
CREATE TABLE
testDB=# create table test006(id int unique ,name varchar(128));
CREATE TABLE
testDB=#
采用随机分布策略的表默认将主键,或者唯一键作为分布键,因为每个Segment都是一个单一的数据库,单个数据库可以确保唯一性,多个数据库节点就无法保证全局的跨库唯一性,故只能按照唯一键分布,同一个值的数据都在一个节点上,以此来保证唯一性。
--如果指定的分布键与主键盘不一样,那么分布键会被更改为主键。在greenplum6 中这句话貌似显示不兼容,如下:
create table test007(id int unique,name varchar(128)) distributed by (id,name);
testDB=# create table test007(id int unique,name varchar(128)) distributed by (id,name);
ERROR: UNIQUE constraint and DISTRIBUTED BY definitions are incompatible
HINT: When there is both a UNIQUE constraint and a DISTRIBUTED BY clause, the DISTRIBUTED BY clause must be a subset of the UNIQUE constraint.
testDB=#
按照它的要求进行修改,我们distribute by 修改成id
testDB=# create table test007(id int unique,name varchar(128)) distributed by (id);
CREATE TABLE
testDB=#
---创建一模一样的 表
create table test_like (like test001);
--使用like 创建表的时候,只是表结构会跟原表一摸一样,表的特殊属性并不会一样,例如压缩,只增(appendonly)等属性,如果不指定分布键,默认分布键与原表一致。
testDB=# create table test_like (like test001);
NOTICE: table doesn‘t have ‘DISTRIBUTED BY‘ clause, defaulting to distribution columns from LIKE table
CREATE TABLE
testDB=#
4. 插入语句
插入语句
执行insert语句注意分布键不要为空,否则分布键默认变成null‘,数据都被保存到一个节点上会导致分布不均
insert into test001 values (100,‘tom‘),(101,‘lily‘),(102,‘jack‘),(103,‘linda‘);
insert into test002 values (200,‘tom‘),(101,‘lily‘),(202,‘jack‘),(103,‘linda‘);
5. 更新语句
不能批量对分布键执行update,因为分布键执行update需要将数据重分布.
testDB=# update test002 set id=203 where id=202;
UPDATE 1
testDB=#
6. 删除语句delete--truncate
在Greenplum 3.x 的版本中,如果delete 操作涉及子查询,子查询的结果还涉及到数据重分布,这样的删除语句会报错,Greenplum 4.x以上,支持该操作。
testDB=# delete from test001 where name in (select name from test002);
DELETE 4
testDB=#
如果对整张表执行delete会比较慢,建议使用truncate.
truncate执行truncate直接删除表的物理文件,然后创建新的数据文件。如果有sql正在操作这张表,那么truncate会被锁住,直到表上面的所有锁会被释放。
7. 查询语句
testDB=# select * from test001 x,test002 y where x.id=y.id;
id | name | id | name
-----+-------+-----+-------
103 | linda | 103 | linda
101 | lily | 101 | lily
(2 rows)
8. 执行计划
testDB=# explain select * from test001 x,test002 y where x.id=y.id;
QUERY PLAN
Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..862.00 rows=5 width=18)
-> Hash Join (cost=0.00..862.00 rows=1 width=18)
Hash Cond: (test001.id = test002.id)
-> Seq Scan on test001 (cost=0.00..431.00 rows=1 width=9)
-> Hash (cost=431.00..431.00 rows=1 width=9)
-> Seq Scan on test002 (cost=0.00..431.00 rows=1 width=9)
Optimizer: Pivotal Optimizer (GPORCA)
(7 rows)