0. Oracle测试数据准备:
[oracle@db1 ~]$ sqlplus system/000000
SQL*Plus: Release 11.2.0.3.0 Production on Tue Mar 25 10:26:06 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> drop table test ;
Table dropped.
SQL> create table test(id int,name varchar2(20),age int,msg varchar2(20));
Table created.
SQL> insert into test values(1,‘aaaaa‘,1,‘aaaaa‘);
1 row created.
SQL> insert into test values(2,‘bbbbb‘,2,‘bbbbb‘);
1 row created.
SQL> insert into test values(3,‘ccccc‘,3,‘ccccc‘);
1 row created.
SQL> insert into test values(4,‘ddddd‘,4,‘ddddd‘);
1 row created.
SQL> insert into test values(5,‘eeeee‘,5,‘eeeee‘);
1 row created.
SQL>
SQL> commit;
Commit complete.
SQL> select count(*) from test;
COUNT(*)
----------
5
SQL> INSERT INTO TEST SELECT * FROM TEST;
5 rows created.
SQL> commit;
Commit complete.
SQL> select count(*) from test;
COUNT(*)
----------
100000
SQL> select segment_name,bytes/1024/1024 from dba_segments where segment_name=‘TEST‘;
SEGMENT_NAME
--------------------------------------------------------------------------------
BYTES/1024/1024
---------------
TEST
3
模拟插入10W的数据。
1. 使用落地文件外部表加载测试
首先在oracle服务器端建数据导出,并开启gpfdist服务进程:
[oracle@db1 ~]$ gtlions.ora2text.bin user=system/000000 query=‘select * from test‘ text=csv file=test.sql fast=true
0 rows exported at 2014-03-25 10:30:19, size 0 MB.
100000 rows exported at 2014-03-25 10:30:20, size 2 MB.
output file test.sql closed at 100000 rows, size 2 MB.
[oracle@db1 ~]$ nohup gpfdist -d . -p 9999 &
[1] 15147
[oracle@db1 ~]$ nohup: 忽略输入并把输出追加到"nohup.out"
[1]+ Exit 1 nohup gpfdist -d . -p 9999
[oracle@db1 ~]$ ps -ef | grep gpfdist
oracle 15149 15068 0 10:30 pts/8 00:00:00 grep gpfdist
oracle 62994 62778 0 Mar24 ? 00:00:04 gpfdist -d . -p 9999
导出阶段耗时1S;
接下来在gp创建相关外部表并加载数据入库:
[gpadmin@bdb ~]$ psql postgres
Timing is on.
psql (8.2.15)
Type "help" for help.
postgres=# \timing on
Timing is on.
postgres=# drop table if exists gt_test;
create table gt_test(id int,name character varying(20),age int,msg character varying(20)) distributed randomly;
DROP TABLE
Time: 17.802 ms
postgres=# create table gt_test(id int,name character varying(20),age int,msg character varying(20)) distributed randomly;
drop external table if exists gt_test_ext;
CREATE TABLE
Time: 17.386 ms
postgres=# drop external table if exists gt_test_ext;
create external table gt_test_ext(like gt_test) location (‘gpfdist://192.168.1.2:9999/test.sql‘) format ‘csv‘ (header);
DROP EXTERNAL TABLE
Time: 6.734 ms
postgres=# create external table gt_test_ext(like gt_test) location (‘gpfdist://192.168.1.2:9999/test.sql‘) format ‘csv‘ (header);
NOTICE: HEADER means that each one of the data files has a header row.
CREATE EXTERNAL TABLE
Time: 13.562 ms
postgres=# insert into gt_test select * from gt_test_ext;
select count(*) from gt_test;
INSERT 0 100000
Time: 469.955 ms
postgres=# select count(*) from gt_test;
count
--------
100000
(1 row)
Time: 9.455 ms
insert阶段耗时0.174S;
一共耗时1S+0.469S=1.469S;
2. 不落地的外部表加载
在gp创建相关外部表并加载数据入库:
postgres=# \timing on
Timing is on.
postgres=# drop table if exists gt_test;
create table gt_test(id int,name character varying(20),age int,msg character varying(20)) distributed randomly;
DROP TABLE
Time: 17.969 ms
postgres=# create table gt_test(id int,name character varying(20),age int,msg character varying(20)) distributed randomly;
drop external table if exists gt_test_webext;
CREATE TABLE
Time: 16.960 ms
postgres=# drop external table if exists gt_test_webext;
DROP EXTERNAL TABLE
Time: 6.944 ms
postgres=# create external web table gt_test_webext(like gt_test) execute ‘sh /home/gtlions/oracle.sh‘ on master format ‘text‘ (delimiter ‘,‘);
CREATE EXTERNAL TABLE
Time: 9.508 ms
postgres=# insert into gt_test select * from gt_test_webext;
select count(*) from gt_test;
INSERT 0 100000
Time: 4253.003 ms
postgres=# select count(*) from gt_test;
count
--------
100000
(1 row)
Time: 9.867 ms
insert阶段耗时4.523S;
3. 加大数据量进行测试
50W 100W 200W 400W 800W 1600W 2000W
经过逐步测试,在50W记录以内,两者的差别不大,在2~3倍左右;
但是一旦记录增多,两者的差距逐步放大:
50W记录差距7倍;
100W记录差距12.9倍;
200W记录差距12.9倍;
500W记录差距13.5倍;
1000W记录差距15.2倍;
。。。。。。
不过在从100W到1亿过程中,时间差距并不是线性增长的。
3. 总结
对于小表来说,使用不落地的方式尚可接受;而对于超过25MB的大小的表速度上差距太大了。
-EOF-