HybridDB PG中,如何输出单个文件到OSS

使用HybridDB PG的外部表输出数据到OSS时,一般会输出成多个文件(文件个数一般与节点数个数一致)。如何输出为一个文件呢?步骤如下:

1)创建示例表:

create table test(
  A1  char(30),
  A2  char(30),
  A3  varchar(255),
  A4  varchar(255)
)  with (appendonly=true, orientation=column, compresstype=zlib, checksum=false, fillfactor=100) distributed randomly;

2)插入数据:

insert into test values('xxxxxxx','yyyyy','zzzzz','wwwww');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;

3)创建外部表写表,按源数据表的结构创建外部表。注意:

  1. 相对于本地源表的表结构,需要添加一个临时字段,例如dummy_col,最好作为第一个字段。
  2. 添加output_generator_col=false
    file_name_generator_col=dummy_col这两个选项。file_name_generator_col这个参数指定a.中添加的字段。
create writable external table test_oss_write(
  dummy_col text,
  A1  text,
  A2  text,
  A3  text,
  A4  text
)
location('oss://oss-cn-shanghai-internal.aliyuncs.com
dir=oncall_test/
id=xx
key=xx
bucket=osshuadong2
oss_connect_timeout=60
oss_dns_cache_timeout=300
output_generator_col=false
file_name_generator_col=dummy_col
') FORMAT 'csv'  ( DELIMITER ',') 
distributed by (dummy_col)
;

4) 将test数据写入到oss中。这里为了避开我们的一些limitation,写成了如下形式,将A1-A4和表test换成用户的字段和表名即可。

set rds_write_oss_file_by_distribution_column=on;   


explain insert into test_oss_write select x, A1, A2, A3, A4 from (select row_number() over (partition by dummy_col||dummy_col) as RN, dummy_col||dummy_col as x, A1, A2, A3, A4 from (select y as dummy_col, A1, A2, A3, A4 from test, (values(1)) as q1(y)) as q2 ) as q3;
上一篇:Hadoop新手篇:hadoop入门基础教程


下一篇:大快DKhadoop开发环境安装常见问题及解决方案汇总