1. Introduction
pgloader is a data migration tool for PostgreSQL. It was originally built only for fast imports of formatted files into PostgreSQL. pgloader can skip and log rows that fail to load, so a migration can still run to completion when the source file contains some bad records, saving the time otherwise lost to troubleshooting and re-importing. Recent versions can also use SQLite, MySQL, and MS SQL Server databases directly as data sources, loading their data straight into PostgreSQL, converting data types automatically, and even substituting automatically for database-specific features. Unfortunately, pgloader does not support incremental replication from source to target, which means it cannot be used for a production cutover. Still, pgloader makes database migration and conversion very convenient for PostgreSQL developers; for new projects that adopt PostgreSQL for its features and openness, it remains an excellent migration tool.
2. How It Works
pgloader is a data loading tool for PostgreSQL that relies on the COPY command for fast bulk loads. pgloader v1 was written in Tcl, v2 in Python, and v3 in Common Lisp.
3. Installation
3.1 Install Dependencies
# RHEL/CentOS package names (on Debian/Ubuntu use libsqlite3-dev, freetds-dev, libzip-dev instead)
yum install unzip sqlite-devel make curl gawk freetds freetds-devel libzip-devel
# sbcl version >= 1.2.5 is required; see http://www.sbcl.org/getting.html
wget https://sourceforge.net/projects/sbcl/files/sbcl/1.5.2/sbcl-1.5.2-x86-64-linux-binary.tar.bz2
bzip2 -cd sbcl-1.5.2-x86-64-linux-binary.tar.bz2 | tar xvf -
cd sbcl-1.5.2-x86-64-linux
./install.sh
ln -s /usr/local/bin/sbcl /usr/bin/sbcl
3.2 Install pgloader
cd /path/to/pgloader
make pgloader
./build/bin/pgloader --help
4. Command Syntax
LOAD <source-type>
FROM <source-url>
[ HAVING FIELDS <source-level-options> ]
INTO <postgresql-url>
[ TARGET TABLE [ "<schema>" ]."<table name>" ]
[ TARGET COLUMNS <columns-and-options> ]
[ WITH <load-options> ]
[ SET <postgresql-settings> ]
[ BEFORE LOAD [ DO <sql statements> | EXECUTE <sql file> ] ... ]
[ AFTER LOAD [ DO <sql statements> | EXECUTE <sql file> ] ... ]
;
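Only the LOAD, FROM, and INTO clauses are mandatory; every bracketed clause may be dropped. Filled in, the skeleton reduces to a minimal load such as the following, where the file path, table name, and connection string are placeholders:

```
LOAD CSV
     FROM '/tmp/data.csv' (id, name)
     INTO postgresql://user@localhost:5432/testdb
     TARGET TABLE public.demo
     TARGET COLUMNS (id, name)
     WITH skip header = 1
;
```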
5. Examples
-- https://pgloader.readthedocs.io/en/latest/tutorial/tutorial.html
5.1 Loading a CSV File with pgloader
The CSV file:
Header, with a © sign
"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"
"3.0.0.0","4.17.135.31","50331648","68257567","US","United States"
"4.17.135.32","4.17.135.63","68257568","68257599","CA","Canada"
"4.17.135.64","4.17.142.255","68257600","68259583","US","United States"
"4.17.143.0","4.17.143.15","68259584","68259599","CA","Canada"
"4.17.143.16","4.18.32.71","68259600","68296775","US","United States"
Sample control file:
LOAD CSV
FROM '/home/postgres/file.csv' (x, y, a, b, c, d)
INTO postgresql://lottu@ip:5432/lottu?csv (a, b, d, c)
WITH truncate,
skip header = 1,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by ','
SET client_encoding to 'utf8',
work_mem to '12MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists csv; $$,
$$ create table csv (
a bigint,
b bigint,
c char(2),
d text
);
$$;
Loading a CSV file with the COPY command alone:
lottu=> \copy csv2 from '/home/postgres/file2.csv' with (format csv);
Advantages of pgloader over plain COPY:
- It can skip a given number of rows (e.g. header lines).
- It can choose which columns to load, and in what order.
- It can run SQL before the load (create the table, drop indexes) and after it (recreate indexes); in other words, it is extensible.
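As a sketch of that extensibility, a load can drop an index before the COPY and recreate it afterwards; the index name and column below are made up for illustration:

```
LOAD CSV
     FROM '/home/postgres/file.csv' (a, b, c, d)
     INTO postgresql://lottu@ip:5432/lottu?csv (a, b, d, c)
     WITH truncate,
          skip header = 1,
          fields terminated by ','
 BEFORE LOAD DO
     $$ drop index if exists csv_a_idx; $$
  AFTER LOAD DO
     $$ create index csv_a_idx on csv (a); $$;
```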
5.2 Loading a MySQL Database
The MySQL connection string syntax:
mysql://[user[:password]@][netloc][:port][/dbname]
Sample control file:
load database
from mysql://system:li0924@ip:3306/lottu
into postgresql://lottu@ip:5432/lottu
WITH include drop, create tables, no truncate,
     create indexes, reset sequences, foreign keys
SET maintenance_work_mem to '128MB', work_mem to '12MB', search_path to 'lottu'
BEFORE LOAD DO
$$ create schema if not exists lottu; $$;
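The MySQL source also accepts clauses to restrict or rename what gets migrated. The clause names below come from the pgloader reference; the table pattern and schema names are hypothetical:

```
load database
     from mysql://system:li0924@ip:3306/lottu
     into postgresql://lottu@ip:5432/lottu
WITH include drop, create tables, create indexes, reset sequences
including only table names matching ~/^t_/, 'orders'
ALTER SCHEMA 'lottu' RENAME TO 'public';
```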
5.3 Loading Fixed-Width Data
The file to load:
01234567892008052011431250firstline
01234562008052115182300left blank-padded
12345678902008052208231560another line
2345609872014092914371500
2345678902014092914371520
The pgloader control file:
LOAD FIXED
FROM '/home/postgres/fixed.file'
(
a from 0 for 10,
b from 10 for 8,
c from 18 for 8,
d from 26 for 17 [null if blanks, trim right whitespace]
)
INTO postgresql://lottu@123.59.16.166:5432/lottu
TARGET TABLE fixed
(
a, b,
c time using (time-with-no-separator c),
d
)
WITH truncate
SET work_mem to '14MB',
    standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists fixed; $$,
$$ create table fixed (
a integer,
b date,
c time,
d text
);
$$;
5.4 Loading from a Compressed Archive
LOAD ARCHIVE
FROM /Users/dim/Downloads/GeoLiteCity-latest.zip
INTO postgresql:///ip4r
BEFORE LOAD
DO $$ create extension if not exists ip4r; $$,
   $$ create schema if not exists geolite; $$,
EXECUTE 'geolite.sql'
LOAD CSV
FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
WITH ENCODING iso-8859-1
(
locId,
country,
region null if blanks,
city null if blanks,
postalCode null if blanks,
latitude,
longitude,
metroCode null if blanks,
areaCode null if blanks
)
INTO postgresql:///ip4r?geolite.location
(
locid,country,region,city,postalCode,
location point using (format nil "(~a,~a)" longitude latitude),
metroCode,areaCode
)
WITH skip header = 2,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by ','
AND LOAD CSV
FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
WITH ENCODING iso-8859-1
(
startIpNum, endIpNum, locId
)
INTO postgresql:///ip4r?geolite.blocks
(
iprange ip4r using (ip-range startIpNum endIpNum),
locId
)
WITH skip header = 2,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by ','
FINALLY DO
$$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;
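Finally, as noted in the introduction, SQLite also works as a source. A minimal control file looks like the following, where the database file path and connection string are placeholders:

```
load database
     from sqlite:///path/to/source.db
     into postgresql://lottu@ip:5432/lottu
WITH include drop, create tables, create indexes, reset sequences;
```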