系统环境:centos6.5
oracle版本:oracle11gr2
由于客户导出的格式是csv格式的,并且数据量比较大,两千多万,使用plsql不太合适。考虑使用oracle客户端工具sqlldr。
如果提供的有表结构最好,直接导入创建表就行了。如果没有提供,可以先使用wps打开csv文件,根据内容自己去创建表结构。
使用sqlldr有几种导入方式,这里使用direct直接路径加并行的模式。
- 先在数据库创建好空表。
- 创建ctl文件
options(skip=1,columnarrayrows=20971520,ROWS=10000,READSIZE=20971520,ERRORS=999999999) load data infile '/ora11/tmp/zz.csv' append into table "*****_RECORD" fields terminated by ',' Optionally enclosed by '"' (id,record_organ_code,record_organ_name)
1行是导入参数配置,
已经是支持大量数据导入的参数方案。
第3行infile指定导入的文件是USER_INFO.csv;
第4行into table前面的insert表示导入方式:
insert :默认方式,在导入记录前要求表为空;
append :在表中追加新导入的记录;
replace :删除旧记录(等价delete from table语句),替换成新导入的记录;
truncate:删除旧记录(等价truncate table语句),替换成新导入的记录;
into table后面指定导入数据库表USER_INFO,
且表名必须大写;
第5行指定每一行的字段是以逗号(,)分隔;
第6行指定字段是用两个分号(')包围起来的,可选的;
最后一行对应导入的字段,
注意如果导入的是时间字段,
需要指明时间转换的格式(在这里有坑,要注意格式。本人在实施的过程中已经被坑过了。格式不对会报
SQL*Loader-951: Error calling once/load initialization ORA-02373: Error parsing insert statement for table ESNS."JWXZZ_COVID_RECORD". ORA-00936: missing expression
)。
3、使用sqlldr导入
sqlldr userid=esns/***** control=/ora11/tmp/loadfile_csv.ctl direct=true parallel=true
4、导入的相关日志
SQL*Loader: Release 11.2.0.1.0 - Production on Sun Feb 27 23:39:42 2022 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. Control File: /ora11/tmp/loadfile_csv.ctl Data File: /ora11/tmp/zz.csv Bad File: /ora11/tmp/zz.bad Discard File: none specified (Allow all discards) Number to load: ALL Number to skip: 1 Errors allowed: 999999999 Continuation: none specified Path used: Direct - with parallel option. Table "JWXZZ_COVID_RECORD", loaded from every logical record. Insert option in effect for this table: APPEND Column Name Position Len Term Encl Datatype ------------------------------ ---------- ----- ---- ---- --------------------- ID FIRST * , O(") CHARACTER RECORD_ORGAN_CODE NEXT * , O(") CHARACTER RECORD_ORGAN_NAME NEXT * , O(") CHARACTER CHILD_CODE NEXT * , O(") CHARACTER CHILD_NAME NEXT * , O(") CHARACTER GENDAR NEXT * , O(") CHARACTER SQL*Loader-281: Warning: ROWS parameter ignored in parallel mode. Table "********_RECORD": 25730788 Rows successfully loaded. 0 Rows not loaded due to data errors. 0 Rows not loaded because all WHEN clauses were failed. 0 Rows not loaded because all fields were null. Bind array size not used in direct path. Column array rows : 10000 Stream buffer bytes: 256000 Read buffer bytes:20971520 Total logical records skipped: 1 Total logical records read: 25730788 Total logical records rejected: 0 Total logical records discarded: 0 Total stream buffers loaded by SQL*Loader main thread: 2786 Total stream buffers loaded by SQL*Loader load thread: 24336 Run began on Sun Feb 27 23:39:42 2022 Run ended on Sun Feb 27 23:43:52 2022 Elapsed time was: 00:04:09.46 CPU time was: 00:02:55.86
参考文档&感谢:https://blog.csdn.net/bugzeroman/article/details/103031317