ClickHouse-005建库建表以及数据导入导出测试
## 一、创建数据库
**语法:**
```
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)]
CREATE DATABASE testdb; //创建数据库
DROP DATABASE testdb; //删除数据库
```
## 二、建表
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],
...
) ENGINE = engine
```
```
CREATE TABLE test_table(
province String,
province_name String,
create_date date
) ENGINE = MergeTree(create_date, (province), 8192);
```
**如果直接这样执行会报错,有两种方法解决:**
1: 在每一行后面加右斜杠,比如:
```
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn
CREATE TABLE test_table01( \
province String, \
province_name String, \
create_date date \
) ENGINE = MergeTree(create_date, (province), 8192);
```
2: 在登录的时候加 -m参数支持多行模式,比如:
```
clickhouse-client -m
[root@tidb06 ~]# clickhouse-client -udefault -m --password=j780UJy9D2tn
CREATE TABLE test_table(
province String,
province_name String,
create_date date
) ENGINE = MergeTree(create_date, (province), 8192);
```
**对建表sql的格式要求说明:**
ENGINE:是表的引擎类型,最常用的MergeTree。还有一个Log引擎也是比较常用。MergeTree要求有一个日期字段,还有主键。Log没有这个限制。
create_date:是表的日期字段,一个表必须要有一个日期字段。
province:是表的主键,主键可以有多个字段,每个字段用逗号分隔
8192:是索引粒度,用默认值8192即可。
## 三、导入数据
### 3.1:普通的CSV文件导入
```
cat > test_table.csv << EOF
WA,WA_NAME,2020-08-25
CA,CA_NAME,2020-09-25
OR,OR_NAME,2020-10-25
EOF
```
**–-导数:**
```
clickhouse-client --query "INSERT INTO testdb.test_table FORMAT CSV" < test_table.csv;
```
**--或者用管道的方式:**
```
cat test_table.csv | clickhouse-client --query “INSERT INTO testdb.test_table FORMAT CSV”
```
**--测试演示:**
```
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table FORMAT CSV" < test_table.csv;
[root@tidb06 ~]#
[root@tidb06 ~]# cat test_table.csv | clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table01 FORMAT CSV"
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table limit 2";
WA WA_NAME 2020-08-25
CA CA_NAME 2020-09-25
[root@tidb06 ~]#
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table01 limit 2";
WA WA_NAME 2020-08-25
CA CA_NAME 2020-09-25
```
### 3.2:特殊的CSV文件导入(包含回车换行,转义符等)
**说明:下载clickhouse官方提供的测试log引擎的表数据进行测试**
这是Yandex.Metrica 日志收集的表分为点击hits和访问表vsits,可以作为日志埋点的重要参考数据
**数据文件下载连接:**
```
curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
```
**关于这2个表的数据的恢复的官方介绍地址:**
https://clickhouse.tech/docs/zh/getting-started/example-datasets/metrica/
**--建表:**
```
CREATE TABLE testdb01.test_table02 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "CREATE TABLE testdb01.test_table02 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;"
```
**--导入测试数据:**
```
[root@tidb06 ~]# cat test_table3.csv
1,123.456,”abc 123”,” abc" "'123”,2020-08-26,2020-08-26 17:08:09
[root@tidb06 ~]# cat test_table3.csv| clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table02 FORMAT CSV"
[root@tidb06 ~]#
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table02 limit 2";
1 123.456 ”abc 123” ” abc" "\'123” 2020-08-26 2020-08-26 17:08:09
[root@tidb06 ~]#
```
**导入官方提供的测试数据:**
```
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
real 0m11.354s
user 0m17.004s
sys 0m0.890s
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
real 0m33.984s
user 0m54.570s
sys 0m2.363s
```
**数据导入完后,优化一下表**
```
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
real 0m20.816s
user 0m0.017s
sys 0m0.016s
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
real 0m18.757s
user 0m0.017s
sys 0m0.016s
```
**优化完之后,查看下表数据量**
```
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "SELECT COUNT(*) FROM tutorial.hits_v1"
8873898
real 0m0.189s
user 0m0.018s
sys 0m0.015s
[root@tidb06 data1]#
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "SELECT COUNT(*) FROM tutorial.visits_v1"
1676861
real 0m0.032s
user 0m0.017s
sys 0m0.014s
```
### 4.导出数据:
```
[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="select * from tutorial.visits_v1" > /data1/backup/tutorial.visits_v1.tsv
real 0m13.066s
user 0m11.303s
sys 0m1.652s
[root@tidb06 backup]# du -sh tutorial.visits_v1.tsv
2.5G tutorial.visits_v1.tsv
```
**4.1备份开启压缩:**
```
[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="select * from tutorial.hits_v1" |gzip >/data1/backup/tutorial.hits_v1.tsv.gz
real 3m49.556s
user 4m1.825s
sys 0m9.013s
```
[root@tidb06 backup]# du -sh tutorial.hits_v1.tsv.gz
1.3G tutorial.hits_v1.tsv.gz
**4.2CTAS表快照:**
```
[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query=" create table tutorial.hits_v2 as tutorial.hits_v1"
real 0m0.088s
user 0m0.017s
sys 0m0.016s
[root@tidb06 ~]#
[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="insert into table tutorial.hits_v2 select * from tutorial.hits_v1"
real 0m26.225s
user 0m0.024s
sys 0m0.025s
```
**4.3远程拷贝表数据到本地库:**
**授权tutorail 库给用户wujianwei,允许从172.16.0.246服务器来远程访问tidb06上clickhouse的tutorial库**
**具体权限配置文件如下:**
```
[root@tidb06 users.d]# cat /etc/clickhouse-server/users.d/wujianwei_rw.xml
c0952f7212b0161d07c6f45f00fdb73e17430f11 172.16.0.246 normal_2 default test008 tutorial 1
```
```
[root@tidb05 ~]# clickhouse-client --user=wujianwei -h 172.16.0.247 --password=j780UJy9D2tn --port=9000 --query "show databases;"
test008
tutorial
```
**创建同样的表结构表:**
```
[root@tidb05 ~]# clickhouse-client -udefault -m
tidb05 :) create database tutoria;
CREATE TABLE tutorial.hits_v1 ( `WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams` Nested( Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8 ) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192;
```
**远程copy数据:**
```
tidb05 :) insert into table tutorial.hits_v1 select * from remote ('172.16.0.247','tutorial.hits_v1','wujianwei','j780UJy9D2tn');
INSERT INTO tutorial.hits_v1 SELECT *
FROM remote('172.16.0.247', 'tutorial.hits_v1', 'wujianwei', 'j780UJy9D2tn')
Ok.
0 rows in set. Elapsed: 27.291 sec. Processed 8.87 million rows, 8.46 GB (325.15 thousand rows/s., 310.01 MB/s.)
tidb05 :) select count(*) from tutorial.hits_v1;
SELECT count(*)
FROM tutorial.hits_v1
┌─count()─┐
│ 8873898 │
└─────────┘
```
本次演示到此结束,欢迎一起交流和学习。