-
ClickHouse
-
Overview
ClickHouse is a database management system, not a single database. ClickHouse allows creating tables and databases at runtime, loading data, and running queries without reconfiguring and restarting the server.
ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available.
ClickHouse supports a declarative query language based on SQL that is identical to the ANSI SQL standard in many cases.
(To understand: declarative query language based on SQL vs. ANSI SQL)
-
Understanding the Concepts
-
The data access scenario
The data access scenario refers to:
- what queries are made, how often, and in what proportion;
- how much data is read for each type of query - rows, columns, and bytes;
- the relationship between reading and updating data;
- the working size of the data and how locally it is used;
- whether transactions are used;
- how isolated they are;
- requirements for data replication and logical integrity;
- requirements for latency and throughput for each type of query.
ClickHouse is designed for online analytical processing of queries (OLAP).
-
Column-Oriented Database Management System
Data in columns is also easier to compress.
In a true column-oriented DBMS, no extra data is stored with the values. Among other things, this means that constant-length values must be supported, to avoid storing their length “number” next to the values.
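The point about constant-length values can be illustrated with a small sketch. This is not how ClickHouse stores data internally; the column contents and the helper `read_fixed` are hypothetical and only show why fixed-width values need no per-value length, while variable-length values do.

```python
import struct

# Hypothetical column of 64-bit integers (for example, event counters).
values = [7, 42, 1000, -5]

# Fixed-length layout: every value takes exactly 8 bytes and nothing else is stored.
fixed = struct.pack(f"<{len(values)}q", *values)

def read_fixed(buf: bytes, index: int) -> int:
    """Read the value at `index` by jumping straight to its byte offset."""
    return struct.unpack_from("<q", buf, index * 8)[0]

# Variable-length layout for contrast: each value carries a 4-byte length prefix.
words = ["a", "click", "pageview"]
variable = b"".join(
    struct.pack("<I", len(w.encode())) + w.encode() for w in words
)

print(len(fixed))            # 4 values * 8 bytes, no per-value overhead
print(read_fixed(fixed, 2))  # third value, found by offset arithmetic alone
print(len(variable))         # payload bytes plus 4 bytes of length per value
```

With the fixed layout, the position of any value follows from its index, so the stored bytes are the values and nothing more.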
-
Data Compression
ClickHouse provides specialized codecs for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones.
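As an illustration of why a data-specific codec helps, here is a toy delta encoding in Python. ClickHouse lists codecs such as Delta and DoubleDelta among its specialized codecs; the functions below are a simplified sketch of the general idea, not ClickHouse's implementation, and the timestamp values are made up.

```python
from itertools import accumulate

def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    return list(accumulate(deltas))

# Hypothetical timestamp column sampled every 10 seconds.
timestamps = [1700000000, 1700000010, 1700000020, 1700000030]
encoded = delta_encode(timestamps)

print(encoded)  # large first value followed by small, repetitive differences
assert delta_decode(encoded) == timestamps
```

Regularly spaced values turn into a run of small, identical numbers, which a general-purpose compressor applied afterwards can shrink far more easily than the raw timestamps.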
-
Throughput For a Single Large Query
Throughput can be measured in rows per second or megabytes per second.
-
If the data is placed in the page cache, a query that is not too complex is processed on modern hardware at a speed of approximately 2-10 GB/s of uncompressed data on a single server.
-
If data is not placed in the page cache, the speed depends on the disk subsystem and the data compression rate. For example, if the disk subsystem allows reading data at 400 MB/s, and the data compression rate is 3, the speed is expected to be around 1.2 GB/s. To get the speed in rows per second, divide the speed in bytes per second by the total size of columns used in the query. For example, if 10 bytes of columns are extracted, the speed is expected to be around 100-200 million rows per second.
$$\frac{1.2 \times 1024 \times 1024 \times 1024\ \text{bytes/s}}{10\ \text{bytes/row}} = 128{,}849{,}018.88\ \text{rows/s} \approx 128.8\ \text{million rows/s}$$
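The arithmetic for this estimate can be checked with a few lines of Python. The function names are hypothetical, and the sketch assumes binary units (1 GB = 1024^3 bytes); with decimal units the result would be 120 million rows per second, which is still within the 100-200 million range.

```python
def effective_read_speed(disk_speed: float, compression_ratio: float) -> float:
    """Uncompressed data rate: compressed bytes read per second times the compression ratio."""
    return disk_speed * compression_ratio

def rows_per_second(uncompressed_bytes_per_s: float, bytes_per_row: float) -> float:
    """Rows processed per second when each row contributes `bytes_per_row` bytes."""
    return uncompressed_bytes_per_s / bytes_per_row

GB = 1024 ** 3  # assumption: binary gigabyte

# 400 MB/s from disk at compression ratio 3 gives 1200 MB/s, about 1.2 GB/s.
print(effective_read_speed(400, 3))

# 1.2 GB/s of uncompressed data, 10 bytes of columns per row.
print(round(rows_per_second(1.2 * GB, 10)))
```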
-
-
Aggregated and Non-aggregated Data
There is a widespread opinion that to calculate statistics effectively, you must aggregate data since this reduces the volume of data.
-
History
As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily.
A single query may require scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds.
-
Typical Customers
Finance: Citadel Securities, Bloomberg, ICA
Customers in China: Bytedance (字节跳动), CraiditX (氪信), Dataliance for China Telecom (中国电信), HUYA (虎牙直播), Kuaishou (快手), Jinshuju (金数据), OneAPM, Percent (百分点), QINGCLOUD, Sina (新浪), Suning (苏宁), Tencent (腾讯), Xiaoxin Tech (小黑板), Ximalaya (喜马拉雅)
-
Installation
Install via Docker:
# server
docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
# client
docker run -it --rm --link some-clickhouse-server:clickhouse-server yandex/clickhouse-client --host clickhouse-server
-
Tutorial