ClickHouse Study Notes, Part 1

  • ClickHouse

    Tutorial Documents

    A First Look at Yandex ClickHouse

  • Overview

    ClickHouse is a database management system, not a single database. It allows creating tables and databases at runtime, loading data, and running queries without reconfiguring or restarting the server.

    ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSDs and additional RAM are also fully used if available.

    ClickHouse supports a declarative query language based on SQL that is identical to the ANSI SQL standard in many cases.

    (To understand: a declarative query language based on SQL vs. ANSI SQL)

  • Understanding the Concepts

  • The data access scenario

    The data access scenario refers to:

    1. what queries are made, how often, and in what proportion;
    2. how much data is read for each type of query - rows, columns, and bytes;
    3. the relationship between reading and updating data;
    4. the working size of the data and how locally it is used;
    5. whether transactions are used;
    6. how isolated they are;
    7. requirements for data replication and logical integrity;
    8. requirements for latency and throughput for each type of query.

    ClickHouse is designed for online analytical processing (OLAP) of queries.

  • Column-Oriented Database Management System

    Data in columns is also easier to compress.

    In a true column-oriented DBMS, no extra data is stored with the values. Among other things, this means that constant-length values must be supported, to avoid storing their length “number” next to the values.
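
To make the point about constant-length values concrete, here is a minimal Python sketch. It is not how ClickHouse stores data internally; the three-column table, the `struct` formats, and the helper names are all assumptions chosen for illustration. It packs the same rows once row by row and once column by column, with no length prefix next to any value, so that the i-th value of a column can be located by arithmetic alone.

```python
import struct

# Hypothetical table: (user_id: 4-byte unsigned, price: 8-byte float, clicks: 2-byte unsigned)
rows = [(1, 9.5, 3), (2, 12.0, 7), (3, 7.25, 1)]

ROW_FORMAT = "<IdH"  # little-endian, no padding: 4 + 8 + 2 = 14 bytes per row


def to_row_store(rows):
    """Pack every row contiguously, as a row-oriented layout would."""
    return b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)


def to_column_store(rows):
    """Pack each column into its own buffer of fixed-width values."""
    user_ids, prices, clicks = zip(*rows)
    n = len(rows)
    return {
        "user_id": struct.pack(f"<{n}I", *user_ids),
        "price": struct.pack(f"<{n}d", *prices),
        "clicks": struct.pack(f"<{n}H", *clicks),
    }


def read_value(column_bytes, fmt_char, index):
    """Fetch the index-th value; the offset is index * width, no stored lengths needed."""
    width = struct.calcsize("<" + fmt_char)
    return struct.unpack_from("<" + fmt_char, column_bytes, index * width)[0]


row_store = to_row_store(rows)
columns = to_column_store(rows)

# Summing clicks touches only the clicks buffer, not every byte of every row.
total_clicks = sum(read_value(columns["clicks"], "H", i) for i in range(len(rows)))
```

In this toy layout, a query over `clicks` reads `len(columns["clicks"])` bytes, while the row-oriented buffer would have to be scanned in full to reach the same values.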

  • Data Compression

    ClickHouse provides specialized codecs for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones.
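
As a rough illustration of why a codec tailored to the data can help, the sketch below applies delta encoding to a regularly spaced timestamp column before general-purpose compression. ClickHouse's documentation lists codecs such as Delta and DoubleDelta, but this is not their implementation: `zlib` stands in for the general-purpose compressor, and the sample timestamps and function names are assumptions made up for the example.

```python
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


def compressed_size(values):
    """Pack as signed 64-bit integers and return the zlib-compressed size in bytes."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw))


# Hypothetical time series: one sample per minute.
timestamps = [1_600_000_000 + 60 * i for i in range(10_000)]
deltas = delta_encode(timestamps)

plain_size = compressed_size(timestamps)
delta_size = compressed_size(deltas)
```

After delta encoding, almost every stored value is the same small number (60 here), so the compressor sees a highly repetitive stream; `delta_size` comes out smaller than `plain_size` for this input.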

  • Throughput For a Single Large Query

    Throughput can be measured in rows per second or megabytes per second.

    • If the data is placed in the page cache, a query that is not too complex is processed on modern hardware at a speed of approximately 2-10 GB/s of uncompressed data on a single server.

    • If the data is not placed in the page cache, the speed depends on the disk subsystem and the data compression rate. For example, if the disk subsystem allows reading data at 400 MB/s and the data compression rate is 3, the speed is expected to be around 1.2 GB/s. To get the speed in rows per second, divide the speed in bytes per second by the total size of the columns used in the query. For example, if 10 bytes of columns are extracted, the speed is expected to be around 100-200 million rows per second.
      $$\frac{1.2 \times 1024 \times 1024 \times 1024\ \text{Bytes}}{10\ \text{Bytes}} = 128{,}849{,}018.88 \approx 128\ \text{Million}$$
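
The same estimate can be written as a small Python sketch. The function and constant names are assumptions for illustration; the figures (400 MB/s disk, compression rate 3, 10 bytes per row, 1.2 GB/s) are the ones from the note above. Whether "GB" is read as 1024³ or 10⁹ bytes shifts the result slightly, and both readings fall inside the quoted 100-200 million rows per second range.

```python
GIB = 1024 ** 3  # bytes in a binary gigabyte, as used in the calculation above


def uncompressed_throughput(disk_bytes_per_s, compression_rate):
    """Bytes of uncompressed data delivered per second when reading compressed data from disk."""
    return disk_bytes_per_s * compression_rate


def rows_per_second(uncompressed_bytes_per_s, bytes_per_row):
    """Divide byte throughput by the total width of the columns the query reads."""
    return uncompressed_bytes_per_s / bytes_per_row


# Reading 1.2 GB/s as 1.2 * 1024^3 bytes, with 10 bytes of columns per row.
binary_estimate = rows_per_second(1.2 * GIB, 10)

# Same scenario with decimal units: 400 * 10^6 bytes/s from disk, compression rate 3.
decimal_estimate = rows_per_second(uncompressed_throughput(400e6, 3), 10)
```

Here `binary_estimate` is roughly 128.8 million rows per second and `decimal_estimate` is 120 million, so the choice of unit convention does not change the order of magnitude.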

  • Aggregated and Non-aggregated Data

    There is a widespread opinion that to calculate statistics effectively, you must aggregate data since this reduces the volume of data.

  • History

    As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily.

    A single query may require scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds.

  • Typical Customers

    Finance: Citadel Securities, Bloomberg, ICA

    Customers in China: Bytedance (字节跳动), CraiditX (氪信), Dataliance for China Telecom (中国电信), HUYA (虎牙直播), Kuaishou (快手), Jinshuju (金数据), OneAPM, Percent (百分点), QINGCLOUD, Sina (新浪), Suning (苏宁), Tencent (腾讯), Xiaoxin Tech (小黑板), Ximalaya (喜马拉雅)

  • Installation

    To understand: SSE 4.2

    Installing via Docker:

    # server
    docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
    # client
    docker run -it --rm --link some-clickhouse-server:clickhouse-server yandex/clickhouse-client --host clickhouse-server
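
Once a server is running, one way to check it from a script is ClickHouse's HTTP interface, which listens on port 8123 by default. The sketch below assumes that port is reachable from the host (for example because the server container was started with an extra `-p 8123:8123`, which the command above does not include); the host name, helper names, and query are likewise assumptions.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: default HTTP port, published to localhost.
CLICKHOUSE_HTTP = "http://localhost:8123/"


def build_query_url(sql, base_url=CLICKHOUSE_HTTP):
    """Encode the SQL text into the `query` URL parameter."""
    params = urllib.parse.urlencode({"query": sql}, quote_via=urllib.parse.quote)
    return f"{base_url}?{params}"


def run_query(sql, base_url=CLICKHOUSE_HTTP, timeout=5):
    """Send a read-only query with HTTP GET and return the response body as text."""
    with urllib.request.urlopen(build_query_url(sql, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage, only meaningful when a server is actually listening:
#   print(run_query("SELECT version()"))
```

Keeping URL construction separate from the network call makes the encoding easy to inspect without a running server.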
    
  • Tutorial
