influxdb 简单实践

InfluxDB是一个当下比较流行的时序数据库,InfluxDB使用 Go 语言编写,无需外部依赖,安装配置非常方便,适合构建大型分布式系统的监控系统。

1 下载安装

wget https://dl.influxdata.com/influxdb/releases/influxdb-1.4.3_linux_amd64.tar.gz
tar xvfz influxdb-1.4.3_linux_amd64.tar.gz
mv influxdb-1.4.3_linux_amd64 ~/disk/influxdb14

启动守护进程
cd ~/disk/influxdb14/usr/bin
./influxd &
创建管理员用户
./influx
show users
create user fsj with password 'fsj' with all privileges

将配置文件中auth-enabled字段修改为true
重启服务
service influxdb restart
重新登录
./influx -username fsj -password fsj

2 配置

查看当前配置 influxd config

设置密码

Enable authentication by setting the auth-enabled option to true in the [http] section of the configuration file

$ cat run_influxd.sh 
app=/home/work/workspace/apps
log=/home/work/log/influxdb
influxd=$app/influxdb14/usr/bin/influxd
conf=$app/influxdb14/etc/influxdb/influxdb.conf
$influxd -config $conf 1>$log/stdout 2>$log/stderr & # -config指定配置文件

ref https://kiswo.com/article/1020

3 basic usage

参考quick start

influx -precision rfc3339 -username fsj -password fsj
CREATE DATABASE mydb
SHOW DATABASES
> show measurements;  
name: measurements
name
----
TableTest
> select * from TableTest limit 10  # 大小写敏感
name: TableTest
time                App       Area          Cid ContentId FloorId FloorName Imei                                     Page   count
----                ---       ----          --- --------- ------- --------- ----             -----             -----
1520921431000000000 TMall  list          0             271     居家        863276004580322         $Home$       1
1520921431000000000 JD banner                      77      首页    352042013052762         $Home$          1
...

4 聚合函数

COUNT()
Returns the number of non-null values in a single field.

5 连续查询

查看CQ
SHOW CONTINUOUS QUERIES


name: TableTest_stat
name      query
----      -----
pv        CREATE CONTINUOUS QUERY pv ON TableTest_stat BEGIN SELECT sum(count) INTO TableTest_stat.autogen.TableTest_pv FROM TableTest_stat.autogen.TableTest GROUP BY time(30m) END
pv_day    CREATE CONTINUOUS QUERY pv_day ON TableTest_stat BEGIN SELECT sum(count) INTO TableTest_stat.autogen.TableTest_pv_day FROM TableTest_stat.autogen.TableTest GROUP BY time(1d) END
pv_hour   CREATE CONTINUOUS QUERY pv_hour ON TableTest_stat BEGIN SELECT sum(count) INTO TableTest_stat.autogen.TableTest_pv_hour FROM TableTest_stat.autogen.TableTest GROUP BY time(1h) END
pv_minute CREATE CONTINUOUS QUERY pv_minute ON TableTest_stat BEGIN SELECT sum(count) INTO TableTest_stat.autogen.TableTest_pv_minute FROM TableTest_stat.autogen.TableTest GROUP BY time(1m) END
uv_day    CREATE CONTINUOUS QUERY uv_day ON TableTest_stat BEGIN SELECT count(distinct(Imei)) INTO TableTest_stat.autogen.TableTest_uv_day FROM TableTest_stat.autogen.TableTest GROUP BY time(1d) END


创建CQ

CREATE CONTINUOUS QUERY pv_all_1h ON TableTest_stat BEGIN SELECT sum(count) INTO "h.pv.all.1h"   FROM TableTest  GROUP BY time(1m),App,Area,Cid,ContentId,FloorId,FloorName,Imei END
CREATE CONTINUOUS QUERY uv_all_1d ON HomeVisit_stat BEGIN SELECT count( distinct(Uid)) INTO "h.uv.all.1d"   FROM TableTest  GROUP BY time(1d),App,Area,Cid,ContentId,FloorId,FloorName,Imei END
创建之后INTO "h.pv.all.1h"会自动变成 INTO TableTest_stat.autogen."h.pv.all.1h"



删除CQ
drop continuous query pv_all_1m on TableTest_stat;

填充CQ创建之前的记录

聚合group by

all_1d CREATE CONTINUOUS QUERY all_1d ON TableTest_stat BEGIN SELECT count(distinct(Imei)) as CDImei, sum(count) as SCount  INTO TableTest_stat.autogen."h.all.1d" FROM TableTest_stat."10day".TableTest GROUP BY time(1d), App, Area END
all_1h CREATE CONTINUOUS QUERY all_1h ON TableTest_stat BEGIN SELECT count(distinct(Imei)) as CDImei, sum(count) as SCount  INTO TableTest_stat.autogen."h.all.1h" FROM TableTest_stat."10day".TableTest GROUP BY time(1d), App, Area END

6 数据保存策略

InfluxDB没有提供直接删除Points的方法,但是它提供了Retention Policies。主要用于指定数据的保留时间:当数据超过了指定的时间之后,就会被删除。

新建存储策略

>  CREATE RETENTION POLICY "10day" ON "TableTest_stat" DURATION 240h REPLICATION 1 DEFAULT
> SHOW RETENTION POLICIES
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false
10day   240h0m0s 24h0m0s            1        true

> ALTER RETENTION POLICY "10day" ON TableTest_stat SHARD DURATION 1w 
> SHOW RETENTION POLICIES
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false
10day   240h0m0s 168h0m0s           1        true

默认autogen的duration为0表示永久。
那shardGroupDuration对数据保存有什么影响呢?

注意,存储策略有点类似于分区数据块,修改了策略,新策略里不会有旧策略的数据。
要想查看旧策略下的数据,需要在 measurement 前加上策略名称。

> select * from "h.pv.all.1h" limit 10;
> select * from "autogen"."h.pv.all.1h" limit 10;
name: homevisit.pv.all.1h
....

influxdb 也支持通过 http方式写入, see also https://*.com/questions/37729008/can-i-create-different-retention-policy-for-different-measurements-in-influxdb

  1. http://www.oznetnerd.com/influxdb-retention-policies-shard-groups/
  2. https://www.linuxdaxue.com/retention-policies-in-influxdb.html

7 tag

> show series from TableTest limit 10;
key
---
TableTest,Action=Pay,App=JD,Area=list,Cid=0,FloorId=1,FloorName=办公,Page=Home,Rid=4
TableTest,Action=Pay,App=TM,Area=list,Cid=0,FloorId=2,FloorName=运动,Page=Home,Rid=1
...

> show tag keys from TableTest;
name: TableTest
tagKey
------
Action
App
Area
Cid
ContentId
FloorId
FloorName
Page
Rid
Sid

> show tag values from TableTest with key="App" limit 30;
name: HomeVisit
key value
--- -----
App JD
App TM
...


> show tag values from TableTest with key="Area" limit 30;
name: HomeVisit
key  value
---  -----
Area list
Area banner
...

8 Field

相当于实际记录的数据值,也是采用key=value形式,多个 tag 之间用 ',' 分隔。

> show field keys
name: TableTest
fieldKey fieldType
-------- ---------
Imei     string
count    integer

field列不能用在group by后面。

当列按照filed写入后,改成按tag写入,会使得改列即是filed又是tag。还是不能用在group by后面。

只能删表重建。

9 查询实战

select count(Imei) as PV,count(distinct(Imei)) as UV from TableTest where (time>=1521993600000000000 and time <1522080000000000000) and ((App ='JD')) and Action = 'View' group by App,time(86400s) tz('Asia/Shanghai')
 select count(distinct(Imei)) AS CDImei, sum(count) AS SCount from TableTest where App='JD' and time>'2018-03-29' tz('Asia/Shanghai')

where条件中,要通过tag筛选必须加单引号,不能双引号

> select time,App,Action,count from TableTest where Action="View" order by time desc  limit 10;

从命令行查询
$ influx -precision rfc3339 -username admin -password x -database HomeVisit_stat -execute "YOUR SQL" -format=csv

10 进阶

10.1 选型

Influxdb vs Prometheus
influxdb集成已有的概念,比如查询语法类似sql,引擎从LSM优化而来,学习成本相对低。

influxdb支持的类型有float,integers,strings,booleans,prometheus目前只支持float。

influxdb的时间精度是纳秒,prometheus的则是毫秒。

influxdb仅仅是个数据库,而prometheus提供的是整套监控解决方案,当然influxdb也提供了整套监控解决方案。

influxdb支持的math function比较少,prometheus相对来说更多,influxdb就目前使用上已经满足功能。

2015年prometheus还在开发阶段,相对来说influxdb更加稳定。

influxdb支持event log,prometheus不支持。

更详细的对比请参考:

https://db-engines.com/en/system/Graphite%3BInfluxDB%3BPrometheus

open source influxdb cluster http://mysql.taobao.org/monthly/2018/02/02/

10.2 架构

influxdb 简单实践

measurement, tag set, retention policy相同的数据集合算做一个 series。理解这个概念至关重要,因为这些数据存储在内存中,如果series太多,会导致OOM。

influxdb 简单实践

influxdb 简单实践

不考虑PR,也可以说series = measurement + tags

插入一条记录到新表:INSERT Cpu,host=serverA,region=us_west value=0.64

也是 insert measurement,tags field 的格式

11.3 shardGroupDuration对数据保存的影响

http://www.oznetnerd.com/influxdb-retention-policies-shard-groups/

Shard 存储一定时间间隔的数据,每个目录对应一个shard,目录的名字就是shard id。每一个shard都有自己的cache、wal、tsm file以及compactor,目的就是通过时间来快速定位到要查询数据的相关资源,加速查询的过程,并且也让之后的批量删除数据的操作变得非常简单且高效。

Shard Group 是shard的逻辑容器。每一个有数据的RP至少有一个关联的shard group,
一个shard group覆盖的时间范围由RP里的SHARD DURATION参数决定。

shard duration的默认值

Retention Policy’s DURATION Shard Group Duration
< 2 days 1 hour

= 2 days and <= 6 months 1 day
6 months 7 days
较小的shard group duration有助于系统更高效的删数据。

假如RP duration是1d,shard group duration是1h,那么系统每个小时都会删一个shar group

test case

> show retention policies;
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false
d1      24h0m0s  1h0m0s             1        true
> show shard groups;
name: shard groups
id database       retention_policy start_time           end_time             expiry_time

2  mydb           autogen          2018-03-05T00:00:00Z 2018-03-12T00:00:00Z 2018-03-12T00:00:00Z
80 mydb           autogen          2018-05-28T00:00:00Z 2018-06-04T00:00:00Z 2018-06-04T00:00:00Z
81 mydb           d1               2018-05-29T08:00:00Z 2018-05-29T09:00:00Z 2018-05-30T09:00:00Z
82 mydb           d1               2018-05-29T09:00:00Z 2018-05-29T10:00:00Z 2018-05-30T10:00:00Z
83 mydb           d1               2018-05-29T10:00:00Z 2018-05-29T11:00:00Z 2018-05-30T11:00:00Z
84 mydb           d1               2018-05-29T11:00:00Z 2018-05-29T12:00:00Z 2018-05-30T12:00:00Z
85 mydb           d1               2018-05-29T12:00:00Z 2018-05-29T13:00:00Z 2018-05-30T13:00:00Z
87 mydb           d1               2018-05-29T13:00:00Z 2018-05-29T14:00:00Z 2018-05-30T14:00:00Z
4  nodes          autogen          2018-03-12T00:00:00Z 2018-03-19T00:00:00Z 2018-03-19T00:00:00Z
5  HomeVisit      autogen          2018-03-12T00:00:00Z 2018-03-19T00:00:00Z 2018-03-19T00:00:00Z






> select * from autogen.cpu;
name: cpu
time                           host    region  value
----                           ----    ------  -----
2018-03-06T06:30:57.464227026Z serverA us_west 0.64
> select * from Cpu;
name: Cpu
time                           host     region  value
----                           ----     ------  -----
2018-05-29T08:36:13Z           serverX  us_west 0.64
2018-05-29T08:48:54Z           serverF  us_west 0.64
2018-05-29T08:48:54Z           serverF1 us_west 0.64
2018-05-29T09:06:58.532829069Z serverC  us_west 0.64
2018-05-29T09:09:36.977236835Z serverC  us_west 0.64
...每秒一个

11 实战经验

1、在influxdb中,tag_set + timestamp 用于标识是否同一条记录,如果有两条记录该值相同,后面的记录的field_set会覆盖前面的值。

2、取值范围很多的列不要存到tag。

配置文件中默认max-values-per-tag = 100000。虽然修改这个值可以解决 max-values-per-tag limit exceeded (100000/100000) 问题,但是建议不改,把这种列放到filed里
InfluxDB在内存中维护了系统中每个series数据的索引。随着具有唯一性的series数据数量的增长,RAM的使用也会增长。过高的series cardinality会导致操作系统kill掉InfluxDB进程,抛出OOM异常。

3、日志过多塞爆服务器

删除文件后,进程占用的空间也没有被释放

4、 每天存储量
目前可以抗住每天6G的存储。

上限应该是每天8T以上(参考 http://www.infoq.com/cn/articles/storage-in-sequential-databases

修改存储路径:https://*.com/questions/28350290/how-to-change-location-of-influxdb-storage-folder

参考

  1. 饿了么 Influxdb 实践之路
  2. https://toutiao.io/posts/viqzdg/preview
  3. http://hbasefly.com/2017/12/08/influxdb-1/
上一篇:InfluxDB 聚合函数实用案例


下一篇:prometheus学习系列二: Prometheus安装