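Author: 杨景江
Reviewer: 朱永生
Rollup jobs are tasks that run periodically. A rollup job aggregates the data in selected indices according to a user-defined configuration and writes the aggregated results to a new index. The whole process is called Rollup.
Use cases:
Summarizing historical data:
Historical data is large and expensive to keep on disk. The business usually needs raw documents only for the most recent few days; for older data it cares only about fixed metric statistics, not the raw documents. To save cost, a Rollup job can summarize the historical data into a new index, after which the historical indices are deleted (using ILM), which saves a substantial amount of storage.
Moving aggregation to a better time:
When real-time aggregation queries are slow because of data volume or hardware limits, a Rollup can run at night or in near real time to summarize the previous day's index, or the data from a few minutes ago, into a new index (for example, turning millisecond-level data into second-level or even minute-level data). Users then query the rolled-up index, which improves query efficiency.
Limitations of the Rollup feature:
Rollup only allows fields to be grouped with the following aggregations: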
- Date Histogram aggregation
- Histogram aggregation
- Terms aggregation (the most commonly used)
Numeric fields support only the following metric aggregations:
- Min aggregation
- Max aggregation
- Sum aggregation
- Average aggregation
- Value Count aggregation
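Every feature should be applied in the context of a concrete business scenario; avoid designing around a feature just for the sake of using it.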
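The two lists above can be turned into a quick pre-flight check before a job is submitted. The following is a minimal sketch, not part of Elasticsearch: the function name `find_unsupported` and the shape of the error messages are assumptions made for illustration, and the metric names use the spellings accepted in a job body (`avg`, `value_count`).

```python
# Group and metric types taken from the limitation lists above.
ALLOWED_GROUPS = {"date_histogram", "histogram", "terms"}
ALLOWED_METRICS = {"min", "max", "sum", "avg", "value_count"}


def find_unsupported(job_config):
    """Return a list of human-readable problems found in a rollup job body."""
    problems = []
    groups = job_config.get("groups", {})
    for group_name in groups:
        if group_name not in ALLOWED_GROUPS:
            problems.append(f"unsupported group: {group_name}")
    if "date_histogram" not in groups:
        problems.append("date_histogram group is required")
    for entry in job_config.get("metrics", []):
        for metric in entry.get("metrics", []):
            if metric not in ALLOWED_METRICS:
                problems.append(
                    f"unsupported metric on {entry.get('field')}: {metric}"
                )
    return problems


if __name__ == "__main__":
    candidate = {
        "groups": {"terms": {"fields": ["cluster"]}},
        "metrics": [{"field": "event.duration", "metrics": ["median"]}],
    }
    for problem in find_unsupported(candidate):
        print(problem)
```

An empty result only means the body uses supported aggregation types; Elasticsearch still validates field names and types when the job is created.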
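API overview
This section uses statistics over raw Elasticsearch slow query logs as the running example (sensitive information has been replaced).
Data preparation
Index mapping: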
PUT es-slowlog-2021-04-21
{
"mappings": {
"_field_names": {
"enabled": false
},
"dynamic_templates": [
{
"strings": {
"match_mapping_type": "string",
"mapping": {
"ignore_above": 512,
"type": "keyword"
}
}
}
],
"properties": {
"@timestamp": {
"type": "date"
},
"cluster": {
"type": "keyword",
"ignore_above": 512
},
"host": {
"properties": {
"name": {
"type": "keyword",
"ignore_above": 512
}
}
},
"elasticsearch": {
"properties": {
"index": {
"properties": {
"name": {
"type": "keyword",
"ignore_above": 512
}
}
}
}
},
"timestamp_local": {
"type": "date"
}
}
}
}
A single sample document (matching the mapping above):
POST es-slowlog-2021-04-21/_doc
{
"cluster": "clustername-demo",
"offset": 0,
"log": {
"level": "WARN"
},
"prospector": {
"type": "log"
},
"source": "/home/elasticsearch/clustername-demo_index_search_slowlog.log",
"message": "[2021-04-21T14:03:06,896][WARN ][i.s.s.query ] [host_name-demo] [basiclog-slowlog_2021-04-02][2] took[2.3s], took_millis[2307], total_hits[23129 hits], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[4], source[{\"size\":0,\"query\":{\"bool\":{\"filter\":[{\"match_all\":{\"boost\":1.0}},{\"match_phrase\":{\"logtype.keyword\":{\"query\":\"server\",\"slop\":0,\"zero_terms_query\":\"NONE\",\"boost\":1.0}}},{\"range\":{\"@timestamp\":{\"from\":\"2021-04-02T15:48:04.138Z\",\"to\":\"2021-04-02T16:03:04.138Z\",\"include_lower\":true,\"include_upper\":true,\"format\":\"strict_date_optional_time\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"_source\":{\"includes\":[],\"excludes\":[]},\"stored_fields\":\"*\",\"docvalue_fields\":[{\"field\":\"@timestamp\",\"format\":\"date_time\"},{\"field\":\"time\",\"format\":\"date_time\"}],\"script_fields\":{},\"track_total_hits\":2147483647,\"aggregations\":{\"2\":{\"terms\":{\"field\":\"cluster.keyword\",\"size\":20,\"min_doc_count\":1,\"shard_min_doc_count\":0,\"show_term_doc_count_error\":false,\"order\":[{\"_count\":\"desc\"},{\"_key\":\"asc\"}]}}}}], id[],",
"input": {
"type": "log"
},
"logtype": "slowlog",
"log_type": "basic-slowlog",
"timestamp_local": "2021-04-21T14:03:06.896+08:00",
"@timestamp": "2021-04-21T14:03:06.896Z",
"elasticsearch": {
"node": {
"name": "host_name-demo"
},
"slowlog": {
"took": "2.3s",
"logger": "i.s.s.query "
},
"index": {
"name": "basiclog-slowlog_2021-04-02"
},
"shard": {
"id": "2"
}
},
"host": {
"name": "host_name-demo"
},
"beat": {
"hostname": "beathostname-demo",
"name": "beathostname-demo",
"version": "6.5.4"
},
"@version": "1",
"event": {
"duration": 2307000000,
"created": "2021-04-21T06:59:11.934Z",
"kind": "event",
"category": "database",
"type": "info"
}
}
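Configure Index Patterns in Kibana
Note: for the latest version of the API, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/master/xpack-rollup.html
Basic APIs
Create a rollup job:
Request: PUT _rollup/job/<job_id>
Parameter | Required | Type | Description |
---|---|---|---|
index_pattern | Yes | string | Index pattern to roll up |
rollup_index | Yes | string | Target index; some versions require the index name to start with rollup |
cron | Yes | string | Schedule on which the job runs; independent of the time interval of the rolled-up data. |
page_size | Yes | integer | Number of bucket results processed in each iteration of the rollup indexer. Larger values run faster but need more memory during processing. |
groups | Yes | object | Defines the grouping fields and aggregations for the rollup job |
-date_histogram | Yes | object | Defines the date histogram aggregation |
--calendar_interval | Yes | time units | Size of the time bucket; 1m means one bucket per minute |
--field | Yes | string | Date field to aggregate on |
--time_zone | No | string | Time zone; default: UTC |
--delay | No | time units | Rollup delay: how old data must be before it is rolled up. Some writes may arrive late, and all data must be written and searchable before the job rolls it up |
-terms | No | object | Fields to group by |
--fields | Yes | array | The set of terms fields. The array may contain keyword or numeric fields, in any order. |
-histogram | No | object | Groups one or more numeric fields into numeric histogram intervals |
--fields | Yes | array | Fields to build the histogram on; must be numeric |
--interval | Yes | integer | Interval of the histogram buckets generated during rollup |
metrics | No | array | Defines how the data is summarized |
-field | Yes | string | The field to collect metrics for. |
-metrics | Yes | array | The aggregation operators. For example, sum computes the sum of the field. Only min, max, sum, avg, and value_count are supported. |
timeout | No | string | Request timeout |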
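PUT _rollup/job/es-slowlog-agg-id
{
"index_pattern": "es-slowlog*", // index pattern to roll up
"rollup_index": "rollup-es-slowlog-agg", // target index; starts with rollup- and must be specified explicitly
"cron": "0 * * * * ?", // schedule on which the job runs; independent of the time interval of the rolled-up data
"groups": {
"date_histogram": { // defines the date histogram aggregation
"calendar_interval": "1m", // time bucket size: one bucket per minute
"field": "timestamp_local", // date field to aggregate on
"delay": "1m", // rollup delay: how old data must be before it is rolled up; some writes may arrive late, and all data must be written and searchable first
"time_zone": "UTC" // time zone, e.g. GMT+8
},
"terms": {
"fields": [ // fields to group by
"cluster", // cluster name
"elasticsearch.index.name", // index name
"host.name" // host name
]
}
},
"metrics": [], // only the document count is collected by default; min, max, sum, avg, and value_count can be specified
"timeout": "20s", // timeout
"page_size": 10000 // page size; larger values roll up faster but use more memory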
}
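When the same job has to be created for several index patterns, it can help to generate the request body from code instead of editing JSON by hand. The sketch below rebuilds the body shown above; the helper name `build_rollup_job` and its default values are assumptions chosen to mirror this example, not an official client API. Note that the `//` comments in the console request are not valid in strict JSON, so a generated body like this one is also easier to send from other HTTP clients.

```python
import json


def build_rollup_job(index_pattern, rollup_index, time_field, term_fields,
                     interval="1m", delay="1m", cron="0 * * * * ?",
                     page_size=10000, metrics=None):
    """Build the body for PUT _rollup/job/<job_id> as a plain dict."""
    body = {
        "index_pattern": index_pattern,
        "rollup_index": rollup_index,
        "cron": cron,
        "page_size": page_size,
        "groups": {
            "date_histogram": {
                "field": time_field,
                "calendar_interval": interval,
                "delay": delay,
                "time_zone": "UTC",
            },
        },
        # An empty list means only document counts are collected.
        "metrics": metrics or [],
    }
    if term_fields:
        body["groups"]["terms"] = {"fields": list(term_fields)}
    return body


job = build_rollup_job(
    index_pattern="es-slowlog*",
    rollup_index="rollup-es-slowlog-agg",
    time_field="timestamp_local",
    term_fields=["cluster", "elasticsearch.index.name", "host.name"],
)
print("PUT _rollup/job/es-slowlog-agg-id")
print(json.dumps(job, indent=2))
```

The printed output can be pasted into the Kibana console as-is.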
List all rollup jobs:
GET _rollup/job/*
Get the details of a single rollup job:
Request: GET _rollup/job/<job_id>
GET _rollup/job/es-slowlog-agg-id
{
"jobs": [
{
"config": {
"id": "es-slowlog-agg-id",
"index_pattern": "es-slowlog*",
"rollup_index": "rollup-es-slowlog-agg",
"cron": "0 * * * * ?",
"groups": {
"date_histogram": {
"calendar_interval": "1m",
"field": "timestamp_local",
"delay": "1m",
"time_zone": "UTC"
},
"terms": {
"fields": [
"cluster",
"elasticsearch.index.name",
"host.name"
]
}
},
"metrics": [
],
"timeout": "20s",
"page_size": 10000
},
"status": {
"job_state": "stopped",
"upgraded_doc_id": true
},
"stats": {
"pages_processed": 0,
"documents_processed": 0,
"rollups_indexed": 0,
"trigger_count": 0,
"index_time_in_ms": 0,
"index_total": 0,
"index_failures": 0,
"search_time_in_ms": 0,
"search_total": 0,
"search_failures": 0,
"processing_time_in_ms": 0,
"processing_total": 0
}
}
]
}
Start a rollup job:
Request: POST _rollup/job/<job_id>/_start
POST _rollup/job/es-slowlog-agg-id/_start
// After starting the job, fetch its current state; pay attention to the status and stats sections
GET _rollup/job/es-slowlog-agg-id
{
"jobs": [
{
"config": {
"id": "es-slowlog-agg-id",
"index_pattern": "es-slowlog*",
"rollup_index": "rollup-es-slowlog-agg",
"cron": "0 * * * * ?",
"groups": {
"date_histogram": {
"calendar_interval": "1m",
"field": "timestamp_local",
"delay": "1m",
"time_zone": "UTC"
},
"terms": {
"fields": [
"cluster",
"elasticsearch.index.name",
"host.name"
]
}
},
"metrics": [
],
"timeout": "20s",
"page_size": 10000
},
"status": {
"job_state": "started", // shows stopped if the job has been stopped
"current_position": { // position the rollup job has reached, including the current terms values
"cluster.terms": "clustername-demo",
"elasticsearch.index.name.terms": "basiclog-slowlog_2021-04-02",
"host.name.terms": "host_name-demo",
"timestamp_local.date_histogram": 1618984980000
},
"upgraded_doc_id": true
},
"stats": { // execution statistics
"pages_processed": 2,
"documents_processed": 1,
"rollups_indexed": 1,
"trigger_count": 1,
"index_time_in_ms": 103,
"index_total": 1,
"index_failures": 0,
"search_time_in_ms": 6,
"search_total": 2,
"search_failures": 0,
"processing_time_in_ms": 0,
"processing_total": 2
}
}
]
}
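The `timestamp_local.date_histogram` value in `current_position` is the start of a one-minute bucket in epoch milliseconds. As a rough illustration of how the sample document's `timestamp_local` lands in that bucket, the sketch below floors a timestamp to its UTC minute. It only covers the `1m` interval with `time_zone` UTC used in this job; the function name `minute_bucket_millis` is an assumption for this example, and Elasticsearch's own calendar rounding handles many more cases.

```python
from datetime import datetime, timezone


def minute_bucket_millis(iso_timestamp):
    """Floor an ISO-8601 timestamp to the start of its UTC minute (epoch millis)."""
    moment = datetime.fromisoformat(iso_timestamp).astimezone(timezone.utc)
    floored = moment.replace(second=0, microsecond=0)
    return int(floored.timestamp()) * 1000


# timestamp_local of the sample slow log document shown earlier
print(minute_bucket_millis("2021-04-21T14:03:06.896+08:00"))  # 1618984980000
```

The result matches the `timestamp_local.date_histogram` position reported by the job above.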
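status.job_state values:
stopped
The job is paused.
started
The job is running but not actively rolling up data. When the cron interval triggers, the job's task starts processing data.
indexing
The job is processing data and creating new rollup documents. In this state, any subsequent cron triggers are ignored, because the job is already active from the previous trigger.
abort
A transient state that users normally do not see. It is used when the task has to be shut down for some reason (the job was deleted, an unrecoverable error occurred, and so on). Shortly after entering the abort state, the job removes itself from the cluster.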
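For monitoring scripts, the state descriptions above can be attached to the output of GET _rollup/job. The following is a small sketch that works on an already parsed response; the function name `summarize_jobs` and the wording of the short descriptions are assumptions for illustration.

```python
# Short descriptions condensed from the state list above.
JOB_STATE_MEANING = {
    "stopped": "paused",
    "started": "running, waiting for the next cron trigger",
    "indexing": "processing data; further cron triggers are ignored",
    "abort": "shutting down; the job will remove itself from the cluster",
}


def summarize_jobs(response):
    """Return {job_id: (job_state, meaning)} for a GET _rollup/job response."""
    summary = {}
    for job in response.get("jobs", []):
        job_id = job["config"]["id"]
        state = job["status"]["job_state"]
        summary[job_id] = (state, JOB_STATE_MEANING.get(state, "unknown state"))
    return summary


if __name__ == "__main__":
    sample = {
        "jobs": [
            {
                "config": {"id": "es-slowlog-agg-id"},
                "status": {"job_state": "started"},
            }
        ]
    }
    for job_id, (state, meaning) in summarize_jobs(sample).items():
        print(f"{job_id}: {state} ({meaning})")
```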
Stop a rollup job:
Request: POST _rollup/job/<job_id>/_stop
POST _rollup/job/es-slowlog-agg-id/_stop
Delete a rollup job:
Request: DELETE _rollup/job/<job_id>
Delete with caution.
DELETE /_rollup/job/es-slowlog-agg-id
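_rollup_search queries
Because raw documents and rollup documents have different structures, Rollup search rewrites a standard query DSL request into a form that matches the rollup documents, then takes the response and rewrites it back into the form the client expects.
Usage:
GET <target>/_rollup_search
Rules for the <target> parameter (required, string):
- At least one index or wildcard expression must be specified.
- Multiple non-rollup indices may be specified.
- Only one rollup index may be specified; providing more than one raises an exception.
- Wildcard expressions may be used, but an exception is raised if they match more than one rollup index.
e.g. GET es-slowlog*,rollup-es-slowlog-agg1/_rollup_search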
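The target rules above can be checked on the client side before a request is sent. This is a minimal sketch under two assumptions: the caller already knows the names of the rollup indices in the cluster, and `fnmatchcase` from the Python standard library is an acceptable stand-in for Elasticsearch wildcard matching (it also interprets `?` and `[...]`, which index expressions do not). The function name `check_rollup_search_target` is hypothetical.

```python
from fnmatch import fnmatchcase


def check_rollup_search_target(target, rollup_indices):
    """Return the rollup indices matched by target; raise if more than one matches."""
    expressions = [part.strip() for part in target.split(",") if part.strip()]
    if not expressions:
        raise ValueError("at least one index or wildcard expression is required")
    matched = sorted(
        name for name in rollup_indices
        if any(fnmatchcase(name, expr) for expr in expressions)
    )
    if len(matched) > 1:
        raise ValueError(f"target matches multiple rollup indices: {matched}")
    return matched


if __name__ == "__main__":
    known = {"rollup-es-slowlog-agg", "rollup-es-slowlog-agg1"}
    print(check_rollup_search_target("es-slowlog*,rollup-es-slowlog-agg1", known))
```

A target such as `rollup-es-slowlog-*` would match both rollup indices in this example and be rejected.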
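The request body supports a subset of the regular Search API features. It supports:
- query: specifies a DSL query, subject to some restrictions. See Rollup search limitations: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/rollup-search-limitations.html and Rollup aggregation limitations: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/rollup-agg-limitations.html
- aggregations: specifies the aggregations
Unavailable features:
- size: rollup search cannot return raw documents; to retrieve the stored documents, query the rollup index with _search.
- highlighter, suggestors, post_filter, profile, explain: not allowed.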
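How querying raw data and rollup indices together works:
After Elasticsearch receives the responses of a combined _rollup_search over raw and rollup data, it rewrites the rollup response and merges the two. During the merge, if any buckets in the two responses overlap, the buckets from the non-rollup index are used.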
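The overlap rule can be pictured with a deliberately simplified model in which each response is reduced to a map from bucket key to document count. The real merge operates on full aggregation trees inside Elasticsearch; the function `merge_buckets` and the counts below are illustrative assumptions only.

```python
def merge_buckets(live_buckets, rollup_buckets):
    """Merge two {bucket_key: doc_count} maps, preferring live (non-rollup) data."""
    merged = dict(rollup_buckets)
    merged.update(live_buckets)  # on overlap, the non-rollup bucket wins
    return dict(sorted(merged.items()))


if __name__ == "__main__":
    rollup = {1618984920000: 5, 1618984980000: 2}  # hypothetical rolled-up counts
    live = {1618984980000: 3, 1618985040000: 1}    # hypothetical raw-index counts
    print(merge_buckets(live, rollup))
```

For the overlapping key `1618984980000`, the merged result keeps the count from the raw index (3) rather than the rolled-up one (2).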
Example:
Create a new, more complex job with the following details
// A more complex job that rolls up multiple metrics; job details:
{
"config": {
"id": "es-slowlog-agg-id1",
"index_pattern": "es-slowlog*",
"rollup_index": "rollup-es-slowlog-agg1",
"cron": "0 * * * * ?",
"groups": {
"date_histogram": {
"calendar_interval": "1m",
"field": "timestamp_local",
"delay": "1m",
"time_zone": "UTC"
},
"histogram": {
"interval": 8,
"fields": [
"event.duration"
]
},
"terms": {
"fields": [
"cluster",
"elasticsearch.index.name",
"host.name"
]
}
},
"metrics": [
{
"field": "event.duration",
"metrics": [
"avg",
"max",
"min",
"sum",
"value_count"
]
}
],
"timeout": "20s",
"page_size": 10000
},
"status": {
"job_state": "started",
"current_position": {
"cluster.terms": "clustername-demo",
"elasticsearch.index.name.terms": "basiclog-slowlog_2021-04-02",
"event.duration.histogram": 2307000000,
"host.name.terms": "host_name-demo",
"timestamp_local.date_histogram": 1618984980000
},
"upgraded_doc_id": true
},
"stats": {
"pages_processed": 6,
"documents_processed": 1,
"rollups_indexed": 1,
"trigger_count": 5,
"index_time_in_ms": 115,
"index_total": 1,
"index_failures": 0,
"search_time_in_ms": 21,
"search_total": 6,
"search_failures": 0,
"processing_time_in_ms": 0,
"processing_total": 6
}
}
Use _search to query the stored documents in the rollup target index:
GET rollup-es-slowlog-agg1/_search
{
"size":10,
"query": {
"bool": {
"must": [],
"filter": [
{
"match_all": {}
}
],
"should": [],
"must_not": []
}
}
}
Response:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "rollup-es-slowlog-agg1",
"_type": "_doc",
"_id": "es-slowlog-agg-id1$5uzfGmyS2uAb3XRznkZBgA",
"_score": 1,
"_source": {
"cluster.terms.value": "bj-ali-xueyan-oa-es-cluster",
"event.duration.avg._count": 1,
"event.duration.max.value": 2377000000,
"event.duration.histogram.value": 2377000000,
"timestamp_local.date_histogram.time_zone": "UTC",
"elasticsearch.index.name.terms.value": "basiclog-slowlog_2400-2021-04-02",
"host.name.terms._count": 1,
"cluster.terms._count": 1,
"host.name.terms.value": "bj-sjhl-university-es-online-99-62",
"event.duration.avg.value": 2377000000,
"elasticsearch.index.name.terms._count": 1,
"event.duration.histogram.interval": 8,
"timestamp_local.date_histogram._count": 1,
"timestamp_local.date_histogram.timestamp": 1618995780000,
"_rollup.version": 2,
"event.duration.histogram._count": 1,
"timestamp_local.date_histogram.interval": "1m",
"event.duration.sum.value": 2377000000,
"event.duration.min.value": 2377000000,
"event.duration.value_count.value": 1,
"_rollup.id": "es-slowlog-agg-id1"
}
}
]
}
}
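The rollup document above stores `event.duration.avg.value` together with `event.duration.avg._count`. The sketch below shows how an overall average could be recovered from several such documents, assuming that `avg.value` holds the sum of the rolled-up values and `avg._count` the number of values; that interpretation is an assumption here, since a single document with a count of 1 cannot confirm it. The second document in the usage example is invented for illustration.

```python
def average_from_rollup_docs(docs, field):
    """Combine per-document sum/count pairs into one average (None if empty)."""
    total = sum(doc[f"{field}.avg.value"] for doc in docs)
    count = sum(doc[f"{field}.avg._count"] for doc in docs)
    return total / count if count else None


if __name__ == "__main__":
    docs = [
        # values from the rollup document shown above
        {"event.duration.avg.value": 2377000000, "event.duration.avg._count": 1},
        # hypothetical second rollup document
        {"event.duration.avg.value": 2623000000, "event.duration.avg._count": 1},
    ]
    print(average_from_rollup_docs(docs, "event.duration"))  # 2500000000.0
```

In practice `_rollup_search` performs this translation itself, so clients normally request a plain `avg` aggregation instead of reading these fields directly.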
Use _rollup_search to query data (raw data and rollup data can be queried together)
GET es-slowlog*,rollup-es-slowlog-agg1/_rollup_search
{
"size": 0,
"aggregations": {
"avg_event.duration": {
"avg": {
"field": "event.duration"
}
}
}
}
// Response
{
"took": 740,
"timed_out": false,
"terminated_early": false,
"num_reduce_phases": 2,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": 0,
"hits": [
]
},
"aggregations": {
"avg_event.duration": {
"value": 2311777445.714286
}
}
}
Getting rollup information
Get the jobs whose Rollup configuration matches a given index_pattern; _all returns all of them
Request: GET _rollup/data/
// Query all
GET _rollup/data/_all
// Query a specific index pattern
GET _rollup/data/es-slowlog*
{
"es-slowlog*": {
"rollup_jobs": [
{
"job_id": "es-slowlog-agg-id",
"rollup_index": "rollup-es-slowlog-agg",
"index_pattern": "es-slowlog*",
"fields": {
"cluster": [
{
"agg": "terms"
}
],
"timestamp_local": [
{
"agg": "date_histogram",
"delay": "1m",
"time_zone": "UTC",
"calendar_interval": "1m"
}
],
"elasticsearch.index.name": [
{
"agg": "terms"
}
],
"host.name": [
{
"agg": "terms"
}
]
}
},
{
"job_id": "es-slowlog-agg-id1",
"rollup_index": "rollup-es-slowlog-agg",
"index_pattern": "es-slowlog*",
"fields": {
"cluster": [
{
"agg": "terms"
}
],
"timestamp_local": [
{
"agg": "date_histogram",
"delay": "1m",
"time_zone": "UTC",
"calendar_interval": "1m"
}
],
"elasticsearch.index.name": [
{
"agg": "terms"
}
],
"host.name": [
{
"agg": "terms"
}
]
}
},
{
"job_id": "es-slowlog-agg-id1",
"rollup_index": "rollup-es-slowlog-agg1",
"index_pattern": "es-slowlog*",
"fields": {
"event.duration": [
{
"agg": "histogram",
"interval": 8
},
{
"agg": "avg"
},
{
"agg": "max"
},
{
"agg": "min"
},
{
"agg": "sum"
},
{
"agg": "value_count"
}
],
"cluster": [
{
"agg": "terms"
}
],
"timestamp_local": [
{
"agg": "date_histogram",
"delay": "1m",
"time_zone": "UTC",
"calendar_interval": "1m"
}
],
"elasticsearch.index.name": [
{
"agg": "terms"
}
],
"host.name": [
{
"agg": "terms"
}
]
}
},
{
"job_id": "es-slowlog-agg-id3",
"rollup_index": "rollupes-slowlog-agg",
"index_pattern": "es-slowlog*",
"fields": {
"cluster": [
{
"agg": "terms"
}
],
"timestamp_local": [
{
"agg": "date_histogram",
"delay": "1m",
"time_zone": "UTC",
"calendar_interval": "1m"
}
],
"elasticsearch.index.name": [
{
"agg": "terms"
}
],
"host.name": [
{
"agg": "terms"
}
]
}
}
]
}
}
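A response like the one above can get long when several jobs share an index pattern. The sketch below condenses a parsed GET _rollup/data response into a map from rollup index to job IDs; the function name `jobs_by_rollup_index` is an assumption for this example, and the sample input is a trimmed copy of the response above.

```python
from collections import defaultdict


def jobs_by_rollup_index(capabilities):
    """Group job IDs by rollup_index for a parsed GET _rollup/data response."""
    grouped = defaultdict(list)
    for pattern_caps in capabilities.values():
        for job in pattern_caps.get("rollup_jobs", []):
            grouped[job["rollup_index"]].append(job["job_id"])
    return {index: sorted(ids) for index, ids in grouped.items()}


if __name__ == "__main__":
    trimmed = {
        "es-slowlog*": {
            "rollup_jobs": [
                {"job_id": "es-slowlog-agg-id", "rollup_index": "rollup-es-slowlog-agg"},
                {"job_id": "es-slowlog-agg-id1", "rollup_index": "rollup-es-slowlog-agg1"},
                {"job_id": "es-slowlog-agg-id3", "rollup_index": "rollupes-slowlog-agg"},
            ]
        }
    }
    for index, job_ids in jobs_by_rollup_index(trimmed).items():
        print(index, job_ids)
```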
Get the jobs for a given Rollup target index; * wildcards are supported
Request: GET /_rollup/data
GET rollupes-slowlog-*/_rollup/data
GET rollupes-slowlog-agg/_rollup/data
{
"rollupes-slowlog-agg": {
"rollup_jobs": [
{
"job_id": "es-slowlog-agg-id3",
"rollup_index": "rollupes-slowlog-agg",
"index_pattern": "es-slowlog*",
"fields": {
"cluster": [
{
"agg": "terms"
}
],
"timestamp_local": [
{
"agg": "date_histogram",
"delay": "1m",
"time_zone": "UTC",
"calendar_interval": "1m"
}
],
"elasticsearch.index.name": [
{
"agg": "terms"
}
],
"host.name": [
{
"agg": "terms"
}
]
}
}
]
}
}
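Using Kibana
With a working understanding of the API, creating the slow query statistics for an Elasticsearch cluster through Kibana is straightforward.
Some Kibana features have bugs when the UI language is Chinese (for example, errors can occur when selecting metrics for a Rollup), so English is recommended as the Kibana language.
Fill in Logistics
Select the Date histogram (required)
Select Terms; here the cluster name, index name, and node name are chosen (optional)
Select a Histogram as needed (optional). The Elasticsearch slow query Rollup in this example only needs document counts, so nothing is selected here; go straight to the next step
Fill in Metrics as needed (optional). The Elasticsearch slow query Rollup in this example only needs document counts, so nothing is selected here; go straight to the next step
Finish and save
Check the status
When configuring the Index Pattern, make sure to select a Rollup index pattern; chart configuration is the same as for a regular index pattern
About the author:
杨景江 focuses on middleware technologies such as ES, Redis, and RocketMQ.
Blog: https://blog.csdn.net/xiaoyanghapi/article/month/2016/08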