Rollup: Elastic Stack Hands-On Handbook

Author: Yang Jingjiang (杨景江)
Reviewer: Zhu Yongsheng (朱永生)

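Rollup jobs are tasks that run periodically. A rollup job aggregates the data in selected indices on a schedule, using aggregations you define, and writes the aggregated results to a new index. The whole process is called Rollup.

Use cases:

Summarizing historical data:

Historical data is large and expensive to keep on disk. The teams that use it usually need the raw documents only for the last few days; for older data they care only about a fixed set of statistics. To cut cost, Rollup can summarize the historical data into a new index, after which the historical indices can be deleted (for example with ILM), which saves a large amount of storage.

Moving aggregation to a better time:

When data volume or hardware limits make real-time aggregation queries slow, Rollup can run at night or in near real time to summarize the previous day's index, or the data from a few minutes ago, into a new index (for example, turning millisecond-level data into second-level or minute-level data). Users then query the rolled-up index, which makes queries faster.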
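To make the idea concrete, the following is a minimal sketch, in plain Python rather than Elasticsearch, of what rolling millisecond-level events up into minute-level counts means. The field names `ts_ms` and `cluster` and the sample values are illustrative assumptions, not part of any Rollup API.

```python
from collections import Counter

MINUTE_MS = 60_000  # one minute in milliseconds


def minute_bucket(ts_ms: int) -> int:
    """Floor an epoch-millisecond timestamp to the start of its minute."""
    return ts_ms - ts_ms % MINUTE_MS


def rollup_counts(docs):
    """Count documents per (minute bucket, cluster) pair."""
    counts = Counter()
    for doc in docs:
        counts[(minute_bucket(doc["ts_ms"]), doc["cluster"])] += 1
    return dict(counts)


# Three hypothetical raw events: two in the same minute, one in the next.
docs = [
    {"ts_ms": 1618984986896, "cluster": "clustername-demo"},
    {"ts_ms": 1618984999120, "cluster": "clustername-demo"},
    {"ts_ms": 1618985041000, "cluster": "clustername-demo"},
]
summary = rollup_counts(docs)
```

Three raw documents collapse into two summary rows here; with real traffic the reduction is far larger, which is where the storage and query-time savings come from.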

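Rollup limitations:

Rollup only allows fields to be grouped with the following aggregations:

  • Date Histogram aggregation
  • Histogram aggregation
  • Terms aggregation (the most commonly used)

Numeric fields support only the following metric aggregations:

  • Min aggregation
  • Max aggregation
  • Sum aggregation
  • Average aggregation
  • Value Count aggregation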
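As an illustration of the lists above, here is a small plain-Python sketch of a numeric histogram grouping combined with the five supported metrics. It models the concepts only; the function names and the sample `took_millis` values are assumptions made for this example.

```python
def histogram_bucket(value: int, interval: int) -> int:
    """Return the lower bound of the histogram bucket that holds value."""
    return (value // interval) * interval


def metrics(values):
    """Compute the five metric aggregations that Rollup supports."""
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
        "value_count": len(values),
    }


def rollup_histogram(values, interval):
    """Group values into histogram buckets and summarize each bucket."""
    buckets = {}
    for value in values:
        buckets.setdefault(histogram_bucket(value, interval), []).append(value)
    return {key: metrics(group) for key, group in sorted(buckets.items())}


# Hypothetical slow-query durations in milliseconds, bucketed every 500 ms.
took_millis = [2307, 2311, 2950]
result = rollup_histogram(took_millis, 500)
```

Anything that cannot be expressed through these groupings and metrics, such as percentiles or the raw documents themselves, is not available from the rolled-up data.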

Use each feature only where it fits a concrete business need; never design around a feature just for the sake of using it.

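API overview

This section uses statistics over raw Elasticsearch slow-query logs as the running example (sensitive information has been replaced).

Data preparation

Index mapping: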

PUT es-slowlog-2021-04-21
{
    "mappings": {
      "_field_names": {
        "enabled": false
      },
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "ignore_above": 512,
              "type": "keyword"
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "cluster": {
          "type": "keyword",
          "ignore_above": 512
        },
        "host": { 
          "properties": { 
            "name": { 
              "type": "keyword",
              "ignore_above": 512
            }
          }
        },
        "elasticsearch": {
          "properties": {
            "index": {
              "properties": {
                "name": {
                  "type": "keyword",
                  "ignore_above": 512
                }
              }
            }
          }
        },
        "timestamp_local": {
          "type": "date"
        }
      }
    }
}

A single sample document (matching the mapping above):

POST es-slowlog-2021-04-21/_doc
{
  "cluster": "clustername-demo",
  "offset": 0,
  "log": {
    "level": "WARN"
  },
  "prospector": {
    "type": "log"
  },
  "source": "/home/elasticsearch/clustername-demo_index_search_slowlog.log",
  "message": "[2021-04-21T14:03:06,896][WARN ][i.s.s.query ] [host_name-demo] [basiclog-slowlog_2021-04-02][2] took[2.3s], took_millis[2307], total_hits[23129 hits], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[4], source[{\"size\":0,\"query\":{\"bool\":{\"filter\":[{\"match_all\":{\"boost\":1.0}},{\"match_phrase\":{\"logtype.keyword\":{\"query\":\"server\",\"slop\":0,\"zero_terms_query\":\"NONE\",\"boost\":1.0}}},{\"range\":{\"@timestamp\":{\"from\":\"2021-04-02T15:48:04.138Z\",\"to\":\"2021-04-02T16:03:04.138Z\",\"include_lower\":true,\"include_upper\":true,\"format\":\"strict_date_optional_time\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"_source\":{\"includes\":[],\"excludes\":[]},\"stored_fields\":\"*\",\"docvalue_fields\":[{\"field\":\"@timestamp\",\"format\":\"date_time\"},{\"field\":\"time\",\"format\":\"date_time\"}],\"script_fields\":{},\"track_total_hits\":2147483647,\"aggregations\":{\"2\":{\"terms\":{\"field\":\"cluster.keyword\",\"size\":20,\"min_doc_count\":1,\"shard_min_doc_count\":0,\"show_term_doc_count_error\":false,\"order\":[{\"_count\":\"desc\"},{\"_key\":\"asc\"}]}}}}], id[],",
  "input": {
    "type": "log"
  },
  "logtype": "slowlog",
  "log_type": "basic-slowlog",
  "timestamp_local": "2021-04-21T14:03:06.896+08:00",
  "@timestamp": "2021-04-21T14:03:06.896Z",
  "elasticsearch": {
    "node": {
      "name": "host_name-demo"
    },
    "slowlog": {
      "took": "2.3s",
      "logger": "i.s.s.query "
    },
    "index": {
      "name": "basiclog-slowlog_2021-04-02"
    },
    "shard": {
      "id": "2"
    }
  },
  "host": {
    "name": "host_name-demo"
  },
  "beat": {
    "hostname": "beathostname-demo",
    "name": "beathostname-demo",
    "version": "6.5.4"
  },
  "@version": "1",
  "event": {
    "duration": 2307000000,
    "created": "2021-04-21T06:59:11.934Z",
    "kind": "event",
    "category": "database",
    "type": "info"
  }
}

Configure the index pattern in Kibana.

Note: for the latest version of the API, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/master/xpack-rollup.html

Basic API

Create a rollup job:

Request: PUT _rollup/job/<job_id>

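Parameter                               | Required                | Type       | Description
index_pattern                           | Yes                     | string     | Index pattern to roll up.
rollup_index                            | Yes                     | string     | Target index. Some versions require the index name to start with rollup.
cron                                    | Yes                     | string     | Schedule on which the job runs. Independent of the time interval of the rolled-up data.
page_size                               | Yes                     | integer    | Number of bucket results processed in each iteration of the rollup indexer. Larger values run faster but need more memory during processing.
groups                                  | Yes                     | object     | Grouping fields and aggregations for the job.
groups.date_histogram                   | Yes                     | object     | Date histogram grouping.
groups.date_histogram.calendar_interval | Yes (or fixed_interval) | time units | Size of the time bucket; 1m means one bucket per minute.
groups.date_histogram.field             | Yes                     | string     | Date field to bucket on.
groups.date_histogram.time_zone         | No                      | string     | Time zone. Default: UTC.
groups.date_histogram.delay             | No                      | time units | Rollup delay: how old data must be before it is rolled up. Some writes arrive late, and all data for a bucket must be indexed and searchable before the job processes it.
groups.terms                            | No                      | object     | Fields to group by term.
groups.terms.fields                     | Yes                     | array      | Set of terms fields. Fields may be keyword or numeric; order does not matter.
groups.histogram                        | No                      | object     | Aggregates one or more numeric fields into numeric histogram intervals.
groups.histogram.fields                 | Yes                     | array      | Fields to build the histogram on; must be numeric.
groups.histogram.interval               | Yes                     | integer    | Interval of the histogram buckets generated during rollup.
metrics                                 | No                      | object     | Defines how the data is summarized.
metrics.field                           | Yes                     | string     | Field to collect metrics on.
metrics.metrics                         | Yes                     | array      | Metric operators. For example, sum computes the sum of the field. Only min, max, sum, avg, and value_count are supported.
timeout                                 | No                      | string     | Request timeout.
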
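The effect of delay can be pictured with a short Python sketch. This is a simplified model of the behaviour described in the table, not the actual indexer code: buckets are considered complete only if they end at or before "now minus delay", rounded down to the bucket interval. The function name and the sample times are assumptions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rollup_upper_bound(now: datetime, delay: timedelta, interval: timedelta) -> datetime:
    """Return the exclusive upper time bound of buckets eligible for rollup.

    Simplified model: subtract the delay, then floor to the bucket interval.
    """
    cutoff = now - delay
    return cutoff - ((cutoff - EPOCH) % interval)


# Hypothetical trigger time with delay=1m and a 1m bucket interval.
now = datetime(2021, 4, 21, 6, 5, 30, tzinfo=timezone.utc)
bound = rollup_upper_bound(now, timedelta(minutes=1), timedelta(minutes=1))
```

In this model a job triggered at 06:05:30 only processes buckets that start before 06:04:00, which leaves late-arriving documents about a minute to become searchable.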
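PUT _rollup/job/es-slowlog-agg-id
{
  "index_pattern": "es-slowlog*", // index pattern to roll up
  "rollup_index": "rollup-es-slowlog-agg", // target index; must be specified explicitly, here prefixed with rollup-
  "cron": "0 * * * * ?",  // schedule on which the job runs; independent of the rollup time interval
  "groups": {
    "date_histogram": { // date histogram grouping
      "calendar_interval": "1m",  // bucket size: one bucket per minute
      "field": "timestamp_local", // date field to bucket on
      "delay": "1m", // rollup delay: some writes arrive late, and data must be fully indexed and searchable before it is rolled up
      "time_zone": "UTC" // time zone, e.g. GMT+8
    },
    "terms": {
      "fields": [  // fields to group by
        "cluster", // cluster name
        "elasticsearch.index.name", // index name
        "host.name" // host name
      ]
    }
  },
  "metrics": [], // empty means only document counts are kept; min, max, sum, avg, and value_count can be specified
  "timeout": "20s", // request timeout
  "page_size": 10000 // page size; larger values roll up faster but use more memory
}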
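The annotated request above uses // comments, which strict JSON parsers reject. When the job is created from a script rather than from the Kibana console, the body can be built programmatically. The sketch below is one possible helper; `build_rollup_job` and its parameter names are hypothetical, and only the JSON keys come from the request above.

```python
import json

ALLOWED_METRICS = {"min", "max", "sum", "avg", "value_count"}


def build_rollup_job(index_pattern, rollup_index, cron, date_field,
                     terms_fields, metrics=None, interval="1m",
                     delay="1m", time_zone="UTC",
                     page_size=10000, timeout="20s"):
    """Build a rollup job body as a dict and reject unsupported metrics."""
    metrics = metrics or []
    for entry in metrics:
        unknown = set(entry["metrics"]) - ALLOWED_METRICS
        if unknown:
            raise ValueError(
                f"unsupported metrics for {entry['field']}: {sorted(unknown)}")
    return {
        "index_pattern": index_pattern,
        "rollup_index": rollup_index,
        "cron": cron,
        "groups": {
            "date_histogram": {
                "calendar_interval": interval,
                "field": date_field,
                "delay": delay,
                "time_zone": time_zone,
            },
            "terms": {"fields": list(terms_fields)},
        },
        "metrics": metrics,
        "timeout": timeout,
        "page_size": page_size,
    }


job = build_rollup_job(
    index_pattern="es-slowlog*",
    rollup_index="rollup-es-slowlog-agg",
    cron="0 * * * * ?",
    date_field="timestamp_local",
    terms_fields=["cluster", "elasticsearch.index.name", "host.name"],
)
payload = json.dumps(job, indent=2)  # comment-free JSON for PUT _rollup/job/<job_id>
```

Validating the metric names locally gives a clearer error than waiting for the cluster to reject the request.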

List all rollup jobs:

GET _rollup/job/*

Get the details of a single rollup job:

Request: GET _rollup/job/<job_id>

GET _rollup/job/es-slowlog-agg-id
{
  "jobs": [
    {
      "config": {
        "id": "es-slowlog-agg-id",
        "index_pattern": "es-slowlog*",
        "rollup_index": "rollup-es-slowlog-agg",
        "cron": "0 * * * * ?",
        "groups": {
          "date_histogram": {
            "calendar_interval": "1m",
            "field": "timestamp_local",
            "delay": "1m",
            "time_zone": "UTC"
          },
          "terms": {
            "fields": [
              "cluster",
              "elasticsearch.index.name",
              "host.name"
            ]
          }
        },
        "metrics": [

        ],
        "timeout": "20s",
        "page_size": 10000
      },
      "status": {
        "job_state": "stopped",
        "upgraded_doc_id": true
      },
      "stats": {
        "pages_processed": 0,
        "documents_processed": 0,
        "rollups_indexed": 0,
        "trigger_count": 0,
        "index_time_in_ms": 0,
        "index_total": 0,
        "index_failures": 0,
        "search_time_in_ms": 0,
        "search_total": 0,
        "search_failures": 0,
        "processing_time_in_ms": 0,
        "processing_total": 0
      }
    }
  ]
}

Start a rollup job:

Request: POST _rollup/job/<job_id>/_start

POST _rollup/job/es-slowlog-agg-id/_start
// After starting the job, fetch its current state and check the status and stats sections
GET _rollup/job/es-slowlog-agg-id
{
  "jobs": [
    {
      "config": {
        "id": "es-slowlog-agg-id",
        "index_pattern": "es-slowlog*",
        "rollup_index": "rollup-es-slowlog-agg",
        "cron": "0 * * * * ?",
        "groups": {
          "date_histogram": {
            "calendar_interval": "1m",
            "field": "timestamp_local",
            "delay": "1m",
            "time_zone": "UTC"
          },
          "terms": {
            "fields": [
              "cluster",
              "elasticsearch.index.name",
              "host.name"
            ]
          }
        },
        "metrics": [

        ],
        "timeout": "20s",
        "page_size": 10000
      },
      "status": {
        "job_state": "started",  // shows stopped if the job has been stopped
        "current_position": { // position the rollup job has reached, including the term values
          "cluster.terms": "clustername-demo",
          "elasticsearch.index.name.terms": "basiclog-slowlog_2021-04-02",
          "host.name.terms": "host_name-demo",
          "timestamp_local.date_histogram": 1618984980000
        },
        "upgraded_doc_id": true
      },
      "stats": { // execution statistics
        "pages_processed": 2,
        "documents_processed": 1,
        "rollups_indexed": 1,
        "trigger_count": 1,
        "index_time_in_ms": 103,
        "index_total": 1,
        "index_failures": 0,
        "search_time_in_ms": 6,
        "search_total": 2,
        "search_failures": 0,
        "processing_time_in_ms": 0,
        "processing_total": 2
      }
    }
  ]
}
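The timestamp_local.date_histogram value in current_position is a bucket key in epoch milliseconds. A small Python helper (the function name is an assumption for this example) makes it readable:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_key_to_iso(epoch_ms: int) -> str:
    """Convert an epoch-millisecond bucket key to an ISO 8601 UTC string."""
    return (EPOCH + timedelta(milliseconds=epoch_ms)).isoformat()


position = bucket_key_to_iso(1618984980000)
```

For the response above this gives 2021-04-21T06:03:00+00:00, the start of the one-minute bucket that holds the sample document's timestamp_local value of 14:03:06.896+08:00.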

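status.job_state values:

stopped

The job is paused; it does not roll up data even when the cron schedule triggers.

started

The job is running but is not actively rolling up data. When the cron schedule triggers, the job starts processing data.

indexing

The job is processing data and creating new rollup documents. While in this state, any subsequent cron triggers are ignored because the job is already active from a previous trigger.

abort

A transient state that users normally do not see. It is used when the job has to be shut down for some reason (the job was deleted, an unrecoverable error occurred, and so on). Shortly after entering the abort state, the job removes itself from the cluster.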
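A monitoring script might interpret these states as follows. This is a sketch only: the helper names are hypothetical, and the response shape is taken from the GET _rollup/job output shown earlier.

```python
def job_state(response: dict, job_id: str) -> str:
    """Return status.job_state for job_id from a GET _rollup/job response."""
    for job in response["jobs"]:
        if job["config"]["id"] == job_id:
            return job["status"]["job_state"]
    raise KeyError(f"job not found: {job_id}")


def will_process_next_trigger(state: str) -> bool:
    """Only a started job begins new work when the cron schedule triggers.

    stopped jobs are paused, indexing jobs ignore the trigger because they
    are already busy, and abort jobs are about to be removed.
    """
    return state == "started"


# Trimmed-down response containing only the fields the helpers read.
response = {
    "jobs": [
        {"config": {"id": "es-slowlog-agg-id"},
         "status": {"job_state": "started"}},
    ]
}
state = job_state(response, "es-slowlog-agg-id")
```

A check like this can alert when a job stays in stopped unexpectedly, since a stopped job silently produces no new rollup documents.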

Stop a rollup job:

Request: POST _rollup/job/<job_id>/_stop

POST _rollup/job/es-slowlog-agg-id/_stop

Delete a rollup job:

Request: DELETE _rollup/job/<job_id>

Delete with caution. A job must be stopped before it can be deleted.
DELETE /_rollup/job/es-slowlog-agg-id

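_rollup_search queries

Raw documents and rollup documents have different structures. Rollup search rewrites a standard query DSL request into a form that matches the rollup documents, then takes the response and rewrites it back into the format the client expects for the original query.

Usage:

GET <target>/_rollup_search

Rules for the <target> parameter (required, string):

  • An index or wildcard expression must be specified.
  • Multiple non-rollup indices may be specified.
  • Only one rollup index may be specified; providing more than one raises an exception.
  • Wildcard expressions may be used, but an exception is raised if they match more than one rollup index.

Example: es-slowlog*,rollup-es-slowlog-agg1/_rollup_search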
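The "at most one rollup index" rule can be checked on the client before a request is sent. The sketch below assumes the caller already knows the names of the rollup indices in the cluster; the function name is hypothetical, and fnmatch is only an approximation of Elasticsearch wildcard matching.

```python
from fnmatch import fnmatchcase


def matched_rollup_indices(target: str, rollup_indices) -> list:
    """Return the rollup indices matched by a comma-separated target.

    Raises ValueError if the target resolves to more than one rollup index.
    """
    matched = set()
    for expression in target.split(","):
        expression = expression.strip()
        matched.update(
            name for name in rollup_indices if fnmatchcase(name, expression))
    if len(matched) > 1:
        raise ValueError(
            f"target matches several rollup indices: {sorted(matched)}")
    return sorted(matched)


known_rollups = ["rollup-es-slowlog-agg", "rollup-es-slowlog-agg1"]
ok = matched_rollup_indices("es-slowlog*,rollup-es-slowlog-agg1", known_rollups)
```

A target such as rollup-es-slowlog-* would match both rollup indices in this example and be rejected.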

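The request body supports a subset of the regular Search API:

  • query: specifies a DSL query, subject to some restrictions. See:
      Rollup search limitations: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/rollup-search-limitations.html
      Rollup aggregation limitations: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/rollup-agg-limitations.html
  • aggregations: specifies the aggregations to run.

Unavailable features:

  • size: search hits cannot be returned, so size must be 0 or omitted. To read the stored rollup documents themselves, query the rollup index with _search.
  • highlighter, suggestors, post_filter, profile, explain: not allowed.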
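These restrictions can be turned into a quick pre-flight check. In the sketch below, the key names highlight and suggest are the regular Search API request keys that correspond to the highlighter and suggestors mentioned above; the function name is an assumption.

```python
UNSUPPORTED_KEYS = ("highlight", "suggest", "post_filter", "profile", "explain")


def rollup_search_problems(body: dict) -> list:
    """List the reasons a request body is unsuitable for _rollup_search."""
    problems = []
    if body.get("size", 0) != 0:
        problems.append("size must be 0 or omitted")
    for key in UNSUPPORTED_KEYS:
        if key in body:
            problems.append(f"{key} is not supported")
    return problems


good_body = {
    "size": 0,
    "aggregations": {
        "avg_event.duration": {"avg": {"field": "event.duration"}}
    },
}
bad_body = {"size": 10, "highlight": {"fields": {"message": {}}}}

good_result = rollup_search_problems(good_body)
bad_result = rollup_search_problems(bad_body)
```

The check does not cover the query and aggregation limitations linked above; it only catches the top-level keys that are never accepted.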

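How querying raw and rollup indices together works:

When a _rollup_search request targets both raw and rollup data, Elasticsearch rewrites the rollup response once both responses have arrived and merges the two. If any buckets overlap between the two responses during the merge, the bucket data from the non-rollup index is used.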
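The merge rule can be sketched in a few lines of Python. This models the described behaviour on plain dictionaries keyed by bucket timestamp; it is not how Elasticsearch implements the merge internally, and the sample counts are made up.

```python
def merge_buckets(live: dict, rolled_up: dict) -> dict:
    """Merge two {bucket_key: doc_count} maps, preferring live data.

    Buckets that exist in both responses take the value from the
    non-rollup (live) index.
    """
    merged = dict(rolled_up)
    merged.update(live)  # live buckets overwrite overlapping rollup buckets
    return dict(sorted(merged.items()))


# Hypothetical per-minute counts; the 1618985040000 bucket overlaps.
rolled_up = {1618984980000: 2, 1618985040000: 1}
live = {1618985040000: 3, 1618985100000: 5}
merged = merge_buckets(live, rolled_up)
```

Preferring live data is reasonable because the raw index can contain documents that arrived after the rollup job processed the bucket.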

Example:

Create a new, more complex job. Its details are as follows.

// A more complex job that rolls up several metrics; job details:
{
  "config": {
    "id": "es-slowlog-agg-id1",
    "index_pattern": "es-slowlog*",
    "rollup_index": "rollup-es-slowlog-agg1",
    "cron": "0 * * * * ?",
    "groups": {
      "date_histogram": {
        "calendar_interval": "1m",
        "field": "timestamp_local",
        "delay": "1m",
        "time_zone": "UTC"
      },
      "histogram": {
        "interval": 8,
        "fields": [
          "event.duration"
        ]
      },
      "terms": {
        "fields": [
          "cluster",
          "elasticsearch.index.name",
          "host.name"
        ]
      }
    },
    "metrics": [
      {
        "field": "event.duration",
        "metrics": [
          "avg",
          "max",
          "min",
          "sum",
          "value_count"
        ]
      }
    ],
    "timeout": "20s",
    "page_size": 10000
  },
  "status": {
    "job_state": "started",
    "current_position": {
      "cluster.terms": "clustername-demo",
      "elasticsearch.index.name.terms": "basiclog-slowlog_2021-04-02",
      "event.duration.histogram": 2307000000,
      "host.name.terms": "host_name-demo",
      "timestamp_local.date_histogram": 1618984980000
    },
    "upgraded_doc_id": true
  },
  "stats": {
    "pages_processed": 6,
    "documents_processed": 1,
    "rollups_indexed": 1,
    "trigger_count": 5,
    "index_time_in_ms": 115,
    "index_total": 1,
    "index_failures": 0,
    "search_time_in_ms": 21,
    "search_total": 6,
    "search_failures": 0,
    "processing_time_in_ms": 0,
    "processing_total": 6
  }
}

Use _search to read the stored rollup documents in the target index:

GET rollup-es-slowlog-agg1/_search
{
  "size":10,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
       {
        "_index": "rollup-es-slowlog-agg1",
        "_type": "_doc",
        "_id": "es-slowlog-agg-id1$5uzfGmyS2uAb3XRznkZBgA",
        "_score": 1,
        "_source": {
          "cluster.terms.value": "bj-ali-xueyan-oa-es-cluster",
          "event.duration.avg._count": 1,
          "event.duration.max.value": 2377000000,
          "event.duration.histogram.value": 2377000000,
          "timestamp_local.date_histogram.time_zone": "UTC",
          "elasticsearch.index.name.terms.value": "basiclog-slowlog_2400-2021-04-02",
          "host.name.terms._count": 1,
          "cluster.terms._count": 1,
          "host.name.terms.value": "bj-sjhl-university-es-online-99-62",
          "event.duration.avg.value": 2377000000,
          "elasticsearch.index.name.terms._count": 1,
          "event.duration.histogram.interval": 8,
          "timestamp_local.date_histogram._count": 1,
          "timestamp_local.date_histogram.timestamp": 1618995780000,
          "_rollup.version": 2,
          "event.duration.histogram._count": 1,
          "timestamp_local.date_histogram.interval": "1m",
          "event.duration.sum.value": 2377000000,
          "event.duration.min.value": 2377000000,
          "event.duration.value_count.value": 1,
          "_rollup.id": "es-slowlog-agg-id1"
        }
      }
    ]
  }
}
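The _source of a rollup document uses flattened keys of the form <field>.<aggregation>.<property>. The sketch below regroups such keys into a nested structure for easier inspection. It relies only on the naming visible in the response above; the function name and the trimmed sample are assumptions.

```python
def parse_rollup_source(source: dict) -> dict:
    """Regroup flattened rollup keys as {field: {aggregation: {property: value}}}."""
    parsed = {}
    for key, value in source.items():
        if key.startswith("_rollup."):
            continue  # job metadata such as _rollup.id and _rollup.version
        # The field name itself may contain dots, so split from the right.
        field, aggregation, prop = key.rsplit(".", 2)
        parsed.setdefault(field, {}).setdefault(aggregation, {})[prop] = value
    return parsed


# A trimmed-down sample in the same shape as the response above.
source = {
    "cluster.terms.value": "clustername-demo",
    "event.duration.max.value": 2377000000,
    "elasticsearch.index.name.terms._count": 1,
    "timestamp_local.date_histogram.timestamp": 1618995780000,
    "_rollup.id": "es-slowlog-agg-id1",
}
parsed = parse_rollup_source(source)
```

This flattened layout is why the same query DSL cannot be run unchanged against raw and rollup indices.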


Use _rollup_search to query the data (raw and rollup data can be queried together):

GET es-slowlog*,rollup-es-slowlog-agg1/_rollup_search 
{
  "size": 0,
  "aggregations": {
    "avg_event.duration": {
      "avg": {
        "field": "event.duration"
      }
    }
  }
}


// Response
{
  "took": 740,
  "timed_out": false,
  "terminated_early": false,
  "num_reduce_phases": 2,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": 0,
    "hits": [

    ]
  },
  "aggregations": {
    "avg_event.duration": {
      "value": 2311777445.714286
    }
  }
}

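Getting rollup information

Look up jobs by the index_pattern in their Rollup configuration; _all returns every job.

Request: GET _rollup/data/<index_pattern>

// All jobs
GET _rollup/data/_all
// A specific index pattern
GET _rollup/data/es-slowlog*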
{
  "es-slowlog*": {
    "rollup_jobs": [
      {
        "job_id": "es-slowlog-agg-id",
        "rollup_index": "rollup-es-slowlog-agg",
        "index_pattern": "es-slowlog*",
        "fields": {
          "cluster": [
            {
              "agg": "terms"
            }
          ],
          "timestamp_local": [
            {
              "agg": "date_histogram",
              "delay": "1m",
              "time_zone": "UTC",
              "calendar_interval": "1m"
            }
          ],
          "elasticsearch.index.name": [
            {
              "agg": "terms"
            }
          ],
          "host.name": [
            {
              "agg": "terms"
            }
          ]
        }
      },
      {
        "job_id": "es-slowlog-agg-id1",
        "rollup_index": "rollup-es-slowlog-agg",
        "index_pattern": "es-slowlog*",
        "fields": {
          "cluster": [
            {
              "agg": "terms"
            }
          ],
          "timestamp_local": [
            {
              "agg": "date_histogram",
              "delay": "1m",
              "time_zone": "UTC",
              "calendar_interval": "1m"
            }
          ],
          "elasticsearch.index.name": [
            {
              "agg": "terms"
            }
          ],
          "host.name": [
            {
              "agg": "terms"
            }
          ]
        }
      },
      {
        "job_id": "es-slowlog-agg-id1",
        "rollup_index": "rollup-es-slowlog-agg1",
        "index_pattern": "es-slowlog*",
        "fields": {
          "event.duration": [
            {
              "agg": "histogram",
              "interval": 8
            },
            {
              "agg": "avg"
            },
            {
              "agg": "max"
            },
            {
              "agg": "min"
            },
            {
              "agg": "sum"
            },
            {
              "agg": "value_count"
            }
          ],
          "cluster": [
            {
              "agg": "terms"
            }
          ],
          "timestamp_local": [
            {
              "agg": "date_histogram",
              "delay": "1m",
              "time_zone": "UTC",
              "calendar_interval": "1m"
            }
          ],
          "elasticsearch.index.name": [
            {
              "agg": "terms"
            }
          ],
          "host.name": [
            {
              "agg": "terms"
            }
          ]
        }
      },
      {
        "job_id": "es-slowlog-agg-id3",
        "rollup_index": "rollupes-slowlog-agg",
        "index_pattern": "es-slowlog*",
        "fields": {
          "cluster": [
            {
              "agg": "terms"
            }
          ],
          "timestamp_local": [
            {
              "agg": "date_histogram",
              "delay": "1m",
              "time_zone": "UTC",
              "calendar_interval": "1m"
            }
          ],
          "elasticsearch.index.name": [
            {
              "agg": "terms"
            }
          ],
          "host.name": [
            {
              "agg": "terms"
            }
          ]
        }
      }
    ]
  }
}

Look up jobs by the Rollup target index; * wildcards are supported.

Request: GET <rollup_index>/_rollup/data

GET rollupes-slowlog-*/_rollup/data
GET rollupes-slowlog-agg/_rollup/data
{
  "rollupes-slowlog-agg": {
    "rollup_jobs": [
      {
        "job_id": "es-slowlog-agg-id3",
        "rollup_index": "rollupes-slowlog-agg",
        "index_pattern": "es-slowlog*",
        "fields": {
          "cluster": [
            {
              "agg": "terms"
            }
          ],
          "timestamp_local": [
            {
              "agg": "date_histogram",
              "delay": "1m",
              "time_zone": "UTC",
              "calendar_interval": "1m"
            }
          ],
          "elasticsearch.index.name": [
            {
              "agg": "terms"
            }
          ],
          "host.name": [
            {
              "agg": "terms"
            }
          ]
        }
      }
    ]
  }
}
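A capabilities response like the one above can be condensed into a per-job summary of which aggregations each field supports. The helper below is a sketch: its name is hypothetical, and it reads only the keys that appear in the response.

```python
def aggs_by_job(capabilities: dict) -> dict:
    """Summarize a capabilities response as {job_id: {field: [agg names]}}."""
    summary = {}
    for entry in capabilities.values():
        for job in entry["rollup_jobs"]:
            fields = summary.setdefault(job["job_id"], {})
            for field, aggs in job["fields"].items():
                fields[field] = [agg["agg"] for agg in aggs]
    return summary


# A trimmed-down response in the same shape as the output above.
capabilities = {
    "rollupes-slowlog-agg": {
        "rollup_jobs": [
            {
                "job_id": "es-slowlog-agg-id3",
                "rollup_index": "rollupes-slowlog-agg",
                "index_pattern": "es-slowlog*",
                "fields": {
                    "cluster": [{"agg": "terms"}],
                    "timestamp_local": [
                        {"agg": "date_histogram", "delay": "1m",
                         "time_zone": "UTC", "calendar_interval": "1m"}
                    ],
                },
            }
        ]
    }
}
summary = aggs_by_job(capabilities)
```

Such a summary is a quick way to confirm that a planned _rollup_search aggregation uses a field and aggregation type that a job actually stored.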

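Using Kibana

With the API covered, creating the same slow-query statistics rollup for an Elasticsearch cluster through Kibana is straightforward.

Some Kibana features have bugs when the UI language is Chinese (for example, errors can occur when selecting metrics for a Rollup job), so setting the Kibana language to English is recommended.

Fill in Logistics.

Select the Date histogram (required).

Select Terms (optional); here the cluster name, index name, and node name are chosen.

Select a Histogram if needed (optional). The Elasticsearch slow-query Rollup in this example only needs document counts, so nothing is selected here; go straight to the next step.

Fill in Metrics if needed (optional). The Elasticsearch slow-query Rollup in this example only needs document counts, so nothing is selected here; go straight to the next step.

Review the configuration and save.

Check the job status.

When configuring the index pattern, make sure to choose a Rollup index pattern; charts are then configured the same way as for a regular index pattern.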

About the author:
Yang Jingjiang (杨景江) focuses on middleware technologies such as Elasticsearch, Redis, and RocketMQ.
Blog: https://blog.csdn.net/xiaoyanghapi/article/month/2016/08