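The aggregations framework provides aggregated data based on a search query. Aggregation is an important feature of any database, and because ES serves as both a search engine and a data store, it also offers powerful aggregation and analytics capabilities. An aggregation groups the documents matched by a query into buckets and computes values over them, much like SQL's GROUP BY combined with aggregate functions. Aggregations can be nested to build complex analyses (a bucket aggregation can contain sub-aggregations).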
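The values an aggregation operates on can be taken from a field or computed by a script. In the request body, aggregations are defined under the aggregations node with the following syntax: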
"aggregations" : { //也可简写为 aggs "<aggregation_name>" : { //聚合的名字 "<aggregation_type>" : { //聚合的类型 <aggregation_body> //聚合体:对哪些字段进行聚合 } [,"meta" : { [<meta_data_body>] } ]? //元 [,"aggregations" : { [<sub_aggregation>]+ } ]? //在聚合里面在定义子聚合 } [,"<aggregation_name_2>" : { ... } ]* //聚合的名字 }
1. Data preparation
(1) Create the employee index
PUT employee
{
  "mappings": {
    "properties": {
      "id":     { "type": "integer" },
      "name":   { "type": "keyword" },
      "job":    { "type": "keyword" },
      "age":    { "type": "integer" },
      "sal":    { "type": "integer" },
      "gender": { "type": "keyword" }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 3,     # number of primary shards
      "number_of_replicas": 2    # number of replicas
    }
  }
}
(2) Insert the data
POST employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}
Note: the _bulk body is newline-delimited JSON. Every action line and document line must sit on its own line, and the body must end with a newline after the last line.
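Data description: the inserted documents are employee records, where name is the employee's name, job is the job type, age is the employee's age, sal is the salary, and gender is the employee's gender.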
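Writing the _bulk body by hand is error-prone because of its newline-delimited format. As a hedged sketch, the Python helper below (the name `build_bulk_body` is hypothetical) generates the same body as the request above from a list of documents: one action line plus one source line per document, with a newline after every line, including the last.

```python
import json

# The 16 sample employee documents from the _bulk request above.
EMPLOYEES = [
    {"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"},
    {"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"},
    {"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"},
    {"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"},
    {"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"},
    {"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"},
    {"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"},
    {"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"},
    {"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"},
    {"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"},
    {"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"},
    {"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"},
    {"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"},
    {"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"},
    {"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"},
    {"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"},
]


def build_bulk_body(docs):
    """Build an NDJSON _bulk body: an "index" action line followed by the
    document source line for each doc, every line ending in a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # join with newlines and add the mandatory trailing newline
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body(EMPLOYEES), end="")
```

The resulting string could then be sent to `POST employee/_bulk` with the `Content-Type: application/x-ndjson` header using any HTTP client; that step is left out so the sketch runs without a cluster.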
Metrics aggregations
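A metrics aggregation computes metrics over a set of documents, for example the maximum, minimum, sum, or average of a field across all matching documents. Its output is one or more computed values, which effectively adds summary statistics on top of the documents.
The values are extracted from a specific field or generated using scripts. Numeric metrics aggregations are further classified by how many values they output (this classification applies only to numeric metrics aggregations, not to non-numeric ones): those that output a single value, such as max or avg, are called single-value numeric metrics aggregations, while those that output several values, such as stats, are called multi-value numeric metrics aggregations.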
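The single-value/multi-value distinction shows up in the shape of the response. The sketch below is a local emulation in Python, not Elasticsearch itself: it pairs a `max` aggregation (one number under `value`) with a `stats` aggregation (several named numbers: `count`, `min`, `max`, `avg`, `sum`) on the `sal` field of the sample data. The aggregation names `max_sal` and `sal_stats` and both helper functions are hypothetical.

```python
# Salaries (sal) of the 16 sample employees, in id order.
SALARIES = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
            19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

# One single-value metric ("max") and one multi-value metric ("stats").
request_body = {
    "size": 0,
    "aggs": {
        "max_sal": {"max": {"field": "sal"}},
        "sal_stats": {"stats": {"field": "sal"}},
    },
}


def single_value_max(values):
    # single-value metric: the result holds exactly one number under "value"
    return {"value": float(max(values))}


def multi_value_stats(values):
    # multi-value metric: several named numbers in one result object
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": sum(values) / len(values),
        "sum": float(sum(values)),
    }


if __name__ == "__main__":
    print({"max_sal": single_value_max(SALARIES),
           "sal_stats": multi_value_stats(SALARIES)})
```

A single stats aggregation therefore answers in one pass what would otherwise take separate min, max, sum, and avg aggregations.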
max min sum avg
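Max Aggregation: returns the maximum. It takes a value from each document (either a specific numeric field or a value computed by a script) and computes the maximum of that value across the aggregated documents.
Min Aggregation: returns the minimum; otherwise the same as above.
Sum Aggregation: returns the sum; otherwise the same as above.
Avg Aggregation: returns the average (mean); otherwise the same as above.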
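Putting the four aggregations together, here is a minimal Python sketch of one request that computes max, min, sum, and avg of `sal`, plus an avg whose per-document value comes from a Painless script instead of a field. The aggregation names, the annual-salary script (`doc['sal'].value * 12`), and the helper `expected_aggregations` are illustrative assumptions. The helper computes locally the values one would expect back for the 16 sample employees; it does not query a cluster.

```python
# Salaries (sal) of the 16 sample employees, in id order.
SALARIES = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
            19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

# Four field-based single-value metrics, plus one script-based avg.
# Names and the "annual salary" script are illustrative only.
request_body = {
    "size": 0,
    "aggs": {
        "max_sal": {"max": {"field": "sal"}},
        "min_sal": {"min": {"field": "sal"}},
        "sum_sal": {"sum": {"field": "sal"}},
        "avg_sal": {"avg": {"field": "sal"}},
        "avg_annual_sal": {"avg": {"script": {"source": "doc['sal'].value * 12"}}},
    },
}


def expected_aggregations(salaries):
    """Compute locally the value each aggregation above should return for
    the sample data, each wrapped as a single {"value": ...} object."""
    n = len(salaries)
    return {
        "max_sal": {"value": float(max(salaries))},
        "min_sal": {"value": float(min(salaries))},
        "sum_sal": {"value": float(sum(salaries))},
        "avg_sal": {"value": sum(salaries) / n},
        # mirrors the script: each document contributes sal * 12
        "avg_annual_sal": {"value": sum(s * 12 for s in salaries) / n},
    }


if __name__ == "__main__":
    for name, result in expected_aggregations(SALARIES).items():
        print(name, result["value"])
```

For this data that works out to a maximum of 23000, a minimum of 2000, a sum of 212500, and an average of 13281.25 (212500 / 16).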
Bucket aggregations
Matrix aggregations
Pipeline aggregations