Elasticsearch_数据的聚合查询

聚合框架有助于根据搜索查询提供聚合数据。聚合查询是数据库中重要的功能特性,ES作为搜索引擎兼数据库,同样提供了强大的聚合分析能力。它基于查询条件来对数据进行分桶、计算的方法。有点类似于 SQL 中的 group by 再加一些函数方法的操作。聚合可以嵌套,由此可以组成复杂的操作(Bucketing聚合可以包含sub-aggregation)。

聚合计算的值可以取字段的值,也可是脚本计算的结果。查询请求体中以aggregations节点的语法定义:

"aggregations" : {                        //也可简写为 aggs
    "<aggregation_name>" : {      //聚合的名字
        "<aggregation_type>" : {     //聚合的类型
            <aggregation_body>      //聚合体:对哪些字段进行聚合
        }
        [,"meta" : {  [<meta_data_body>] } ]?                 //元
        [,"aggregations" : { [<sub_aggregation>]+ } ]?   //在聚合里面在定义子聚合
    }
    [,"<aggregation_name_2>" : { ... } ]*                      //聚合的名字
}

1、数据准备

(1) 创建员工索引employee

PUT employee
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "keyword"
      },
      "job": {
        "type": "keyword"
      },
      "age": {
        "type": "integer"
      },
      "gender": {
        "type": "keyword"
      }
    }
  },
  "settings":{
        "index":{
            "number_of_shards":3, #分片数量
            "number_of_replicas":2  #副本数量
        }
    }
}

(2) 插入数据

POST employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"}

{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}

{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}

{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}

{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}

{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}

{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}

{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}

{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}

{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}

{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}

{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}

{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}

{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}

{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}

{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}
#这里有换行符

数据说明:插入的数据为员工信息,name是员工的姓名,job是员工的工种,age为员工的年龄,sal为员工的薪水,gender为员工的性别。

指标聚合

指标聚合,它是对文档进行一些权值计算(比如求所有文档某个字段求最大、最小、和、平均值),输出结果往往是文档的权值,相当于为文档添加了一些统计信息。

它基于特定字段(field)或脚本值(generated using scripts),计算聚合中文档的数值权值。数值权值聚合(注意分类只针对数值权值聚合,非数值的无此分类)输出单个权值的,也叫 single-value numeric metrics,其它生成多个权值(比如:stats)的被叫做 multi-value numeric metrics。

max min sum avg

Max Aggregation,求最大值。基于文档的某个值(可以是特定的数值型字段,也可以通过脚本计算而来),计算该值在聚合文档中的均值。

Min Aggregation,求最小值。同上

Sum Aggregation,求和。同上

Avg Aggregation,求平均数。同上

 

 

 

 

 

 

 

 

 

 

 

 

 

 

桶聚合

 

矩阵聚合

 

 

管道聚合

 

上一篇:Day7Python - 字典作业


下一篇:数据分析之pandas模块③