Elasticsearch Notes: Aggregation Queries

Concepts

bucket: a group of documents
metric: a statistic computed over a bucket, such as the average or the maximum
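
The two concepts can be illustrated outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: the sample documents and the helper names `bucket_by` and `avg_metric` are hypothetical, and only the field names (`classid`, `age`, `fenshu`) are borrowed from the requests in these notes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample documents; field names mirror the examples in these notes.
people = [
    {"classid": 1, "age": 20, "fenshu": 80},
    {"classid": 1, "age": 21, "fenshu": 90},
    {"classid": 2, "age": 20, "fenshu": 70},
]

def bucket_by(docs, field):
    """Group documents into buckets keyed by the value of `field`."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)

def avg_metric(docs, field):
    """Compute one metric (the average of `field`) over a single bucket."""
    return mean(doc[field] for doc in docs)

# Bucket step: one bucket per classid.
buckets = bucket_by(people, "classid")

# Metric step: the document count plus an average for every bucket.
result = {
    key: {"doc_count": len(docs), "avg_fenshu": avg_metric(docs, "fenshu")}
    for key, docs in buckets.items()
}
```

The bucket step only decides which documents belong together; every number in `result` comes from a metric applied to one bucket.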

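Examples

Counting

Count how many people there are at each age. Grouping by age gives the number of documents in each bucket. This is a bucket-only operation: doc_count is a built-in metric that every bucket aggregation computes by default.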

GET index5/people/_search
{
  "size":0,
  "aggs": {
    "age_group": {
      "terms": {
        "field": "age"
      }
    }
  }
}

Averages

Compute the average age of each class: group by class, then run a metric aggregation on each bucket.

GET index5/people/_search
{
  "size":0,
  "aggs": {
    "class_group": {
      "terms": {
        "field": "classid"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

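Drill-down

Drilling down means grouping again inside an existing group, for example grouping by age within each class, and then running the metric aggregation on each finest-grained group. Here the average score is computed for every age bucket inside every class bucket.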

GET index5/people/_search
{
  "size":0,
  "aggs": {
    "class_group": {
      "terms": {
        "field": "classid"
      },
      "aggs": {
        "age_group": {
          "terms": {
            "field": "age"
          },
          "aggs":{
            "avg_fenshu":{
              "avg": {
                "field": "fenshu"
              }
            }
          }
        }
      }
    }
  }
}
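
The nesting can be mimicked in plain Python to show which documents end up in each innermost bucket. This is only an illustrative sketch with hypothetical sample data; it does not call Elasticsearch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample documents, not taken from a real index.
people = [
    {"classid": 1, "age": 20, "fenshu": 80},
    {"classid": 1, "age": 20, "fenshu": 90},
    {"classid": 1, "age": 21, "fenshu": 60},
    {"classid": 2, "age": 20, "fenshu": 70},
]

# First level: bucket by classid. Second level: bucket by age within each class.
scores = defaultdict(lambda: defaultdict(list))
for doc in people:
    scores[doc["classid"]][doc["age"]].append(doc["fenshu"])

# Metric on each finest-grained bucket: the average score.
avg_fenshu = {
    classid: {age: mean(values) for age, values in by_age.items()}
    for classid, by_age in scores.items()
}
```

Each inner average only sees documents that share both the class and the age, which is what the nested "aggs" blocks express in the request.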

Highest, lowest, and average score per class

This requires applying several metrics to each bucket.

GET index5/people/_search
{
  "size":0,
  "aggs": {
    "class_group":{
      "terms": {
        "field": "classid"
      },
      "aggs": {
        "avg_fenshu": {
          "avg": {
            "field": "fenshu"
          }
        },
        "max_fenshu":{
          "max": {
            "field": "fenshu"
          }
        },
        "min_fenshu":{
          "min":{
            "field": "fenshu"
          }
        }
      }
    }
  }
}

Average score by age interval

GET index5/people/_search
{
  "size":0,
  "aggs": {
    "age_group": {
      "histogram": {
        "field": "age",
        "interval": 1
      },
      "aggs": {
        "fenshu": {
          "avg": {
            "field": "fenshu"
          }
        }
      }
    }
  }
}
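
A histogram assigns each value to the bucket whose key is the value rounded down to a multiple of the interval. The Python sketch below illustrates that rule with hypothetical data. It uses an interval of 5 instead of the 1 in the request above, purely so that several ages share a bucket; the helper name `histogram_key` is also an assumption of this sketch.

```python
import math
from collections import defaultdict
from statistics import mean

def histogram_key(value, interval, offset=0):
    """Return the lower bound of the bucket that `value` falls into."""
    return math.floor((value - offset) / interval) * interval + offset

# Hypothetical sample documents.
people = [
    {"age": 18, "fenshu": 70},
    {"age": 21, "fenshu": 80},
    {"age": 24, "fenshu": 90},
    {"age": 25, "fenshu": 60},
]

# Bucket step: group scores by the histogram key of the age.
scores_by_bucket = defaultdict(list)
for doc in people:
    scores_by_bucket[histogram_key(doc["age"], interval=5)].append(doc["fenshu"])

# Metric step: average score per age bucket.
avg_by_bucket = {key: mean(values) for key, values in scores_by_bucket.items()}
```

With an interval of 1 and integer ages, every distinct age gets its own bucket, so the request above behaves much like a terms aggregation on age, except that the buckets come back ordered by key.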

Average score by date interval

GET index5/people/_search
{
  "size":0,
  "aggs": {
    "birthday_group": {
      "date_histogram": {
        "field": "birthday",
        "interval": "year",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "1985-01-01",
          "max": "2020-01-01"
        }
      },
      "aggs": {
        "avg_fenshu": {
          "avg": {
            "field": "fenshu"
          }
        }
      }
    }    
  }
}

global bucket

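A global bucket ignores the query conditions, which makes it useful for comparing a query-scoped result against the whole data set.
Compare the average score of class 1 with the average score across all classes: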

GET index5/people/_search
{
  "size": 0,
  "query": {
    "match": {
      "classid": 1
    }
  }, 
  "aggs": {
    "class1_avg_fenshu": {
      "avg": {
        "field": "fenshu"
      }
    },
    "all":{
      "global": {},
      "aggs": {
        "global_avg_fenshu": {
          "avg": {
            "field": "fenshu"
          }
        }
      }
    }
  }
}
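
The difference in scope between the two aggregations can be sketched in Python. The sample data is hypothetical and the code only mimics the semantics; it does not talk to Elasticsearch.

```python
from statistics import mean

# Hypothetical sample documents.
people = [
    {"classid": 1, "fenshu": 80},
    {"classid": 1, "fenshu": 90},
    {"classid": 2, "fenshu": 60},
    {"classid": 2, "fenshu": 70},
]

# The query narrows the document set that ordinary aggregations see.
matched = [doc for doc in people if doc["classid"] == 1]
class1_avg_fenshu = mean(doc["fenshu"] for doc in matched)

# A global bucket ignores the query and aggregates over every document.
global_avg_fenshu = mean(doc["fenshu"] for doc in people)
```

Both numbers come back from a single request, so the comparison does not need a second search without the query.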

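Filter bucket

aggs.filter applies a filter to a single aggregation.
A filter placed in query is global: it narrows the documents for the search and for every aggregation. To filter only what one aggregation sees, put the filter inside aggs.
Average score of students in class 1 who were born in 1990: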

GET index5/people/_search
{
  "size": 0,
  "query": {
    "match": {
      "classid": 1
    }
  }, 
  "aggs": {
    "recent_90":{
      "filter": {
        "range": {
          "birthday": {
            "gte": "1990/01/01",
            "lt": "1991/01/01"
          }
        }
      },
      "aggs": {
        "avg_fenshu": {
          "avg": {
            "field": "fenshu"
          }
        }
      }
    }
  }
}

Sorting

Sort the aggregation results.
Group by class and sort the buckets by average score, in descending order:

GET index5/people/_search
{
  "size": 0,
  "aggs": {
    "class_group": {
      "terms": {
        "field": "classid",
        "order": {
          "avg_fenshu": "desc"
        }
      },
      "aggs": {
        "avg_fenshu": {
          "avg": {
            "field": "fenshu"
          }
        }
      }
    }
  }
}

Distinct counts

Deduplicate a given field within each bucket and return the number of distinct values, similar to count(distinct) in SQL.

GET index5/people/_search
{
  "size": 0,
  "aggs": {
    "years": {
      "date_histogram": {
        "field": "birthday",
        "interval": "year",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "1985-01-01",
          "max": "2020-01-01"
        }
      },
      "aggs": {
        "age_distinct": {
          "cardinality": {
            "field": "age"
          }
        }
      }
    }
  }
}
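
What the request above computes can be sketched in Python with sets. The sample data is hypothetical. One difference worth keeping in mind: this sketch counts exactly, whereas the Elasticsearch cardinality aggregation returns an approximate count that can deviate slightly on fields with very many distinct values.

```python
from collections import defaultdict

# Hypothetical sample documents.
people = [
    {"birthday": "1990-03-01", "age": 30},
    {"birthday": "1990-08-15", "age": 30},
    {"birthday": "1990-12-31", "age": 29},
    {"birthday": "1991-05-20", "age": 29},
]

# Bucket step: one bucket per birth year (the first four characters of the date).
distinct_ages = defaultdict(set)
for doc in people:
    year = doc["birthday"][:4]
    distinct_ages[year].add(doc["age"])

# Metric step: number of distinct ages in each yearly bucket.
age_distinct = {year: len(ages) for year, ages in distinct_ages.items()}
```

The 1990 bucket holds three documents but only two distinct ages, which is the distinction between doc_count and the cardinality value.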