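Elasticsearch 6.7.1 operation notes

All operations below were verified to work on version 6.7.1.

Documentation

  • https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html Chinese edition of "Elasticsearch: The Definitive Guide" (outdated, but because it is in Chinese it is convenient for a quick start)
  • https://www.elastic.co/guide/en/elasticsearch/reference/6.7/getting-started.html English documentation for 6.7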

basic

GET /?pretty
curl 'http://localhost:9200/?pretty'

A healthy node returns something like the following:

{
  "name" : "id_Cdrf",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "OVVjOYXmRLmH0_x6QnS6sw",
  "version" : {
    "number" : "6.7.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2f32220",
    "build_date" : "2019-04-02T15:59:27.961366Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

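Operation notes

All operations are based on the s_flights data set; see the file es-operation-datasource.md for details.

By default, a search that matches everything still returns only the first 10 hits.

Data description

Flight record (each comment ends with the field's mapping type):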

{
  "FlightNum" : "EN9FHUD", // flight number, keyword
  "DestCountry" : "CA", // destination country, keyword
  "OriginWeather" : "Rain", // weather at origin, keyword
  "OriginCityName" : "Detroit", // origin city name, keyword
  "AvgTicketPrice" : 798.6925673856011, // average ticket price, float
  "DistanceMiles" : 1586.2909176475928, // distance from origin to destination in miles, float
  "FlightDelay" : true, // whether the flight is delayed, boolean
  "DestWeather" : "Rain", // weather at destination, keyword
  "Dest" : "Edmonton International Airport", // destination airport, keyword
  "FlightDelayType" : "Security Delay", // delay type, keyword
  "OriginCountry" : "US", // origin country, keyword
  "dayOfWeek" : 6, // day of week, integer
  "DistanceKilometers" : 2552.8877705706477, // distance from origin to destination in kilometers, float
  "timestamp" : "2019-04-28T06:25:17", // timestamp, date
  "DestLocation" : { // destination coordinates, geo_point
    "lat" : "53.30970001", // latitude
    "lon" : "-113.5800018" // longitude
  },
  "DestAirportID" : "CYEG", // destination airport ID, keyword
  "Carrier" : "Kibana Airlines", // airline name, keyword
  "Cancelled" : false, // whether the flight is cancelled, boolean
  "FlightTimeMin" : 451.3759823515883, // flight time in minutes, float
  "Origin" : "Detroit Metropolitan Wayne County Airport", // origin airport, keyword
  "OriginLocation" : { // origin coordinates, geo_point
    "lat" : "42.21239853", // latitude
    "lon" : "-83.35340118" // longitude
  },
  "DestRegion" : "CA-AB", // destination region, keyword
  "OriginAirportID" : "DTW", // origin airport ID, keyword
  "OriginRegion" : "US-MI", // origin region, keyword
  "DestCityName" : "Edmonton", // destination city name, keyword
  "FlightTimeHour" : 7.522933039193138, // flight time in hours, keyword
  "FlightDelayMin" : 255 // flight delay in minutes, integer
}

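Operations

Each document has many fields, so the responses are filtered as shown below.

The following requests return only the five listed fields.

Lite (query-string) search: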

GET /s_flights/_doc/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry

Request body search:

GET /s_flights/_doc/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ]
}
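The two forms above carry the same field list, once as a comma-separated query parameter and once as a JSON array. A minimal Python sketch of how a client might build both; the helper names `lite_search_path` and `body_search` are hypothetical, and the path omits the `_doc` type segment for brevity:

```python
import json
from urllib.parse import urlencode

FIELDS = ["FlightNum", "Origin", "OriginCountry", "Dest", "DestCountry"]

def lite_search_path(index, fields):
    """Query-string form: fields joined by commas in the _source parameter."""
    query = urlencode({"_source": ",".join(fields)}, safe=",")
    return f"/{index}/_search?{query}"

def body_search(fields):
    """Request-body form: the same filter as a JSON array under "_source"."""
    return json.dumps({"_source": list(fields)})

print(lite_search_path("s_flights", FIELDS))
print(body_search(FIELDS))
```

Both strings can then be sent with any HTTP client; nothing here contacts a cluster.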

Adding data

A document does not have to supply every field defined in the mapping.

Add a single document

PUT /s_flights/_doc/100
{
  "FlightNum": "C12345A",
  "Origin": "重庆",
  "Dest": "北京"
}

Add a single document with an auto-generated id

POST /s_flights/_doc/
{
  "FlightNum": "C12345A",
  "Origin": "重庆",
  "Dest": "北京"
}

Make sure a new document is created instead of an existing one being overwritten

PUT /s_flights/_doc/BxOIj2oBXEJSGxQ5yn4k?op_type=create
{
  "FlightNum": "C12345A",
  "Origin": "重庆",
  "Dest": "北京"
}
PUT /s_flights/_doc/BxOIj2oBXEJSGxQ5yn4k/_create
{
  "FlightNum": "C12345A",
  "Origin": "重庆",
  "Dest": "北京"
}

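doc as upsert (the contents of doc are used to create the document when it does not exist; this flag belongs to the _update endpoint)

POST /s_flights/_doc/1/_update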
{
  "doc": {
    "FlightNum": "C123333345A",
    "Origin": "重222庆",
    "Dest": "北京",
    "bb":11
  },
  "doc_as_upsert": true
}

Updating a field with the value it already holds returns noop (no operation)

POST /s_flights/_doc/1/_update
{
   "doc" : {
      "moreInfo":{
        "counter":1
      }
   }
}
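The noop behaviour can be pictured with a toy in-memory model: the partial doc is merged into the stored source, and if nothing changes the result is reported as noop. This is a sketch of the idea only, not Elasticsearch code; `merge_partial` and `partial_update` are hypothetical names.

```python
import copy

def merge_partial(source, partial):
    """Recursively merge `partial` into a copy of `source` (objects merge, other values replace)."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

def partial_update(source, partial):
    """Return (new_source, result); result is "noop" when the merge changes nothing."""
    merged = merge_partial(source, partial)
    return merged, ("noop" if merged == source else "updated")

doc = {"FlightNum": "C12345A"}
doc, first = partial_update(doc, {"moreInfo": {"counter": 1}})
doc, second = partial_update(doc, {"moreInfo": {"counter": 1}})
print(first, second)  # updated noop
```

Running the same partial update twice therefore yields "updated" the first time and "noop" the second, mirroring what the request above reports.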

Bulk insert

POST _bulk
{"create":{"_index":"s_flights","_type":"_doc","_id":"101"}}
{"FlightNum":"C12345B","Origin":"重庆","Dest":"上海"}
{"create":{"_index":"s_flights","_type":"_doc","_id":"102"}}
{"FlightNum":"C12345C","Origin":"重庆","Dest":"武汉"}

Deleting data

Delete the document whose _id is 100

DELETE /s_flights/_doc/100

Bulk delete

POST _bulk
{"delete":{"_index":"s_flights","_type":"_doc","_id":"101"}}
{"delete":{"_index":"s_flights","_type":"_doc","_id":"102"}}

Modifying (updating) data

https://www.elastic.co/guide/en/elasticsearch/reference/6.7/docs-update.html

Update or add fields

POST /s_flights/_doc/1/_update
{
   "doc" : {
      "moreInfo":{
        "tags" : [ "airline","flight","aeroplane","airplane"],
        "counter":1,
        "memo_1":""
      }
   }
}
GET /s_flights/_doc/1?_source=moreInfo

Update field values with a script:

POST s_flights/_doc/1/_update
{
   "script" : "ctx._source.moreInfo.counter+=1"
}
POST s_flights/_doc/1/_update
{
   "script" : "ctx._source.moreInfo.memo_1='good time'"
}
GET /s_flights/_doc/1?_source=moreInfo

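If the document does not exist, the upsert content is used to create it; running the same request again then executes the script and increments the counter.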

DELETE s_flights/_doc/1
POST s_flights/_doc/1/_update
{
  "script": {
    "source": "ctx._source.view_counter += params.count",
    "lang": "painless",
    "params": {
      "count": 1
    }
  },
  "upsert": {
    "moreInfo": {
      "tags": [
        "airline",
        "flight",
        "aeroplane",
        "airplane"
      ],
      "counter": 1,
      "memo_1": ""
    },
    "view_counter": 1
  }
}
GET /s_flights/_doc/1

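The scripted_upsert flag: the script runs whether or not the document exists. If the document is missing, it is created from the upsert content and the script is applied to it.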

POST s_flights/_doc/1/_update
{
  "scripted_upsert":true,
  "script": {
    "source": "ctx._source.view_counter += params.count",
    "lang": "painless",
    "params": {
      "count": 1
    }
  },
  "upsert": {
    "moreInfo": {
      "tags": [
        "airline",
        "flight",
        "aeroplane",
        "airplane"
      ],
      "counter": 1,
      "memo_1": ""
    },
    "view_counter": 1
  }
}
GET /s_flights/_doc/1
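The difference between a plain upsert and scripted_upsert is easiest to see side by side. The following is a toy in-memory model of the branching only, written under the assumption that the script is a simple counter increment; the `update` function and the dictionary "store" are hypothetical and do not talk to Elasticsearch.

```python
import copy

def update(store, doc_id, script, upsert, scripted_upsert=False):
    """Toy model of the branch taken by an update that has both script and upsert."""
    if doc_id in store:
        script(store[doc_id])        # document exists: run the script on it
        return "updated"
    doc = copy.deepcopy(upsert)      # document missing: start from the upsert content
    if scripted_upsert:
        script(doc)                  # scripted_upsert: the script also runs on the new document
    store[doc_id] = doc
    return "created"

def increment(source, count=1):
    # Stand-in for the Painless line: ctx._source.view_counter += params.count
    source["view_counter"] += count

upsert_doc = {"view_counter": 1}

plain = {}
update(plain, "1", increment, upsert_doc)    # creates the document, view_counter stays 1
update(plain, "1", increment, upsert_doc)    # second call runs the script, view_counter becomes 2

scripted = {}
update(scripted, "1", increment, upsert_doc, scripted_upsert=True)  # created and incremented in one call

print(plain["1"]["view_counter"], scripted["1"]["view_counter"])  # 2 2
```

In this model the plain upsert needs two calls to reach 2, while scripted_upsert reaches it in one because the script is applied to the freshly created document.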

Update document data with a script

POST s_flights/_doc/1/_update
{
  "script": {
    "source": "ctx._source.moreInfo.counter_status = ctx._source.moreInfo.counter === params.count ? 'isEnough' : params.count",
    "params": {
      "count": 10
    },
    "lang": "painless"
  }
}
GET /s_flights/_doc/1?_source=moreInfo

Querying data

https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html

https://www.cnblogs.com/ghj1976/p/5293250.html

https://donlianli.iteye.com/blog/2094305

https://blog.csdn.net/weixin_43430036/article/details/83272018

Lite search

Get all documents (only 10 are returned by default)

GET /s_flights/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry

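Get the document whose id is 1

GET /s_flights/_doc/1?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry

Send a HEAD request: only a status code comes back (200 if the document exists, 404 if it does not), which is enough to check for existence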

HEAD /s_flights/_doc/1

Find documents whose origin country is US

GET /s_flights/_search?q=OriginCountry:US&_source=FlightNum,Origin,OriginCountry,Dest,DestCountry

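Find documents whose origin country is US or CN (the parentheses bind both terms to OriginCountry; without them CN would be searched in the default fields)

GET /s_flights/_search?q=OriginCountry:(US+CN)&_source=FlightNum,Origin,OriginCountry,Dest,DestCountry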

Get the mapping

GET /s_flights/_mapping/_doc

Check cluster health

GET /_cluster/health

Wildcard query

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html#query-dsl-wildcard-query

Find documents whose origin country starts with C

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "wildcard": {
      "OriginCountry": {
        "value": "C*"
      }
    }
  }
}

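Find flight numbers that start with F and end with M9 (each ? matches exactly one character, so the value must be 7 characters long)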

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "wildcard": {
      "FlightNum": {
        "value": "F????M9"
      }
    }
  }
}
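The two wildcard characters used above can be tried out locally. This sketch leans on Python's `fnmatch` module as an approximation: its `?` and `*` behave like the ones in the wildcard query, but it also gives `[...]` a special meaning that the Elasticsearch query does not, so patterns containing brackets would differ. `wildcard_match` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

def wildcard_match(value, pattern):
    """Approximate a wildcard query on a keyword value: ? = one character, * = any run of characters."""
    return fnmatchcase(value, pattern)

print(wildcard_match("FFEVPM9", "F????M9"))  # True: F + four characters + M9
print(wildcard_match("FXM9", "F????M9"))     # False: too short for four ? placeholders
print(wildcard_match("CN", "C*"))            # True: starts with C
```

The match is case-sensitive here, which mirrors how a keyword field stores its value unchanged.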

Term query

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html

Find documents whose FlightNum is FFEVPM9

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "term": {
      "FlightNum": {
        "value": "FFEVPM9"
      }
    }
  }
}

Terms query (multiple terms)

Find documents whose FlightNum is 6DJ0DZM or ILXJVIF

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "terms": {
      "FlightNum": ["6DJ0DZM","ILXJVIF"]
    }
  }
}

terms set query

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-set-query.html

Range query

Find flights whose distance is between 100 km and 200 km, both inclusive

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "DistanceKilometers"
  ],
  "query": {
    "range": {
      "DistanceKilometers": {
        "gte": 100,
        "lte": 200
      }
    }
  }
}

exists query

prefix query

regexp query

fuzzy query

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin"
  ],
  "query": {
    "fuzzy": {
      "Origin": "bai"
    }
  }
}

type query

ids query

Request body search (ad hoc)

Query all documents

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match_all": {}
  }
}

The request above is equivalent to

GET /s_flights/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry

Match query for documents whose origin country is US

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match": {
      "OriginCountry": "US"
    }
  }
}

Find documents whose origin contains Shanghai or Tokyo (Origin is a text field, so the query text is analyzed into tokens)

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match": {
      "Origin": "Shanghai Tokyo"
    }
  }
}

Match query for origins containing Shanghai Tokyo (Origin.keyword is a keyword field, so the query text is not analyzed and nothing matches)

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match": {
      "Origin.keyword": "Shanghai Tokyo"
    }
  }
}

Match query on Origin.keyword with the complete value: a keyword field is not tokenized, but the whole string can be matched as a single term, so this returns results

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match": {
      "Origin.keyword": "Shanghai Hongqiao International Airport"
    }
  }
}

Phrase matching for origins containing Shanghai Hongqiao International (Origin is a text field, so it is analyzed)

A plain match query returns a large number of documents

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match": {
      "Origin": "Shanghai Hongqiao International"
    }
  }
}

A match_phrase query returns only one result

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match_phrase": {
      "Origin": "Shanghai Hongqiao International"
    }
  }
}
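Why match returns many documents while match_phrase returns few can be illustrated with a deliberately simplified model. The sketch below assumes an "analyzer" that only lowercases and splits on whitespace, which is far cruder than the real standard analyzer, and the airport names in the example are illustrative values; all function names are hypothetical.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match_any(field_value, query):
    """match with the default OR operator: at least one query token occurs in the field."""
    field = set(tokens(field_value))
    return any(token in field for token in tokens(query))

def match_phrase(field_value, query):
    """match_phrase: all query tokens occur consecutively and in the same order."""
    field, wanted = tokens(field_value), tokens(query)
    n = len(wanted)
    return any(field[i:i + n] == wanted for i in range(len(field) - n + 1))

query = "Shanghai Hongqiao International"
for origin in ["Shanghai Hongqiao International Airport",
               "Shanghai Pudong International Airport",
               "Edmonton International Airport"]:
    print(origin, match_any(origin, query), match_phrase(origin, query))
```

All three sample origins satisfy the OR-style match because each shares at least one token with the query, but only the first contains the three tokens as a contiguous phrase.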

Highlighting: match origins containing Shanghai Hongqiao International (Origin is a text field, so it is analyzed) and highlight the matches, i.e. wrap them in extra tags in the response

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match": {
      "Origin": "Shanghai Hongqiao International"
    }
  }
  , "highlight": {
    "fields": {
      "Origin":{}
    }
  }
}

Custom highlight tags

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match": {
      "Origin": "Shanghai Hongqiao International"
    }
  }
  , "highlight": {
    "fields": {
      "Origin":{
        "pre_tags": "<span class='highlight '>",
        "post_tags": "</span>"
      }
    }
  }
}

Highlight tags set on an individual field take precedence over the global ones.

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "match": {
      "Origin": "Shanghai Hongqiao International"
    }
  },
  "highlight": {
    "fields": {
      "Origin": {
        "pre_tags": "<span class='highlight-origin'>",
        "post_tags": "</span>"
      }
    },
    "pre_tags": "<span class='highlight'>",
    "post_tags": "</span>"
  }
}

Find documents whose origin country is US and whose flight time (FlightTimeMin) is under 100 minutes

GET /s_flights/_search
{
  "_source": [
    "FlightNum",
    "Origin",
    "OriginCountry",
    "Dest",
    "DestCountry"
  ],
  "query": {
    "bool": {
      "must": {
        "match": {
          "OriginCountry": "US"
        }
      },
      "filter": {
        "range": {
          "FlightTimeMin": {
            "lt": 100
          }
        }
      }
    }
  }
}

Find flights whose OriginCountry is US or CA

GET /s_flights/_search?q=OriginCountry:(US+CA)

GET /s_flights/_search
{
  "query": {
    "terms": {
      "OriginCountry": ["US","CA"]
    }
  }
}

Return only the fields OriginCountry and DestCountry

GET /s_flights/_search
{
  "_source": {
        "includes": [ "OriginCountry", "DestCountry" ]
    },
  "query": {
    "match_all": {}
  }
}

GET /s_flights/_search
{
  "_source":[ "OriginCountry", "DestCountry" ],
  "query": {
    "match_all": {}
  }
}

Return only the field OriginCountry

GET /s_flights/_search
{
  "_source":"OriginCountry",
  "query": {
    "match_all": {}
  }
}

Do not return _source

GET /s_flights/_search
{
  "_source":false,
  "query": {
    "match_all": {}
  }
}

Count flights whose origin country is "US", "NL", "JP" or "CN", and within each origin country count flights per destination country

GET /s_flights/_search
{
  "_source":false,
  "query": {
    "terms": {
      "OriginCountry": [
        "US",
        "NL",
        "JP",
        "CN"
      ]
    }
  },
  "aggs": {
    "all_origin": {
      "terms": {
        "field": "OriginCountry"
      },
      "aggs": {
        "all_dest": {
          "terms": {
            "field": "DestCountry"
          }
        }
      }
    }
  }
}
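The nested terms aggregation above amounts to a two-level group-and-count. A minimal sketch of the same bookkeeping on a handful of made-up documents (the sample data and the function name `origin_dest_counts` are assumptions for illustration, not taken from s_flights):

```python
from collections import Counter, defaultdict

def origin_dest_counts(docs, origins):
    """Count documents per OriginCountry, and per DestCountry inside each origin bucket."""
    per_origin = Counter()
    per_origin_dest = defaultdict(Counter)
    for doc in docs:
        origin = doc["OriginCountry"]
        if origin in origins:                                   # the terms query
            per_origin[origin] += 1                             # outer terms aggregation
            per_origin_dest[origin][doc["DestCountry"]] += 1    # nested terms aggregation
    return per_origin, per_origin_dest

sample = [
    {"OriginCountry": "US", "DestCountry": "CA"},
    {"OriginCountry": "US", "DestCountry": "US"},
    {"OriginCountry": "US", "DestCountry": "CA"},
    {"OriginCountry": "JP", "DestCountry": "CN"},
    {"OriginCountry": "DE", "DestCountry": "US"},   # filtered out by the terms query
]
outer, nested = origin_dest_counts(sample, {"US", "NL", "JP", "CN"})
print(dict(outer))         # {'US': 3, 'JP': 1}
print(dict(nested["US"]))  # {'CA': 2, 'US': 1}
```

Each key of the outer counter corresponds to one bucket of all_origin, and each inner counter to the all_dest buckets nested under it.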

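Count flights whose origin country is "US", "NL", "JP" or "CN", count flights per destination country within each origin country, and compute the minimum, maximum and average distance in kilometers (both per origin country and overall)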

GET /s_flights/_search
{
  "_source": "OriginCountry", 
  "query": {
    "terms": {
      "OriginCountry": [
        "US",
        "NL",
        "JP",
        "CN"
      ]
    }
  },
  "aggs": {
    "all_origin": {
      "terms": {
        "field": "OriginCountry"
      },
      "aggs": {
        "all_dest": {
          "terms": {
            "field": "DestCountry"
          }
        },
        "minDistanceKilometers": {
          "min": {
            "field": "DistanceKilometers"
          }
        },
        "maxDistanceKilometers": {
          "max": {
            "field": "DistanceKilometers"
          }
        },
        "avgDistanceKilometers": {
          "avg": {
            "field": "DistanceKilometers"
          }
        }
      }
    },
    "minDistanceKilometers": {
      "min": {
        "field": "DistanceKilometers"
      }
    },
    "maxDistanceKilometers": {
      "max": {
        "field": "DistanceKilometers"
      }
    },
    "avgDistanceKilometers": {
      "avg": {
        "field": "DistanceKilometers"
      }
    }
  }
}

Find documents whose origin and destination country are both US

GET /s_flights/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "OriginCountry": "US"
          }
        },
        {
          "match": {
            "DestCountry": "US"
          }
        }
      ]
    }
  }
}

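Find flights with no delay on the day 2019-04-28 (in the +08:00 time zone) whose origin country is "US", "NL" or "JP", count them per origin country, and within each origin country count flights per destination country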

GET /s_flights/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "FlightDelayType": "No Delay"
                }
              }
            ]
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "2019-04-28 00:00:00",
              "lt": "2019-04-29 00:00:00",
              "time_zone": "+08:00",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        },
        {
          "terms": {
            "OriginCountry": [
              "US",
              "NL",
              "JP"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "all_origin": {
      "terms": {
        "field": "OriginCountry"
      },
      "aggs": {
        "all_dest": {
          "terms": {
            "field": "DestCountry"
          }
        }
      }
    }
  }
}
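Because the range bounds are given with time_zone +08:00, they are shifted to UTC before being compared with the stored timestamps. A short sketch of that conversion using only the standard library; `local_day_to_utc` is a hypothetical helper and it returns a half-open interval (start inclusive, end exclusive):

```python
from datetime import datetime, timedelta, timezone

def local_day_to_utc(day, utc_offset_hours):
    """Return the UTC start (inclusive) and end (exclusive) of a calendar day in a fixed-offset zone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=tz)
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)

start, end = local_day_to_utc("2019-04-28", 8)
print(start.isoformat())  # 2019-04-27T16:00:00+00:00
print(end.isoformat())    # 2019-04-28T16:00:00+00:00
```

So a sample timestamp such as 2019-04-28T06:25:17, if stored as UTC, falls inside the window, while one at 2019-04-28T18:00:00 UTC would already belong to the next local day.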

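Match on destination weather, origin airport ID and a flight time range, while excluding DestCountry values that start with I or end with E

Note: in the official sample data some fields, such as the flight time in hours (FlightTimeHour), are mapped as keyword, i.e. as strings. A range query on such a field compares strings rather than numbers, so it sometimes returns data and sometimes does not. Range queries should target numeric or date fields.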

GET /s_flights/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "DestWeather": "Clear"
                }
              },
              {
                "match_phrase": {
                  "DestWeather": "Sunny"
                }
              }
            ]
          }
        },
        {
          "terms": {
            "OriginAirportID": [
              "SHA",
              "DWC"
            ]
          }
        },
        {
          "range": {
            "FlightTimeMin": {
              "gte": 100,
              "lte": 900
            }
          }
        }
      ],
      "must_not": [
        {
          "wildcard": {
            "DestCountry": "*E"
          }
        },
        {
          "wildcard": {
            "DestCountry": {
              "value": "I*"
            }
          }
        }
      ]
    }
  }
}
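The warning about range queries on keyword fields comes down to string ordering versus numeric ordering. A tiny Python illustration of the same effect (the values are made up; `in_range` is a hypothetical helper):

```python
def in_range(value, gte, lte):
    """Inclusive range check; the comparison follows the type of the arguments."""
    return gte <= value <= lte

print(in_range(7.52, 2, 10))        # True: compared as numbers
print(in_range("7.52", "2", "10"))  # False: compared as strings, "7.52" sorts after "10"
```

The second call fails because string comparison goes character by character and "7" sorts after "1", which is the kind of surprise a range over a keyword-mapped number produces.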

_mget queries

Get the documents whose ids are 1 and 2

POST /s_flights/_doc/_mget
{
  "ids":[1,2]
}

Get the documents whose ids are 1 and 3

POST /s_flights/_doc/_mget
{
  "docs": [
    {
      "_id": 1
    },
    {
      "_id": 3
    }
  ]
}

Official tutorial 4: pagination

GET /_search

GET /_search?timeout=10ms

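# Search all types in all indices
GET /_search

# Search all types in the gb index
GET /gb/_search

# Search all documents in the gb and us indices
GET /gb,us/_search

# Search all types in any index whose name starts with g or u
GET /g*,u*/_search

# Search the user type in the gb index
GET /gb/user/_search

# Search the user and tweet types in the gb and us indices
GET /gb,us/user,tweet/_search

# Search the user and tweet types in all indices
GET /_all/user,tweet/_search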


GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10

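Just as SQL uses the LIMIT keyword to return a single page of results, Elasticsearch accepts the from and size parameters:

# Number of results to return, default 10
size

# Number of initial results to skip, default 0
from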
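Turning a page number into these two parameters is simple arithmetic. A minimal sketch, assuming pages are numbered from 1; `page_params` is a hypothetical helper:

```python
def page_params(page, size=10):
    """Map a 1-based page number to the from/size pair of a search request."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}

print(page_params(1, 5))  # {'from': 0, 'size': 5}
print(page_params(2, 5))  # {'from': 5, 'size': 5}
print(page_params(3, 5))  # {'from': 10, 'size': 5}
```

These three results correspond to the three size=5 requests shown earlier in this section.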

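Deep paging in a distributed system

To see why deep paging is a problem, assume we are searching an index with 5 primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 and returns them to the coordinating node, which sorts all 50 results to select the overall top 10.

Now suppose we request page 1,001, that is, results 10,001 to 10,010. Everything works the same way, except that each shard must produce its top 10,010 results. The coordinating node then sorts all 50,050 results and discards 50,040 of them.

In a distributed system, then, the cost of sorting grows with paging depth: every shard must produce from + size results and the coordinating node must sort all of them. This is why web search engines do not return more than 1,000 results for any query.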

https://www.elastic.co/guide/cn/elasticsearch/guide/current/pagination.html
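The figures in the deep-paging explanation follow from one formula: each shard produces from + size results and the coordinating node gathers that many from every shard. A small sketch reproducing the numbers; `deep_paging_cost` is a hypothetical helper, and it ignores details such as the index.max_result_window limit (10,000 by default) that would reject a request this deep on a default 6.x index:

```python
def deep_paging_cost(shards, from_, size):
    """Return how many results each shard produces and how many the coordinating node gathers and discards."""
    per_shard = from_ + size
    gathered = shards * per_shard
    return {"per_shard": per_shard, "gathered": gathered, "discarded": gathered - size}

print(deep_paging_cost(5, 0, 10))      # {'per_shard': 10, 'gathered': 50, 'discarded': 40}
print(deep_paging_cost(5, 10000, 10))  # {'per_shard': 10010, 'gathered': 50050, 'discarded': 50040}
```

The second line matches the 50,050 sorted and 50,040 discarded results described for results 10,001 to 10,010 on 5 shards.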

Bulk write operations: _bulk

https://blog.csdn.net/u010454030/article/details/79872003

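The bulk API performs multiple index or delete operations in a single request, which can greatly improve indexing throughput.

Each action line and each document body must be written on a single line.

An operation consists of two lines: the first gives the action type (index, create, update or delete) and the second is the optional body. When inserting in bulk this way, set the Content-Type to application/json (the curl command below uses application/x-ndjson, the newline-delimited JSON type).

The optional second line depends on the action type:

(1) index and create: the second line is the document source
(2) delete: there is no second line
(3) update: the second line can be a partial doc, an upsert, or a script

The operations can be written to a text file and then sent to the server with curl: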

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
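A file like the "requests" file passed to curl above can be generated rather than written by hand. The sketch below builds the newline-delimited body from a list of (action, metadata, source) tuples; `bulk_body` is a hypothetical helper, and the ids reuse the ones from the bulk examples earlier in these notes.

```python
import json

def bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a newline-delimited bulk body."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))  # action line
        if action != "delete":                                        # delete has no second line
            if source is None:
                raise ValueError(f"{action} needs a body line")
            lines.append(json.dumps(source, ensure_ascii=False))      # body line, always a single line
    return "\n".join(lines) + "\n"                                    # the body must end with a newline

body = bulk_body([
    ("create", {"_index": "s_flights", "_type": "_doc", "_id": "101"},
     {"FlightNum": "C12345B", "Origin": "重庆", "Dest": "上海"}),
    ("delete", {"_index": "s_flights", "_type": "_doc", "_id": "102"}, None),
])
print(body, end="")
```

Writing `body` to a file encoded as UTF-8 gives something that can be posted with the curl command; `json.dumps` without an indent keeps every object on one line, which is what the bulk format requires.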

https://www.elastic.co/guide/cn/elasticsearch/guide/current/_Document_Metadata.html

https://www.cnblogs.com/wangzhuxing/p/9351245.html

https://blog.51cto.com/13630803/2162641?source=dra

https://elasticsearch.cn/question/5340

https://blog.csdn.net/jianjun200607/article/details/51262976/

https://blog.csdn.net/huwei2003/article/details/47004745

https://www.cnblogs.com/wulaiwei/p/9319821.html

https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/elasticsearch-net-getting-started.html

https://www.cnblogs.com/Angle-Louis/p/4218678.html
