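All operations below were verified on Elasticsearch 6.7.1.
Documentation
- https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html Chinese edition of "Elasticsearch: The Definitive Guide" (the content is outdated, but because it is in Chinese it is convenient for getting started quickly)
- https://www.elastic.co/guide/en/elasticsearch/reference/6.7/getting-started.html English reference documentation for 6.7
Basics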
GET /?pretty
curl 'http://localhost:9200/?pretty'
A healthy node returns something like the following:
{
"name" : "id_Cdrf",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "OVVjOYXmRLmH0_x6QnS6sw",
"version" : {
"number" : "6.7.1",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "2f32220",
"build_date" : "2019-04-02T15:59:27.961366Z",
"build_snapshot" : false,
"lucene_version" : "7.7.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
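Operation log
All operations are based on the s_flights dataset; see the file es-operation-datasource.md for details.
A query for all documents actually returns only the first 10 hits by default.
Data description
Flight document (each comment gives the field's meaning followed by its mapping type):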
{
"FlightNum" : "EN9FHUD", // flight number, keyword
"DestCountry" : "CA", // destination country, keyword
"OriginWeather" : "Rain", // weather at the origin, keyword
"OriginCityName" : "Detroit", // origin city name, keyword
"AvgTicketPrice" : 798.6925673856011, // average ticket price, float
"DistanceMiles" : 1586.2909176475928, // distance from origin to destination in miles, float
"FlightDelay" : true, // whether the flight is delayed, boolean
"DestWeather" : "Rain", // weather at the destination, keyword
"Dest" : "Edmonton International Airport", // destination airport, keyword
"FlightDelayType" : "Security Delay", // type of flight delay, keyword
"OriginCountry" : "US", // origin country, keyword
"dayOfWeek" : 6, // day of the week, integer
"DistanceKilometers" : 2552.8877705706477, // distance from origin to destination in kilometers, float
"timestamp" : "2019-04-28T06:25:17", // timestamp, date
"DestLocation" : { // destination coordinates, geo_point
"lat" : "53.30970001", // latitude
"lon" : "-113.5800018" // longitude
},
"DestAirportID" : "CYEG", // destination airport ID, keyword
"Carrier" : "Kibana Airlines", // airline name, keyword
"Cancelled" : false, // whether the flight is cancelled, boolean
"FlightTimeMin" : 451.3759823515883, // flight time in minutes, float
"Origin" : "Detroit Metropolitan Wayne County Airport", // origin airport, keyword
"OriginLocation" : { // origin coordinates, geo_point
"lat" : "42.21239853", // latitude
"lon" : "-83.35340118" // longitude
},
"DestRegion" : "CA-AB", // destination region, keyword
"OriginAirportID" : "DTW", // origin airport ID, keyword
"OriginRegion" : "US-MI", // origin region, keyword
"DestCityName" : "Edmonton", // destination city name, keyword
"FlightTimeHour" : 7.522933039193138, // flight time in hours, keyword
"FlightDelayMin" : 255 // flight delay in minutes, integer
}
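Operations
Each document carries many fields, so the examples filter the returned fields as follows.
The requests below return only the five listed fields.
URI (lightweight) search: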
GET /s_flights/_doc/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Request body search:
GET /s_flights/_doc/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
]
}
Adding data
A document does not have to supply every field defined in the mapping.
Add a single document
PUT /s_flights/_doc/100
{
"FlightNum": "C12345A",
"Origin": "重庆",
"Dest": "北京"
}
Add a single document with an auto-generated id
POST /s_flights/_doc/
{
"FlightNum": "C12345A",
"Origin": "重庆",
"Dest": "北京"
}
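Make sure a new document is created instead of an existing one being overwritten (the request fails with a 409 conflict if the id already exists)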
PUT /s_flights/_doc/BxOIj2oBXEJSGxQ5yn4k?op_type=create
{
"FlightNum": "C12345A",
"Origin": "重庆",
"Dest": "北京"
}
PUT /s_flights/_doc/BxOIj2oBXEJSGxQ5yn4k/_create
{
"FlightNum": "C12345A",
"Origin": "重庆",
"Dest": "北京"
}
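The difference between a plain PUT and a create-only PUT can be sketched with a toy in-memory model. This is an illustration in Python, not Elasticsearch code: the put_document function and the dictionary store are hypothetical, while the status codes (201 created, 200 overwritten, 409 conflict) mirror what the index API returns.

```python
def put_document(store, doc_id, source, op_type="index"):
    """Toy model of PUT /index/_doc/id with an optional op_type=create."""
    if op_type == "create" and doc_id in store:
        return 409  # conflict: create refuses to overwrite
    created = doc_id not in store
    store[doc_id] = source
    return 201 if created else 200


store = {}
doc = {"FlightNum": "C12345A", "Origin": "重庆", "Dest": "北京"}
print(put_document(store, "100", doc))                    # first write
print(put_document(store, "100", doc))                    # overwrite
print(put_document(store, "100", doc, op_type="create"))  # rejected
```

With op_type=create (or the /_create endpoint) the second writer learns that the id is taken instead of silently replacing the document.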
doc_as_upsert: use the partial document itself as the upsert document
POST /s_flights/_doc/1/_update
{
"doc": {
"FlightNum": "C123333345A",
"Origin": "重222庆",
"Dest": "北京",
"bb":11
},
"doc_as_upsert": true
}
If the fields already exist with the same values, the update returns the result noop (no operation)
POST /s_flights/_doc/1/_update
{
"doc" : {
"moreInfo":{
"counter":1
}
}
}
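How a partial-document update ends up as a noop can be sketched as a recursive merge followed by a comparison. The Python below is a simplified model under that assumption; merge_partial and update_result are hypothetical names, and the real update API has more cases (for example the detect_noop switch).

```python
import copy


def merge_partial(source, partial):
    """Merge `partial` into a copy of `source`; nested objects merge recursively."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)  # scalars and arrays are replaced
    return merged


def update_result(source, partial):
    """Return ("noop" | "updated", merged_source)."""
    merged = merge_partial(source, partial)
    return ("noop" if merged == source else "updated"), merged


doc = {"moreInfo": {"counter": 1, "memo_1": ""}}
print(update_result(doc, {"moreInfo": {"counter": 1}})[0])
print(update_result(doc, {"moreInfo": {"counter": 2}})[0])
```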
Bulk insert
POST _bulk
{"create":{"_index":"s_flights","_type":"_doc","_id":"101"}}
{"FlightNum":"C12345B","Origin":"重庆","Dest":"上海"}
{"create":{"_index":"s_flights","_type":"_doc","_id":"102"}}
{"FlightNum":"C12345C","Origin":"重庆","Dest":"武汉"}
Deleting data
Delete the document whose _id is 100
DELETE /s_flights/_doc/100
Bulk delete
POST _bulk
{"delete":{"_index":"s_flights","_type":"_doc","_id":"101"}}
{"delete":{"_index":"s_flights","_type":"_doc","_id":"102"}}
Modifying (updating) data
https://www.elastic.co/guide/en/elasticsearch/reference/6.7/docs-update.html
Update existing fields or add new ones
POST /s_flights/_doc/1/_update
{
"doc" : {
"moreInfo":{
"tags" : [ "airline","flight","aeroplane","airplane"],
"counter":1,
"memo_1":""
}
}
}
GET /s_flights/_doc/1?_source=moreInfo
Update a field value with a script:
POST s_flights/_doc/1/_update
{
"script" : "ctx._source.moreInfo.counter+=1"
}
POST s_flights/_doc/1/_update
{
"script" : "ctx._source.moreInfo.memo_1='good time'"
}
GET /s_flights/_doc/1?_source=moreInfo
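If the document does not exist, the upsert content is first indexed as a new document; running the same request again then executes the script and increments the counter.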
DELETE s_flights/_doc/1
POST s_flights/_doc/1/_update
{
"script": {
"source": "ctx._source.view_counter += params.count",
"lang": "painless",
"params": {
"count": 1
}
},
"upsert": {
"moreInfo": {
"tags": [
"airline",
"flight",
"aeroplane",
"airplane"
],
"counter": 1,
"memo_1": ""
},
"view_counter": 1
}
}
GET /s_flights/_doc/1
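Effect of the scripted_upsert flag: if the document does not exist, it is created from the upsert content and the script is then run against it in the same request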
POST s_flights/_doc/1/_update
{
"scripted_upsert":true,
"script": {
"source": "ctx._source.view_counter += params.count",
"lang": "painless",
"params": {
"count": 1
}
},
"upsert": {
"moreInfo": {
"tags": [
"airline",
"flight",
"aeroplane",
"airplane"
],
"counter": 1,
"memo_1": ""
},
"view_counter": 1
}
}
GET /s_flights/_doc/1
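The two upsert variants above differ only in whether the script runs on the freshly created document. A toy Python model makes the resulting counter values concrete; the update and increment functions and the dictionary store are hypothetical stand-ins, not the real update API.

```python
def increment(source, count=1):
    """Stand-in for the script ctx._source.view_counter += params.count."""
    source["view_counter"] += count


def update(store, doc_id, script, upsert, scripted_upsert=False):
    """Toy model of an update request that carries a script and an upsert."""
    if doc_id in store:
        script(store[doc_id])        # document exists: only the script runs
        return "updated"
    store[doc_id] = dict(upsert)     # document missing: start from upsert
    if scripted_upsert:
        script(store[doc_id])        # and run the script right away
    return "created"


plain, scripted = {}, {}
update(plain, "1", increment, {"view_counter": 1})
update(scripted, "1", increment, {"view_counter": 1}, scripted_upsert=True)
print(plain["1"]["view_counter"], scripted["1"]["view_counter"])
```

In this model the plain upsert leaves view_counter at its initial value after the first request, while scripted_upsert has already applied the increment once.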
Update document data with a script
POST s_flights/_doc/1/_update
{
"script": {
"source": "ctx._source.moreInfo.counter_status = ctx._source.moreInfo.counter === params.count ? 'isEnough' : params.count",
"params": {
"count": 10
},
"lang": "painless"
}
}
GET /s_flights/_doc/1?_source=moreInfo
Querying data
https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
https://www.cnblogs.com/ghj1976/p/5293250.html
https://donlianli.iteye.com/blog/2094305
https://blog.csdn.net/weixin_43430036/article/details/83272018
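Lightweight (URI) search
Get all documents (only 10 are returned by default)
GET /s_flights/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Get the document with id 1
GET /s_flights/_doc/1?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Send a HEAD request for the document; only a status code comes back (200 if it exists, 404 if not), which is enough to check whether the document exists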
HEAD /s_flights/_doc/1
Find documents whose origin country is US
GET /s_flights/_search?q=OriginCountry:US&_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
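Find documents whose origin country is US or CN (the parentheses bind both values to OriginCountry; without them CN would be searched in the default fields)
GET /s_flights/_search?q=OriginCountry:(US+CN)&_source=FlightNum,Origin,OriginCountry,Dest,DestCountry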
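In query-string syntax a field prefix applies only to the term or group that directly follows it, so several values for one field are grouped in parentheses. The sketch below builds such a URI search path with Python's standard library; uri_search is a hypothetical helper, and the percent-encoding is what a client outside the Kibana console would typically send.

```python
from urllib.parse import urlencode


def uri_search(index, field, values, source_fields=()):
    """Return a URI-search path matching any of `values` in `field`."""
    # Grouping keeps every value bound to the field.
    q = f"{field}:({' OR '.join(values)})"
    params = {"q": q}
    if source_fields:
        params["_source"] = ",".join(source_fields)
    return f"/{index}/_search?" + urlencode(params)


print(uri_search("s_flights", "OriginCountry", ["US", "CN"]))
print(uri_search("s_flights", "OriginCountry", ["US"], ["FlightNum", "Origin"]))
```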
Get the mapping
GET /s_flights/_mapping/_doc
Check cluster health
GET /_cluster/health
Wildcard query
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html#query-dsl-wildcard-query
Find documents whose origin country starts with C
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"wildcard": {
"OriginCountry": {
"value": "C*"
}
}
}
}
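Find flight numbers that start with F and end with M9 (each ? matches exactly one character, so the pattern matches 7-character values)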
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"wildcard": {
"FlightNum": {
"value": "F????M9"
}
}
}
}
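The meaning of the two wildcard characters can be checked locally with Python's fnmatch module, which uses the same * (any sequence of characters) and ? (exactly one character) conventions. This is only an approximation of the wildcard query: fnmatch additionally treats square brackets as character classes, and the sample values below, apart from FFEVPM9, are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical sample values.
country_codes = ["CA", "CN", "US", "AC"]
flight_nums = ["FFEVPM9", "F12M9", "6DJ0DZM", "FABCDM9X"]

# "C*": starts with C, followed by anything.
starts_with_c = [c for c in country_codes if fnmatchcase(c, "C*")]

# "F????M9": F, exactly four arbitrary characters, then M9.
f_to_m9 = [n for n in flight_nums if fnmatchcase(n, "F????M9")]

print(starts_with_c)
print(f_to_m9)
```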
Term query
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Find documents whose FlightNum is exactly FFEVPM9
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"term": {
"FlightNum": {
"value": "FFEVPM9"
}
}
}
}
Terms query (multiple terms)
Find documents whose FlightNum is 6DJ0DZM or ILXJVIF
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"terms": {
"FlightNum": ["6DJ0DZM","ILXJVIF"]
}
}
}
terms set query
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-set-query.html
Range query
Find documents whose flight distance is between 100 km and 200 km (both bounds inclusive)
GET /s_flights/_search
{
"_source": [
"FlightNum",
"DistanceKilometers"
],
"query": {
"range": {
"DistanceKilometers": {
"gte": 100,
"lte": 200
}
}
}
}
exists query
prefix query
regexp query
fuzzy query
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin"
],
"query": {
"fuzzy": {
"Origin": "bai"
}
}
}
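A fuzzy query matches terms within a small edit distance of the search term. The Python sketch below illustrates the idea with an edit distance that counts a swap of two adjacent characters as one edit, plus the length thresholds documented for fuzziness AUTO (0 edits up to 2 characters, 1 edit for 3 to 5, 2 edits beyond that). Both functions are hypothetical illustrations; Elasticsearch evaluates fuzziness against indexed terms with its own automaton-based implementation.

```python
def edit_distance(a, b):
    """Edit distance with insert, delete, substitute and adjacent transposition."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # substitute
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose
    return d[len(a)][len(b)]


def auto_fuzziness(term):
    """Maximum edits allowed by fuzziness AUTO for a term of this length."""
    n = len(term)
    return 0 if n <= 2 else 1 if n <= 5 else 2


query = "bai"
for candidate in ["bay", "abi", "dubai"]:
    dist = edit_distance(query, candidate)
    print(candidate, dist, dist <= auto_fuzziness(query))
```

On a keyword field the whole stored value is a single term, so a three-letter search term can only match values that are themselves within one edit of it.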
type query
ids query
Request body search (ad hoc)
Get all documents
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match_all": {}
}
}
The request above is equivalent to
GET /s_flights/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Match query for documents whose origin country is US
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"OriginCountry": "US"
}
}
}
Find documents whose origin contains Shanghai or Tokyo (Origin is a text field, so the query string is analyzed into separate terms)
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Tokyo"
}
}
}
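Match query for the origin "Shanghai Tokyo" on Origin.keyword (a keyword field is not analyzed, so the whole string must equal the stored value; this returns no results)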
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin.keyword": "Shanghai Tokyo"
}
}
}
Match query on Origin.keyword (keyword type, not analyzed) using the complete value as a single term; this returns results
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin.keyword": "Shanghai Hongqiao International Airport"
}
}
}
Phrase matching for origins containing Shanghai Hongqiao International (Origin is a text field, so it is analyzed)
An ordinary match query returns many documents
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Hongqiao International"
}
}
}
A match_phrase query returns only one document
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match_phrase": {
"Origin": "Shanghai Hongqiao International"
}
}
}
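Why match returns many documents while match_phrase returns one can be illustrated with a toy analyzer. The Python below is a rough sketch under simplifying assumptions: analyze only lowercases and splits on whitespace (the real standard analyzer does more), match uses the default OR operator, and the airport names other than the two taken from this document are illustrative.

```python
def analyze(text):
    """Very rough stand-in for the standard analyzer."""
    return text.lower().split()


def match(field_value, query):
    """match with operator OR: any query token occurs in the field."""
    tokens = set(analyze(field_value))
    return any(t in tokens for t in analyze(query))


def match_phrase(field_value, query):
    """match_phrase: all query tokens occur consecutively, in order."""
    tokens, wanted = analyze(field_value), analyze(query)
    return any(tokens[i:i + len(wanted)] == wanted
               for i in range(len(tokens) - len(wanted) + 1))


origins = [
    "Shanghai Hongqiao International Airport",
    "Shanghai Pudong International Airport",
    "Edmonton International Airport",
    "Detroit Metropolitan Wayne County Airport",
]
query = "Shanghai Hongqiao International"
print([o for o in origins if match(o, query)])
print([o for o in origins if match_phrase(o, query)])
```

Every airport whose name contains "International" satisfies the match query, which is why it returns far more hits than the phrase query.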
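Highlighting: a match query for origins containing Shanghai Hongqiao International (Origin is a text field, so it is analyzed), with matches highlighted (that is, extra tags are wrapped around the matching terms in the response)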
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Hongqiao International"
}
},
"highlight": {
"fields": {
"Origin":{}
}
}
}
Custom highlight tags
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Hongqiao International"
}
},
"highlight": {
"fields": {
"Origin":{
"pre_tags": "<span class='highlight'>",
"post_tags": "</span>"
}
}
}
}
Field-level highlight tags take precedence over the global ones.
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Hongqiao International"
}
},
"highlight": {
"fields": {
"Origin": {
"pre_tags": "<span class='highlight-origin'>",
"post_tags": "</span>"
}
},
"pre_tags": "<span class='highlight'>",
"post_tags": "</span>"
}
}
Find documents whose origin country is US and whose flight time is under 100 minutes
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"bool": {
"must": {
"match": {
"OriginCountry": "US"
}
},
"filter": {
"range": {
"FlightTimeMin": {
"lt": 100
}
}
}
}
}
}
Find flight records whose OriginCountry is US or CA
GET /s_flights/_search?q=OriginCountry:(US+CA)
GET /s_flights/_search
{
"query": {
"terms": {
"OriginCountry": ["US","CA"]
}
}
}
Return only the fields OriginCountry and DestCountry in the results
GET /s_flights/_search
{
"_source": {
"includes": [ "OriginCountry", "DestCountry" ]
},
"query": {
"match_all": {}
}
}
GET /s_flights/_search
{
"_source":[ "OriginCountry", "DestCountry" ],
"query": {
"match_all": {}
}
}
Return only the field OriginCountry in the results
GET /s_flights/_search
{
"_source":"OriginCountry",
"query": {
"match_all": {}
}
}
Do not return _source
GET /s_flights/_search
{
"_source":false,
"query": {
"match_all": {}
}
}
Count flights whose origin country is US, NL, JP, or CN, and for each origin country count the flights per destination country
GET /s_flights/_search
{
"_source":false,
"query": {
"terms": {
"OriginCountry": [
"US",
"NL",
"JP",
"CN"
]
}
},
"aggs": {
"all_origin": {
"terms": {
"field": "OriginCountry"
},
"aggs": {
"all_dest": {
"terms": {
"field": "DestCountry"
}
}
}
}
}
}
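Count flights whose origin country is US, NL, JP, or CN, count the flights per destination country within each origin, and compute the minimum, maximum, and average distance both per origin and overall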
GET /s_flights/_search
{
"_source": "OriginCountry",
"query": {
"terms": {
"OriginCountry": [
"US",
"NL",
"JP",
"CN"
]
}
},
"aggs": {
"all_origin": {
"terms": {
"field": "OriginCountry"
},
"aggs": {
"all_dest": {
"terms": {
"field": "DestCountry"
}
},
"minDistanceKilometers": {
"min": {
"field": "DistanceKilometers"
}
},
"maxDistanceKilometers": {
"max": {
"field": "DistanceKilometers"
}
},
"avgDistanceKilometers": {
"avg": {
"field": "DistanceKilometers"
}
}
}
},
"minDistanceKilometers": {
"min": {
"field": "DistanceKilometers"
}
},
"maxDistanceKilometers": {
"max": {
"field": "DistanceKilometers"
}
},
"avgDistanceKilometers": {
"avg": {
"field": "DistanceKilometers"
}
}
}
}
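What the nested aggregation computes can be reproduced on a handful of documents in plain Python. This is a sketch of the bucket logic only, with made-up sample flights and hypothetical helper names; the real response nests the results under aggregations.all_origin.buckets.

```python
from collections import defaultdict

# Made-up sample documents.
flights = [
    {"OriginCountry": "US", "DestCountry": "CA", "DistanceKilometers": 3000.0},
    {"OriginCountry": "US", "DestCountry": "US", "DistanceKilometers": 1000.0},
    {"OriginCountry": "US", "DestCountry": "CA", "DistanceKilometers": 2000.0},
    {"OriginCountry": "JP", "DestCountry": "CN", "DistanceKilometers": 2000.0},
]


def stats(values):
    """min, max and avg metric aggregations over one bucket."""
    return {"min": min(values), "max": max(values), "avg": sum(values) / len(values)}


def aggregate(docs):
    """terms on OriginCountry, with a terms sub-aggregation and metrics per bucket."""
    buckets = defaultdict(lambda: {"doc_count": 0, "all_dest": defaultdict(int), "km": []})
    for doc in docs:
        bucket = buckets[doc["OriginCountry"]]
        bucket["doc_count"] += 1
        bucket["all_dest"][doc["DestCountry"]] += 1
        bucket["km"].append(doc["DistanceKilometers"])
    per_origin = {
        origin: {"doc_count": b["doc_count"], "all_dest": dict(b["all_dest"]), **stats(b["km"])}
        for origin, b in buckets.items()
    }
    overall = stats([doc["DistanceKilometers"] for doc in docs])
    return per_origin, overall


per_origin, overall = aggregate(flights)
print(per_origin["US"])
print(overall)
```

The metrics appear twice in the request for the same reason they appear twice here: once inside each origin bucket and once over all matching documents.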
Find documents whose origin and destination countries are both US
GET /s_flights/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"OriginCountry": "US"
}
},
{
"match": {
"DestCountry": "US"
}
}
]
}
}
}
Find flights with no delay on 2019-04-28 (one full day) whose origin country is US, NL, or JP; count the flights per origin country and, within each, per destination country
GET /s_flights/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"match": {
"FlightDelayType": "No Delay"
}
}
]
}
},
{
"range": {
"timestamp": {
"gte": "2019-04-28 00:00:00",
"lt": "2019-04-29 00:00:00",
"time_zone": "+08:00",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
{
"terms": {
"OriginCountry": [
"US",
"NL",
"JP"
]
}
}
]
}
},
"aggs": {
"all_origin": {
"terms": {
"field": "OriginCountry"
},
"aggs": {
"all_dest": {
"terms": {
"field": "DestCountry"
}
}
}
}
}
}
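The time_zone parameter means the range bounds are read as local times and converted to UTC before they are compared with the stored timestamps. The Python sketch below shows that conversion for a +08:00 day, assuming the stored timestamps carry no offset and are therefore treated as UTC; to_utc and in_local_day are hypothetical helpers, and the day is modelled as a half-open interval.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text, offset_hours, fmt="%Y-%m-%d %H:%M:%S"):
    """Interpret `local_text` at the given UTC offset and convert it to UTC."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.strptime(local_text, fmt).replace(tzinfo=tz).astimezone(timezone.utc)


def in_local_day(timestamp_utc_text, day, offset_hours):
    """True if a UTC timestamp falls inside the local calendar day [00:00, 24:00)."""
    start = to_utc(f"{day} 00:00:00", offset_hours)
    end = start + timedelta(days=1)
    ts = datetime.strptime(timestamp_utc_text, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return start <= ts < end


print(to_utc("2019-04-28 00:00:00", 8).isoformat())
print(in_local_day("2019-04-28T06:25:17", "2019-04-28", 8))
print(in_local_day("2019-04-28T20:00:00", "2019-04-28", 8))
```

A document stamped 2019-04-28T20:00:00 in UTC already belongs to 29 April in the +08:00 zone, so it falls outside the requested day.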
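Match on destination weather and origin airport ID, restrict the flight time to a range, and exclude DestCountry values that start with I or end with E
Note: in the official sample data some fields, such as the flight time in hours, are mapped as keyword (that is, string), so a range query on them compares values as strings and sometimes returns data and sometimes does not. Use range on numeric and date fields.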
GET /s_flights/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase": {
"DestWeather": "Clear"
}
},
{
"match_phrase": {
"DestWeather": "Sunny"
}
}
]
}
},
{
"terms": {
"OriginAirportID": [
"SHA",
"DWC"
]
}
},
{
"range": {
"FlightTimeMin": {
"gte": 100,
"lte": 900
}
}
}
],
"must_not": [
{
"wildcard": {
"DestCountry": "*E"
}
},
{
"wildcard": {
"DestCountry": {
"value": "I*"
}
}
}
]
}
}
}
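The warning about range queries on keyword fields comes down to string ordering: terms are compared character by character, not by numeric value. The short Python sketch below contrasts the two comparisons on made-up hour values; the helper names are hypothetical.

```python
# Made-up flight times in hours, stored as strings as a keyword field would.
hours_as_keyword = ["7.52", "10.2", "0.95", "12.0"]


def in_range_lexicographic(value, low, high):
    """Compare as strings, the way a range on keyword terms behaves."""
    return low <= value <= high


def in_range_numeric(value, low, high):
    """Compare as numbers, the way a range on a float field behaves."""
    return float(low) <= float(value) <= float(high)


as_strings = [h for h in hours_as_keyword if in_range_lexicographic(h, "5", "12")]
as_numbers = [h for h in hours_as_keyword if in_range_numeric(h, "5", "12")]
print(as_strings)
print(as_numbers)
```

As strings, "7.52" sorts after "12" and "10.2" sorts before "5", so the string comparison finds nothing in a range that numerically contains three of the four values.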
_mget queries
Get the documents with ids 1 and 2
POST /s_flights/_doc/_mget
{
"ids":[1,2]
}
Get the documents with ids 1 and 3
POST /s_flights/_doc/_mget
{
"docs": [
{
"_id": 1
},
{
"_id": 3
}
]
}
Official guide, part 4: pagination
GET /_search
GET /_search?timeout=10ms
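# Search all types in all indices
GET /_search
# Search all types in the gb index
GET /gb/_search
# Search all documents in the gb and us indices
GET /gb,us/_search
# Search all types in any index whose name starts with g or u
GET /g*,u*/_search
# Search the user type in the gb index
GET /gb/user/_search
# Search the user and tweet types in the gb and us indices
GET /gb,us/user,tweet/_search
# Search the user and tweet types in all indices
GET /_all/user,tweet/_search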
GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
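Just as SQL uses the LIMIT keyword to return a single page of results, Elasticsearch accepts the from and size parameters:
# Number of results to return; defaults to 10
size
# Number of initial results to skip; defaults to 0
from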
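Turning a page number into these two parameters is a one-line calculation. A minimal Python sketch, assuming 1-based page numbers; page_to_params is a hypothetical helper name.

```python
def page_to_params(page, size=10):
    """Convert a 1-based page number into the from/size search parameters."""
    if page < 1 or size < 0:
        raise ValueError("page must be >= 1 and size must be >= 0")
    return {"from": (page - 1) * size, "size": size}


# The three requests above correspond to pages 1, 2 and 3 with size=5.
for page in (1, 2, 3):
    print(page, page_to_params(page, size=5))
```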
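Deep paging in distributed systems
To understand why deep paging is problematic, suppose we search an index with 5 primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which sorts all 50 results to select the overall top 10.
Now suppose we request page 1001 (results 10,001 to 10,010). Everything works the same way, except that each shard has to produce its top 10,010 results. The coordinating node then sorts all 50,050 results and discards 50,040 of them.
As you can see, in a distributed system the cost of sorting results grows with the paging depth, because every shard must build from + size results and the coordinating node must sort number_of_shards × (from + size) of them. This is why web search engines do not return more than 1,000 results for any query.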
https://www.elastic.co/guide/cn/elasticsearch/guide/current/pagination.html
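The numbers in the deep-paging example follow directly from from + size. A small Python sketch of the arithmetic, with deep_paging_cost as a hypothetical helper:

```python
def deep_paging_cost(shards, from_, size):
    """Count the results each shard builds and the coordinating node sorts."""
    per_shard = from_ + size        # every shard must produce its top from+size
    gathered = shards * per_shard   # the coordinating node sorts all of them
    return {"per_shard": per_shard, "gathered": gathered, "discarded": gathered - size}


print(deep_paging_cost(shards=5, from_=0, size=10))      # first page
print(deep_paging_cost(shards=5, from_=10000, size=10))  # results 10,001 to 10,010
```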
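Bulk write operations: _bulk
https://blog.csdn.net/u010454030/article/details/79872003
The bulk API performs multiple index or delete operations in a single request, which can greatly improve indexing throughput.
Each line of a bulk request must be a complete JSON object on a single line.
An operation consists of up to two lines. The first line is the action, which can be index, create, update, or delete; the second line is the optional data body. When sending a bulk request, set the Content-Type header to application/json, or to application/x-ndjson as in the curl command below.
What goes on the second line depends on the action:
(1) index and create: the second line is the document source
(2) delete: there is no second line
(3) update: the second line can be a partial doc, an upsert, or a script
We can write the operations to a text file and send it to the server with curl: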
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
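The line-oriented format described above is easy to generate programmatically. The Python sketch below builds a bulk body from a list of operations; bulk_body is a hypothetical helper, and it only assembles the text (sending it, for example with the curl command above, is left out).

```python
import json

ALLOWED_ACTIONS = ("index", "create", "update", "delete")


def bulk_body(operations):
    """Build a newline-delimited bulk body from (action, metadata, body) tuples."""
    lines = []
    for action, metadata, body in operations:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown bulk action: {action}")
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if action != "delete":            # delete has no second line
            if body is None:
                raise ValueError(f"{action} needs a body line")
            lines.append(json.dumps(body, ensure_ascii=False))
    return "\n".join(lines) + "\n"        # the body must end with a newline


ops = [
    ("create", {"_index": "s_flights", "_type": "_doc", "_id": "101"},
     {"FlightNum": "C12345B", "Origin": "重庆", "Dest": "上海"}),
    ("delete", {"_index": "s_flights", "_type": "_doc", "_id": "102"}, None),
]
print(bulk_body(ops), end="")
```

json.dumps emits no line breaks by default, which is what keeps each action and each source on a single line.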
https://www.elastic.co/guide/cn/elasticsearch/guide/current/_Document_Metadata.html
https://www.cnblogs.com/wangzhuxing/p/9351245.html
https://blog.51cto.com/13630803/2162641?source=dra
https://elasticsearch.cn/question/5340
https://blog.csdn.net/jianjun200607/article/details/51262976/
https://blog.csdn.net/huwei2003/article/details/47004745
https://www.cnblogs.com/wulaiwei/p/9319821.html
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/elasticsearch-net-getting-started.html
https://www.cnblogs.com/Angle-Louis/p/4218678.html