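Author: Li Zengsheng (李增胜)
Business background
In the B2B (to-business) sector, product search and display are subject to business rules. For example, only buyers that have a partnership with a supplier can see the products in that supplier's shop; buyers without such a relationship are not shown the products. In addition, some products are shown at one price to customer A and at another price to customer B, so that prices differ by member and by customer group.
In short: product sales in the B2B sector are, to a degree, closed and specialized. All later examples build on this background, so that you can get familiar with Elasticsearch document, index, and query operations in a realistic business setting.
This article uses IK as the analyzer. The IK version you download must match your Elasticsearch version. IK download address: https://github.com/medcl/elasticsearch-analysis-ik/releases
- Create an ik folder under the plugins directory of the Elasticsearch installation directory, then extract the downloaded IK package into it.
- Restart Elasticsearch.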
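Defining the mapping
The product fields are as follows:
- goodsName: product name
- skuCode: product SKU code
- brandName: product brand name
- channelType: channel type
- shopCode: shop code
- publicPrice: selling price (the base price, visible to everyone)
- closeUserCode: codes of the closed-group members
- groupPrice: group prices, stored as a nested type containing the group price and the group level
Define the product mapping: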
PUT my_goods
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
}
},
"mappings": {
"properties": {
"goodsName": {
"type": "text",
"analyzer": "ik_smart"
},
"skuCode": {
"type": "keyword"
},
"brandName": {
"type": "keyword"
},
"channelType": {
"type": "keyword"
},
"shopCode": {
"type": "keyword"
},
"publicPrice": {
"type": "float"
},
"closeUserCode": {
"type": "text",
"analyzer": "standard"
},
"boostValue": {
"type": "keyword"
},
"groupPrice": {
"type": "nested",
"properties": {
"boxLevelPrice": {
"type": "float"
},
"level": {
"type": "text"
}
}
}
}
}
}
Document APIs
The core operations are the following.
Index
Documents can be added with the following requests:
PUT /<target>/_doc/<_id>
POST /<target>/_doc/
PUT /<target>/_create/<_id>
POST /<target>/_create/<_id>
Taking POST /<target>/_create/<_id> as an example, the following creates the product document with ID 1:
POST /my_goods/_create/1
{
"goodsName":"苹果 51英寸 4K超高清",
"skuCode":"skuCode1",
"brandName":"苹果",
"closeUserCode":[
"0"
],
"channelType":"cloudPlatform",
"shopCode":"sc00001",
"publicPrice":"8188.88",
"groupPrice":null,
"boxPrice":null,
"boostValue":1.8
}
Bulk
Elasticsearch supports batch inserts through the _bulk API:
POST my_goods/_bulk
{"index":{"_id":1}}
{"goodsName":"苹果 51英寸 4K超高清","skuCode":"skuCode1","brandName":"苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8188.88","groupPrice":null,"boxPrice":null,"boostValue":1.8}
{"index":{"_id":2}}
{"goodsName":"苹果 55英寸 3K超高清","skuCode":"skuCode2","brandName":"苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00002","publicPrice":"6188.88","groupPrice":null,"boxPrice":null,"boostValue":1.0}
{"index":{"_id":3}}
{"goodsName":"苹果UA55RU7520JXXZ 53英寸 4K高清","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":4}}
{"goodsName":"山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清","skuCode":"skuCode4","brandName":"山东苹果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8488.88","groupPrice":[{"level":"level1","boxLevelPrice":"2488.88"},{"level":"level2","boxLevelPrice":"3488.88"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":4488.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5488.88}],"boostValue":1.2}
{"index":{"_id":5}}
{"goodsName":"苹果UA55R苹果U7苹果520JXXZ 55英寸 5K超高清","skuCode":"skuCode5","brandName":"三星苹果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8488.88","groupPrice":[{"level":"level1","boxLevelPrice":"2500"},{"level":"level2","boxLevelPrice":"3500"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":3588.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5588.88}],"boostValue":1.2}
{"index":{"_id":6}}
{"goodsName":"三星UA55RU7520JXXZ 51英寸 4K超高清","skuCode":"skuCode1","brandName":"三星","closeUserCode":["0"],"channelType":"cmccPlatform","shopCode":"sc00001","publicPrice":"8188.88","groupPrice":null,"boxPrice":null,"boostValue":1.2}
{"index":{"_id":7}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["0"],"channelType":"cmccPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd002"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":8}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":9}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":10}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.8}
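The _bulk body above is newline-delimited JSON: each document takes one action line followed by one source line, and the body has to end with a newline. As a minimal sketch of how such a body could be assembled in client code, the helper below builds it from a list of (id, source) pairs. The function name build_bulk_body and the shortened sample documents are assumptions made for illustration; sending the body to Elasticsearch is left out.

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON _bulk body: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: index the document under the given _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The _bulk API expects the body to be terminated by a newline.
    return "\n".join(lines) + "\n"

# Shortened, hypothetical versions of the first two products.
docs = [
    (1, {"goodsName": "苹果 51英寸 4K超高清", "skuCode": "skuCode1", "shopCode": "sc00001"}),
    (2, {"goodsName": "苹果 55英寸 3K超高清", "skuCode": "skuCode2", "shopCode": "sc00002"}),
]
body = build_bulk_body(docs)
print(body)
```

Keeping each JSON object on a single line matters, because the API splits the body on newlines.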
Delete
Documents are deleted with the following request:
DELETE /<index>/_doc/<_id>
Delete the document with ID 2:
DELETE /my_goods/_doc/2
Delete by query
Deletion by condition is also supported, through _delete_by_query.
The following request deletes all products whose shop code is sc00002.
POST /my_goods/_delete_by_query
{
"query": {
"match": {
"shopCode": "sc00002"
}
}
}
Update
Documents are modified with the following request:
POST /<index>/_update/<_id>
Modify the document with ID 1.
Add a new field:
POST /my_goods/_update/1
{
"doc": {
"shopName": "小王店铺"
}
}
Change the shop name to "张三店铺":
POST /my_goods/_update/1
{
"doc": {
"shopName": "张三店铺"
}
}
After this update, the _source of the document reads:
{
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8,
"shopName" : "张三店铺"
}
A PUT request can also be used for modification, but it replaces the whole document, so every field has to be listed:
PUT my_goods/_doc/10
{
"goodsName": "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode": "skuCode10",
"brandName": "三星",
"closeUserCode": [
"uc0022"
],
"channelType": "cloudPlatform",
"shopCode": "sc00001",
"publicPrice": "8288.88",
"groupPrice": null,
"boxPrice": [
{
"boxType": "box1",
"boxUserCode": [
"uc0022"
],
"boxPriceDetail": 4288.88
}
],
"boostValue": 1.8
}
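An update can also be done with a script. The following adds a city field whose value is taken from params.channelType: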
POST my_goods/_update/10
{
"script": {
"source": "ctx._source.city=params.channelType",
"lang": "painless",
"params": {
"channelType": "cloudPlatform1"
}
}
}
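Update by query
Updates can also go through the _update_by_query API. Here, publicPrice is changed to 5888.00 for every document whose shop code is sc00002.
First insert the product document with ID 2 for that shop: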
POST /my_goods/_create/2
{
"goodsName": "苹果 55英寸 3K超高清",
"skuCode": "skuCode2",
"brandName": "苹果",
"closeUserCode": [
"0"
],
"channelType": "cloudPlatform",
"shopCode": "sc00002",
"publicPrice": "6188.88",
"groupPrice": null,
"boxPrice": null,
"boostValue": 1
}
Querying it now returns:
{
"goodsName" : "苹果 55英寸 3K超高清",
"skuCode" : "skuCode2",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00002",
"publicPrice" : "6188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.0
}
Set publicPrice to 5888.00 where the shop code is sc00002:
POST /my_goods/_update_by_query
{
"script": {
"source": "ctx._source.publicPrice=5888.00",
"lang": "painless"
},
"query": {
"term": {
"shopCode": "sc00002"
}
}
}
Query the result again:
GET /my_goods/_source/2
{
"shopCode" : "sc00002",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"groupPrice" : null,
"boxPrice" : null,
"channelType" : "cloudPlatform",
"boostValue" : 1.0,
"publicPrice" : 5888.0,
"goodsName" : "苹果 55英寸 3K超高清",
"skuCode" : "skuCode2"
}
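Reindex
When an index has to be rebuilt for business reasons, use the _reindex API.
The source must be an existing index, index alias, or data stream, and the destination can be any of these as well. Because _reindex does not copy settings or mappings from the source, it is best to create the destination index with the desired mapping beforehand.
You can simply reindex index A into index B, or reindex only the documents that match a condition.
The following reindexes the products with skuCode=skuCode2 into the index my_goods_new: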
POST _reindex
{
"source": {
"index": "my_goods",
"query": {
"match": {
"skuCode": "skuCode2"
}
}
},
"dest": {
"index": "my_goods_new"
}
}
Query the data in the my_goods_new index:
GET my_goods_new/_search/
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_goods_new",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode2",
"brandName" : "三星",
"closeUserCode" : [
"0"
],
"channelType" : "cmccPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"htd002"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.2
}
},
{
"_index" : "my_goods_new",
"_type" : "_doc",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode2",
"brandName" : "三星",
"closeUserCode" : [
"uc0022"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc0022"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.2
}
},
{
"_index" : "my_goods_new",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode2",
"brandName" : "三星",
"closeUserCode" : [
"uc0022"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc0022"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.2
}
},
{
"_index" : "my_goods_new",
"_type" : "_doc",
"_id" : "10",
"_score" : 1.0,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode2",
"brandName" : "三星",
"closeUserCode" : [
"uc0022"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc0022"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.8
}
}
]
}
}
Get
Documents are retrieved with the following requests:
GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
GET <index>/_source/<_id>
HEAD <index>/_source/<_id>
Retrieve the document with ID 1:
GET /my_goods/_doc/1
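Check whether the document with ID 1 exists.
When you only need to know whether a document exists, HEAD returns less information and is cheaper than GET, which suits this kind of check: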
HEAD /my_goods/_doc/1
Returns:
200 - OK
Returning only the document source
To return only the _source content:
GET /my_goods/_source/1
Returns:
{
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8
}
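Customizing the returned fields
Fetch only part of _source, similar to selecting specific columns in a database query instead of using select * to return every column.
# GET request form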
GET my_goods/_source/1/?_source_includes=brandName,goodsName
# Returns
{
"brandName" : "苹果",
"goodsName" : "苹果 51英寸 4K超高清"
}
# POST body form
POST my_goods/_search
{
"query": {
"match_all": {
}
},
"fields": ["brandName", "goodsName"],
"_source": false
}
# Returns
"hits" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"brandName" : [
"苹果"
],
"goodsName" : [
"苹果 55英寸 3K超高清"
]
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"fields" : {
"brandName" : [
"美国苹果"
],
"goodsName" : [
"苹果UA55RU7520JXXZ 53英寸 4K高清"
]
}
},
...
]
Multi get
Elasticsearch also supports batch retrieval through the _mget API. The following retrieves the documents with IDs 1 and 2:
GET /my_goods/_mget
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
Returns:
{
"docs" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "1",
"_version" : 7,
"_seq_no" : 8,
"_primary_term" : 1,
"found" : true,
"_source" : {
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8,
"shopName" : "张三店铺"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
]
}
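Query DSL
Queries against an index include full-text queries, compound queries, structured queries, and more.
Query vs. Filter
The two behave differently:
Query
Answers whether a document matches and how well it matches the query conditions, returning a _score for ranking.
Filter
Only answers whether a document matches the query; it does not tell you how well, that is, no scoring takes place. Filters are especially useful together with aggregations. Because scoring is skipped, Elasticsearch processes them faster, which improves the overall response time. Filter results can also be cached, whereas scored queries are not cached in this way.
When to use which
Use Query for full-text search and anything that depends on scoring; Filter is recommended for all other cases.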
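Compound queries
Boolean query
A Boolean query combines must, filter, should, and must_not clauses.
must: the clause has to match and contributes to the score. filter: the clause has to match but scoring is ignored. should: comparable to OR in a database query; in the special case where the query contains only should clauses, at least one of them has to match for a document to be returned. must_not: the clause must not match, comparable to "not equal".
Example: shop code = sc00001, channelType = cloudPlatform, publicPrice not between 8288 and 8888, combined with a should clause on the brand. The following conditions must all hold:
- shop code = sc00001
- channelType = cloudPlatform
- publicPrice is not within 8288-8888
The query also contains a should clause:
- brand matches "果"
Because minimum_should_match is set to 1 in this request, the should clause has to match as well, so the result is the intersection of all these conditions. If minimum_should_match were left at its default (0 when a must or filter clause is present), the should clause would be optional: it would not restrict the result set, but documents matching it would score higher and rank earlier.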
POST /my_goods/_search
{
"query": {
"bool": {
"must": {
"term":{
"shopCode":"sc00001"
}
},
"filter": {
"term": {
"channelType": "cloudPlatform"
}
},
"must_not": [
{
"range": {
"publicPrice": {
"gte": 8288,
"lte": 8888
}
}
}
],
"should": [
{
"term": {
"brandName": {
"value": "果"
}
}
}
],
"minimum_should_match" : 1
}
}
}
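minimum_should_match is the minimum number of should clauses that must match. If the bool query contains at least one should clause and no must or filter clause, the default is 1; otherwise the default is 0. For example: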
POST /my_goods/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"brandName": {
"value": "东"
}
}
},
{
"term": {
"brandName": {
"value": "果"
}
}
}
],
"minimum_should_match" : 1
}
}
}
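This query requires brandName to match at least one of the terms "东" and "果". A term query is not analyzed, so the hits below presuppose that brandName is indexed as single-character tokens (for example, a text field with the standard analyzer); with the keyword mapping defined earlier, a term query matches only the complete field value. The result is: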
"hits" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.5678144,
"_source" : {
"shopCode" : "sc00001",
"brandName" : "山东苹果",
"closeUserCode" : [
"uc001",
"uc002",
"uc003"
],
"skuCode_brandName" : "skuCode4山东苹果",
"channelType" : "cloudPlatform",
"publicPrice" : 16977.76,
"goodsName_length" : 31,
"groupPrice" : [
{
"level" : "level1",
"boxLevelPrice" : "2488.88"
},
{
"level" : "level2",
"boxLevelPrice" : "3488.88"
}
],
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc004",
"uc005",
"uc006",
"uc001"
],
"boxPriceDetail" : 4488.88
},
{
"boxType" : "box2",
"boxUserCode" : [
"htd007",
"htd008",
"htd009",
"uc0010"
],
"boxPriceDetail" : 5488.88
}
],
"boostValue" : 1.2,
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
"skuCode" : "skuCode4"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.2792403,
"_source" : {
"shopCode" : "sc00002",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"skuCode_brandName" : "skuCode2苹果",
"channelType" : "cloudPlatform",
"publicPrice" : 12377.76,
"goodsName_length" : 13,
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.0,
"goodsName" : "苹果 55英寸 3K超高清",
"skuCode" : "skuCode2"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2792403,
"_source" : {
"shopCode" : "sc00001",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"skuCode_brandName" : "skuCode1苹果",
"channelType" : "cloudPlatform",
"publicPrice" : 32755.52,
"goodsName_length" : 13,
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8,
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.21222264,
"_source" : {
"shopCode" : "sc00001",
"brandName" : "美国苹果",
"closeUserCode" : [
"0"
],
"skuCode_brandName" : "skuCode3美国苹果",
"channelType" : "cloudPlatform",
"publicPrice" : 16777.76,
"goodsName_length" : 26,
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"htd003",
"uc004"
],
"boxPriceDetail" : 4388.88
},
{
"boxType" : "box2",
"boxUserCode" : [
"uc005",
"uc0010"
],
"boxPriceDetail" : 5388.88
}
],
"boostValue" : 1.2,
"goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清",
"skuCode" : "skuCode3"
}
},
...
]
Now raise minimum_should_match to 2 and observe the result:
POST /my_goods/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"brandName": {
"value": "东"
}
}
},
{
"term": {
"brandName": {
"value": "果"
}
}
}
],
"minimum_should_match" : 2
}
}
}
# Returns:
"hits" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.5678144,
"_source" : {
"shopCode" : "sc00001",
"brandName" : "山东苹果",
"closeUserCode" : [
"uc001",
"uc002",
"uc003"
],
"skuCode_brandName" : "skuCode4山东苹果",
"channelType" : "cloudPlatform",
"publicPrice" : 16977.76,
"goodsName_length" : 31,
"groupPrice" : [
{
"level" : "level1",
"boxLevelPrice" : "2488.88"
},
{
"level" : "level2",
"boxLevelPrice" : "3488.88"
}
],
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc004",
"uc005",
"uc006",
"uc001"
],
"boxPriceDetail" : 4488.88
},
{
"boxType" : "box2",
"boxUserCode" : [
"htd007",
"htd008",
"htd009",
"uc0010"
],
"boxPriceDetail" : 5488.88
}
],
"boostValue" : 1.2,
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
"skuCode" : "skuCode4"
}
}
]
As the result shows, only documents whose brandName matches both "东" and "果", that is, at least 2 of the should clauses, are returned.
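The minimum_should_match behaviour described above can be sketched in a few lines of Python. This is an illustration only, not how Elasticsearch implements it: the function names are made up, and a plain substring check stands in for matching single-character tokens of the brand name.

```python
def default_minimum_should_match(has_should, has_must_or_filter):
    """Default described in the text: 1 for a should-only bool query, otherwise 0."""
    if has_should and not has_must_or_filter:
        return 1
    return 0

def should_clauses_satisfied(clause_results, minimum_should_match):
    """True when at least minimum_should_match of the should clauses matched."""
    return sum(1 for matched in clause_results if matched) >= minimum_should_match

def matching_brands(brands, terms, minimum_should_match):
    # Assumption: "term in brand" approximates a term query on single-character tokens.
    return [
        brand for brand in brands
        if should_clauses_satisfied([term in brand for term in terms], minimum_should_match)
    ]

brands = ["山东苹果", "苹果", "美国苹果", "三星"]
print(matching_brands(brands, ["东", "果"], 1))
print(matching_brands(brands, ["东", "果"], 2))
```

With a minimum of 1 every brand containing either character qualifies; with a minimum of 2 only "山东苹果" remains, which mirrors the two responses shown above.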
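Boosting query
A boosting query adjusts relevance scoring: documents that match a negative query are demoted instead of being excluded.
Without demotion, the two documents that match skuCode1 have the same score: "_score" : 1.3862942.
With negative_boost set to 0.2, the query returns the following (some irrelevant fields omitted):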
POST /my_goods/_search
{
"query": {
"boosting": {
"positive": {
"term": {
"skuCode": {
"value": "skuCode1"
}
}
},
"negative": {
"term": {
"goodsName": {
"value": "三星"
}
}
},
"negative_boost": 0.2
}
}
}
# Returns
"hits" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.3862942,
"_source" : {
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8,
"shopName" : "张三店铺"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.27725884,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "三星",
"closeUserCode" : [
"0"
],
"channelType" : "cmccPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8188.88",
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.2
}
}
]
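The score of document ID=6 has dropped to _score : 0.27725884 (1.3862942 * 0.2), because it also matches the negative query. Elasticsearch documents negative_boost as a floating-point number between 0 and 1.0, by which the score of documents matching the negative query is multiplied.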
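The arithmetic behind the demoted score can be checked with a short sketch. The function below is a simplified model written for this example (its name and signature are assumptions): it keeps the positive score as is and multiplies it by negative_boost only when the document also matches the negative query.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Simplified model of a boosting query's final score for one document."""
    if matches_negative:
        # Demote, but do not exclude, documents that match the negative query.
        return positive_score * negative_boost
    return positive_score

# Document 1 does not match the negative term, document 6 does.
score_doc_1 = boosting_score(1.3862942, matches_negative=False, negative_boost=0.2)
score_doc_6 = boosting_score(1.3862942, matches_negative=True, negative_boost=0.2)
print(score_doc_1, score_doc_6)
```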
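Constant score query
When the query should not take TF (term frequency) into account, use a constant score query: every matching document receives the same score, equal to boost.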
POST /my_goods/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"goodsName": "苹果"
}
},
"boost": 1.2
}
}
}
Returns (some irrelevant fields omitted):
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.2,
"_source" : {
"goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.2,
"_source" : {
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清"
}
}
}
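Documents ID=3 and ID=4 get the same score even though ID=4 would normally be the more relevant match, because the effect of term frequency on scoring has been switched off.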
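Disjunction max query
The disjunction max (dis_max) query returns every document that matches at least one of its sub-queries, but uses only the score of the best-matching sub-query as the document's score.
For example, when searching for "苹果" in both the product name and the brand name, if the brand clause scores higher than the product-name clause, the brand score becomes the overall score (ignoring the tie_breaker adjustment).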
GET /my_goods/_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"term": {
"goodsName": {
"value": "苹果"
}
}
},
{
"term": {
"brandName": {
"value": "苹果"
}
}
}
]
}
}
}
Returns (irrelevant fields omitted):
"max_score" : 3.0150018,
"hits" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0150018,
"_source" : {
"goodsName" : "苹果 51英寸 4K超高清",
"brandName" : "苹果"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.3465583,
"_source" : {
"goodsName" : "苹果UA55R苹果U7苹果520JXXZ 55英寸 5K超高清",
"brandName" : "三星苹果"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.2337791,
"_source" : {
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
"brandName" : "山东苹果"
}
},
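Analysis:
- ID=1: the brand consists of just the two characters "苹果", which Elasticsearch treats as a closer match, so this record ranks first.
- ID=5 and ID=4: both brands contain "苹果" and have the same length, so the difference comes from how often "苹果" occurs in goodsName. It occurs 3 times in the goodsName of ID=5 but only 2 times in that of ID=4, so ID=4 scores lower than ID=5, as expected.
- What is tie_breaker for? It acts as a damper (valid range: 0 to 1). By default dis_max uses only the score of the best-matching clause as the document score, so the other clauses have no influence at all, which is rather extreme. tie_breaker gives them some weight back: the final score is the best clause's score plus tie_breaker times the scores of the other matching clauses, for example brandName score + goodsName score * tie_breaker when brandName is the best match.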
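The tie_breaker formula can be made concrete with a small sketch. The function below is an illustrative model, not Elasticsearch code: it takes the per-clause scores of one document (None for a clause that did not match) and combines them as best score plus tie_breaker times the remaining scores, then applies the query-level boost. The clause scores in the example are invented.

```python
def dis_max_score(clause_scores, tie_breaker=0.0, boost=1.0):
    """Combine sub-query scores the way the text describes dis_max with tie_breaker."""
    matched = [score for score in clause_scores if score is not None]
    if not matched:
        return None  # the document matches no sub-query and is not returned
    best = max(matched)
    others = sum(matched) - best
    return boost * (best + tie_breaker * others)

# Hypothetical clause scores: goodsName clause 1.0, brandName clause 2.0.
print(dis_max_score([1.0, 2.0]))                    # best clause only
print(dis_max_score([1.0, 2.0], tie_breaker=0.7))   # other clause weighted in
print(dis_max_score([1.0, None], tie_breaker=0.7))  # only goodsName matched
```

With tie_breaker at 0 the query behaves as a pure maximum; at 1 it becomes the sum of all matching clauses.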
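Function score query
function_score lets you control how the score of a query is computed; it is the ultimate tool for controlling the scoring process. The most efficient way to use it is to apply different functions to subsets of the results selected by filters, which benefits from filter caching while still controlling the scoring.
Suppose we want apples from Shandong to rank before apples from the United States: search for product names containing "苹果", use a weight of 2 when the brand contains "美国", and a weight of 40 when it contains "山东".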
GET /my_goods/_search
{
"query": {
"function_score": {
"query": {
"term": {
"goodsName": {
"value": "苹果"
}
}
},
"boost": 2,
"functions": [
{
"filter": {
"match":{
"brandName":"美国"
}
},
"random_score": {
},
"weight": 2
},
{
"filter": {
"match":{
"brandName":"山东"
}
},
"weight": 40
}
],
"max_boost": 60,
"score_mode": "max",
"boost_mode": "multiply",
"min_score": 2
}
}
}
Main part of the response:
"max_score" : 2.2442641,
"hits" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "4",
"_score" : 2.0562985,
"_source" : {
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
"brandName" : "山东苹果"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.7582327,
"_source" : {
"goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清",
"brandName" : "美国苹果"
}
}
]
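Some parameters explained:
- score_mode: how the scores of the individual functions are combined
- multiply: the scores are multiplied (default)
- sum: the scores are added
- avg: the scores are averaged
- first: the score of the first function whose filter matches is used
- max: the highest score is used
- min: the lowest score is used
With avg, the average is weighted. For example, if 2 functions return scores of 1 and 2 and their weights are 3 and 4, the combined score is (1*3+2*4)/(3+4).
For more details, see the official score-functions documentation.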
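To make the score_mode options concrete, here is a small Python model of how function scores could be combined. It is a sketch under stated assumptions: each function's contribution is taken to be its score multiplied by its weight, and avg is computed as the weighted average from the example above. The function name is made up, and boost_mode, max_boost, and min_score are not modelled.

```python
import math

def combine_function_scores(scores, weights, score_mode="multiply"):
    """Combine the scores of the matching functions according to score_mode."""
    weighted = [score * weight for score, weight in zip(scores, weights)]
    if score_mode == "multiply":
        return math.prod(weighted)
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        # Weighted average: divide by the sum of the weights, not by the count.
        return sum(weighted) / sum(weights)
    if score_mode == "first":
        return weighted[0]
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")

# The example from the text: scores 1 and 2 with weights 3 and 4.
print(combine_function_scores([1, 2], [3, 4], "avg"))  # (1*3+2*4)/(3+4)
print(combine_function_scores([1, 2], [3, 4], "max"))
```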
Full text queries
Match query
Match is the standard full-text query. An example:
# Use highlight to highlight the matches in the result
GET /my_goods/_search
{
"query": {
"match": {
"goodsName": "苹果 高清 英寸"
}
},
"highlight": {
"fields": {
"goodsName": {
"pre_tags": [
"<strong>"
],
"post_tags": [
"</strong>"
]
}
}
}
}
A match query is a boolean-type query: the operator parameter controls how the analyzed terms are combined into boolean clauses. It accepts and and or (the default is or).
GET /my_goods/_search
{
"query": {
"match": {
"goodsName": {
"query": "苹果 高清 英寸",
"operator": "and"
}
}
}
}
# Returns:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
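There are 0 hits because no product name contains all of the tokens produced from "苹果 高清 英寸". With and, the query text is analyzed into tokens and a document is returned only if it contains every one of them. The next example shows or: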
GET /my_goods/_search
{
"query": {
"match": {
"goodsName": {
"query": "苹果 高清 英寸",
"operator": "or"
}
}
}
}
# Returns
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.836855,
"_source" : {
"shopCode" : "sc00001",
"brandName" : "山东苹果",
"closeUserCode" : [
"uc001",
"uc002",
"uc003"
],
"skuCode_brandName" : "skuCode4山东苹果",
"channelType" : "cloudPlatform",
"publicPrice" : 16977.76,
"goodsName_length" : 31,
"groupPrice" : [
{
"level" : "level1",
"boxLevelPrice" : "2488.88"
},
{
"level" : "level2",
"boxLevelPrice" : "3488.88"
}
],
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc004",
"uc005",
"uc006",
"uc001"
],
"boxPriceDetail" : 4488.88
},
{
"boxType" : "box2",
"boxUserCode" : [
"htd007",
"htd008",
"htd009",
"uc0010"
],
"boxPriceDetail" : 5488.88
}
],
"boostValue" : 1.2,
"goodsName" : "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
"skuCode" : "skuCode4"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "10",
"_score" : 0.9227071,
"_source" : {
"goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
"skuCode" : "skuCode10",
"brandName" : "三星",
"closeUserCode" : [
"uc0022"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8288.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"uc0022"
],
"boxPriceDetail" : 4288.88
}
],
"boostValue" : 1.8,
"city" : "cloudPlatform1"
}
}
As you can see, "三星UA55RU7520JXXZ 52英寸 4K超高清" is returned because it contains "高清"; with or, one matching token is enough.
Match phrase query
Matches documents that contain the query text as a phrase, that is, all of its terms in the given order.
GET /my_goods/_search
{
"query": {
"match_phrase": {
"goodsName": "apple"
}
}
}
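match_phrase compared with match
match_phrase
Treats the query text as one unit: for "goods t" below, goods must come first and t must follow it.
match
Analyzes the query text into tokens and then searches for them.
# Returns nothing, because no document contains the phrase 'goods t'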
GET /my_goods/_search
{
"query": {
"match_phrase": {
"goodsName": "goods t"
}
}
}
# Returns data, because documents contain the token goods (with the default or operator, one matching token is enough)
GET /my_goods/_search
{
"query": {
"match": {
"goodsName": "goods t"
}
}
}
With match_phrase, slop controls how many positions apart the terms may be; the default is 0. For example:
GET /my_goods/_search
{
"query": {
"match_phrase": {
"goodsName": {
"query": "apple test",
"slop": 1
}
}
}
}
# Returns
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 3.08089,
"hits" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "21",
"_score" : 3.08089,
"_source" : {
"goodsName" : "apple goods test",
"skuCode" : "skuCode3",
"brandName" : "美国苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : "8388.88",
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"htd003",
"uc004"
],
"boxPriceDetail" : 4388.88
},
{
"boxType" : "box2",
"boxUserCode" : [
"uc005",
"uc0010"
],
"boxPriceDetail" : 5388.88
}
],
"boostValue" : 1.2
}
}
]
}
}
Here slop is set to 1, and apple and test are separated by exactly one term, so the document is found.
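The effect of slop can be approximated outside Elasticsearch with a position-based check. The sketch below is a simplification made for this example: it tokenizes on whitespace instead of using an analyzer, and it accepts a document when the matched term positions, shifted by each term's position in the phrase, lie within slop of each other. Lucene's sloppy phrase matching is more involved, so treat this purely as an illustration of the idea.

```python
from itertools import product

def phrase_matches(text, phrase, slop=0):
    """Approximate match_phrase with slop on whitespace-separated tokens."""
    tokens = text.lower().split()
    terms = phrase.lower().split()
    # For every phrase term, collect the positions at which it occurs in the text.
    positions = [[i for i, token in enumerate(tokens) if token == term] for term in terms]
    if any(not term_positions for term_positions in positions):
        return False  # a phrase term is missing entirely
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # the same token cannot serve two phrase terms
        # An exact phrase has identical offsets; slop allows them to differ.
        offsets = [pos - index for index, pos in enumerate(combo)]
        if max(offsets) - min(offsets) <= slop:
            return True
    return False

print(phrase_matches("apple goods test", "apple test", slop=0))
print(phrase_matches("apple goods test", "apple test", slop=1))
```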
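Match phrase prefix query
Returns documents that contain the terms of the query text in order, where the last term is treated as a prefix. For example, the query "apple goods t" returns products whose name contains apple goods test.
Add two test documents: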
POST my_goods/_bulk
{"index":{"_id":13}}
{"goodsName":"apple and goods product ","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":21}}
{"goodsName":"apple goods test","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
# Only the document with goodsName : apple goods test is returned
GET /my_goods/_search
{
"query": {
"match_phrase_prefix": {
"goodsName": "apple goods t"
}
}
}
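Comparison of the four match queries
| Query | Description |
| Match | Returns documents that match the query text; the text is analyzed before matching. |
| Match boolean prefix | A Boolean query: the analyzed terms are searched as term queries, and the last term is searched as a prefix. |
| Match phrase | Treats the query text as a phrase: it is analyzed, but the resulting terms must all occur in the same order. |
| Match phrase prefix | Returns documents that contain the terms of the query text in order; like match phrase, except that the last token is matched as a prefix. slop controls the allowed position gap between tokens. |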
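Multi-match
Matches the query across several fields; the type parameter changes how the fields are matched and scored.
# Find documents whose product name or brand name contains 苹果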
POST /my_goods/_search
{
"query": {
"multi_match": {
"query": "苹果",
"type": "best_fields",
"fields": ["goodsName","brandName"],
"tie_breaker": 0.3
}
}
}
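The type parameter in detail:
- best_fields: the default. Matches any of the fields and uses the score of the best-matching field as the score of the whole query;
- most_fields: matches any of the fields and combines the scores of all matching fields;
- cross_fields: matches across fields, treating fields with the same analyzer as one combined field and looking for each query term in any of them;
- phrase: runs a match_phrase query on each field and uses the score of the best-matching field as the overall score;
- phrase_prefix: runs a match_phrase_prefix query on each field and uses the score of the best-matching field as the overall score;
- bool_prefix: runs a match_bool_prefix query on each field and combines the scores of the fields; see bool_prefix for details.
The following example walks through cross_fields.
# Create the index and insert test data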
PUT my_shop
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"first_name":{
"type":"text"
},
"last_name":{
"type":"text"
}
}
}
}
POST my_shop/_bulk
{"index":{"_id":1}}
{"first_name":"Will","last_name":"Smith","age":25}
{"index":{"_id":2}}
{"first_name":"Smith","last_name":"hello","age":21}
{"index":{"_id":3}}
{"first_name":"Will","last_name":"hello","age":20}
# Search for the person named Will Smith
GET /my_shop/_search
{
"query": {
"multi_match" : {
"query": "Will Smith",
"type": "cross_fields",
"fields": [ "first_name^2", "last_name" ],
"operator": "and"
}
}
}
# Returns
"max_score" : 1.9208363,
"hits" : [
{
"_index" : "my_shop",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9208363,
"_source" : {
"first_name" : "Will",
"last_name" : "Smith",
"age" : 25
}
}
]
Note that first_name^2 raises the boost of the first_name field to 2; the default boost is 1.
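Why only document 1 comes back can be seen from a term-centric reading of cross_fields with the and operator: every query term has to be found in at least one of the listed fields. The sketch below models just that matching rule on the three test documents. It is an illustration with an invented function name; analysis is reduced to lowercasing and whitespace splitting, and scoring (including the ^2 boost) is not modelled.

```python
def cross_fields_and(doc, query, fields):
    """True when every query term occurs in at least one of the given fields."""
    terms = query.lower().split()
    field_tokens = {field: str(doc.get(field, "")).lower().split() for field in fields}
    return all(
        any(term in field_tokens[field] for field in fields)
        for term in terms
    )

docs = [
    {"id": 1, "first_name": "Will", "last_name": "Smith", "age": 25},
    {"id": 2, "first_name": "Smith", "last_name": "hello", "age": 21},
    {"id": 3, "first_name": "Will", "last_name": "hello", "age": 20},
]
matching_ids = [
    doc["id"] for doc in docs
    if cross_fields_and(doc, "Will Smith", ["first_name", "last_name"])
]
print(matching_ids)
```

Documents 2 and 3 each contain only one of the two terms, so the and operator rejects them.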
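Term-level queries
Term-level queries work on structured data such as date ranges, IP addresses, and prices. The following sections show how they are used in practice.
Exists query
Returns documents that contain an indexed value for the given field.
# Return the documents that have a goodsName field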
GET /my_goods/_search
{
"query": {
"exists": {
"field": "goodsName"
}
}
}
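Fuzzy query
Returns documents that contain terms similar to the search term; it can be used for spelling correction.
Edit distance here means the minimum edit distance: the smallest number of single-character edits needed to turn one string into another. It is also called the Levenshtein distance.
Reference: https://en.wikipedia.org/wiki/Levenshtein_distance
Several queries and APIs support inexact matching through the fuzziness parameter:
- 0, 1, 2: the maximum allowed edit distance
- AUTO: derives the edit distance from the length of the term. It optionally takes two arguments, AUTO:[low],[high], which are the length thresholds; when they are not given, the defaults are 3 and 6, which means:
- 0..2: terms of 0 to 2 characters must match exactly
- 3..5: terms of 3 to 5 characters allow a maximum edit distance of 1
- >5: terms longer than 5 characters allow a maximum edit distance of 2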
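The AUTO rule listed above maps a term length to a maximum edit distance. A minimal sketch of that mapping follows; the function name auto_fuzziness is an assumption for this example, and low and high correspond to the two optional AUTO arguments with their defaults of 3 and 6.

```python
def auto_fuzziness(term, low=3, high=6):
    """Maximum edit distance allowed for a term under fuzziness AUTO:[low],[high]."""
    length = len(term)
    if length < low:
        return 0  # very short terms must match exactly
    if length < high:
        return 1  # medium-length terms allow one edit
    return 2      # longer terms allow two edits

for term in ["at", "box", "booex", "surprize"]:
    print(term, auto_fuzziness(term))
```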
# Using the example from the official site
POST /my_index/_bulk
{ "index": { "_id": 1 }}
{ "text": "Surprise me!"}
{ "index": { "_id": 2 }}
{ "text": "That was surprising."}
{ "index": { "_id": 3 }}
{ "text": "I wasn't surprised."}
GET /my_index/_search
{
"query": {
"fuzzy": {
"text": {
"value": "surprize",
"prefix_length": 1
}
}
}
}
# Returns
"hits" : [
{
"_index" : "my_index",
"_type" : "my_type",
"_id" : "1",
"_score" : 0.9559981,
"_source" : {
"text" : "Surprise me!"
}
},
{
"_index" : "my_index",
"_type" : "my_type",
"_id" : "3",
"_score" : 0.69983494,
"_source" : {
"text" : "I wasn't surprised."
}
}
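If not set, prefix_length defaults to 0.
- surprising is not found: surprize is longer than 5 characters, so the default AUTO allows at most two edits, and turning surprize into surprising takes more than two edits, so it cannot be corrected.
- surprize is a misspelling of surprise (s->z). The error is in one position, which is within the two-edit limit, so surprise is matched.
- To improve performance you can set max_expansions, which limits the number of term variations the query expands to.
- prefix_length sets how many leading characters must match exactly. A larger value shrinks the set of candidate terms, but misspellings inside the prefix are then no longer corrected; allowing too many edits, on the other hand, costs performance and returns results the user does not expect.
When fuzziness is not specified, AUTO is used, which allows two edits here. With the following query, which lowers fuzziness to 1, only one result is returned: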
GET /my_index/_search
{
"query": {
"fuzzy": {
"text": {
"value": "surprize",
"fuzziness": "1",
"prefix_length": 1
}
}
}
}
# Returns:
{
"took" : 19,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.9559981,
"hits" : [
{
"_index" : "my_index",
"_type" : "my_type",
"_id" : "1",
"_score" : 0.9559981,
"_source" : {
"text" : "Surprise me!"
}
}
]
}
}
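The edit distances behind these results can be verified with a plain Levenshtein implementation. The sketch below uses the classic dynamic-programming formulation; it is an illustration only, and the fuzzy query additionally treats a transposition of two adjacent characters as a single edit by default, which this version does not do (the difference does not affect the words used here).

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

for indexed_term in ["surprise", "surprised", "surprising"]:
    print(indexed_term, levenshtein("surprize", indexed_term))
```

surprise is one edit away and surprised two, so both fit within a limit of 2, while only surprise fits within a limit of 1; surprising lies beyond both limits.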
Ids query
Returns the documents with the given IDs.
GET /my_goods/_search
{
"query": {
"ids" : {
"values" : ["1", "4", "5"]
}
}
}
Prefix query
Returns documents that contain a term with the given prefix in the specified field.
PUT my_shop_test
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"shopName":{
"type":"text"
},
"shopCode":{
"type":"text"
}
}
}
}
# Add test data
POST my_shop_test/_bulk
{"index":{"_id":1}}
{"shopName":"box","shopCode":"Smith"}
{"index":{"_id":2}}
{"shopName":"black","shopCode":"jack"}
{"index":{"_id":3}}
{"shopName":"fox","shopCode":"act"}
{"index":{"_id":4}}
{"shopName":"booex","shopCode":"act"}
# Find shops whose shopName starts with bo
GET /my_shop_test/_search
{
"query": {
"prefix": {
"shopName": {
"value": "bo"
}
}
}
}
# Returns
"hits" : [
{
"_index" : "my_shop_test",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"shopName" : "box",
"shopCode" : "Smith"
}
},
{
"_index" : "my_shop_test",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"shopName" : "booex",
"shopCode" : "act"
}
}
]
Range query
A range query is similar to greater-than / less-than range conditions in a database.
GET my_goods/_search
{
"query": {
"range": {
"publicPrice": {
"gte": 2000,
"lte": 8488
}
}
}
}
- gt: greater than
- gte: greater than or equal to
- lt: less than
- lte: less than or equal to
Regexp query
Regular expression query. The following finds shop codes that start with 's', continue with any characters of any length, and end with '1':
GET my_goods/_search
{
"query": {
"regexp": {
"shopCode": {
"value": "s.*1",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 10000,
"rewrite": "constant_score"
}
}
}
}
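The pattern s.*1 has to match the whole term, not just a part of it. As a rough illustration, the sketch below applies the same pattern with Python's re.fullmatch and a case-insensitive flag to a few shop codes. This is only an analogy: Lucene's regular expression syntax differs from Python's in several operators, and the helper name and the third sample code are made up for this example.

```python
import re

def regexp_matches(term, pattern, case_insensitive=False):
    """Whole-term regular expression match, loosely mirroring the regexp query."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.fullmatch(pattern, term, flags) is not None

shop_codes = ["sc00001", "sc00002", "SC00011"]
matches = [code for code in shop_codes if regexp_matches(code, r"s.*1", case_insensitive=True)]
print(matches)
```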
Term query
# Returns documents that contain the exact term; avoid using term on text fields
GET my_goods/_search
{
"query": {
"term": {
"brandName": {
"value": "三星",
"boost": 1.0
}
}
}
}
Terms query
Returns documents that contain one or more of the given exact terms.
GET /my_goods/_search
{
"query": {
"terms": {
"brandName": [ "美国", "三星" ],
"boost": 1.0
}
}
}
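Terms_set query
Returns documents that contain a minimum number of the given exact terms. terms_set is like terms, except that it also defines the minimum number of terms that have to match.
# Define a new product index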
PUT /my_goods_info
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"sale_property": {
"type": "keyword"
},
"required_matches": {
"type": "long"
}
}
}
}
# Add 3 test products
# Sale properties: white, 64G, standard
PUT /my_goods_info/_doc/1?refresh
{
"name": "apple",
"sale_property": [ "white", "64","standard" ],
"required_matches": 2
}
# black, 32G, non-standard
PUT /my_goods_info/_doc/2?refresh
{
"name": "apple",
"sale_property": [ "black", "32","no standard" ],
"required_matches": 2
}
# black, 64G, non-standard
PUT /my_goods_info/_doc/3?refresh
{
"name": "apple",
"sale_property": [ "black", "64","no standard" ],
"required_matches": 2
}
# Query
GET /my_goods_info/_search
{
"query": {
"terms_set": {
"sale_property": {
"terms": [ "white", "64"],
"minimum_should_match_field": "required_matches"
}
}
}
}
# Returns
"hits" : [
{
"_index" : "my_goods_info",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.1149836,
"_source" : {
"name" : "apple",
"sale_property" : [
"white",
"64",
"standard"
],
"required_matches" : 2
}
}
]
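The rule that terms_set applies here can be written down in a few lines: count how many of the query terms occur in the document's field and compare that count with the number stored in the field named by minimum_should_match_field. The sketch below applies this to the three test products; the function name is invented and scoring is not modelled.

```python
def terms_set_matches(doc, field, terms, minimum_should_match_field):
    """True when enough of the query terms occur in the document's field."""
    required = doc[minimum_should_match_field]
    matched = len(set(doc[field]) & set(terms))
    return matched >= required

products = {
    1: {"name": "apple", "sale_property": ["white", "64", "standard"], "required_matches": 2},
    2: {"name": "apple", "sale_property": ["black", "32", "no standard"], "required_matches": 2},
    3: {"name": "apple", "sale_property": ["black", "64", "no standard"], "required_matches": 2},
}
hits = [
    doc_id for doc_id, doc in products.items()
    if terms_set_matches(doc, "sale_property", ["white", "64"], "required_matches")
]
print(hits)
```

Product 3 matches only "64", which is one term short of its required_matches value of 2, so only product 1 is returned.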
Wildcard query
Returns documents that contain terms matching a wildcard pattern.
GET /my_goods/_search
{
"query": {
"wildcard": {
"shopCode": {
"value": "sc*1",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
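In the pattern sc*1, the * stands for any sequence of characters, including none. Python's fnmatch module uses the same meaning for *, so it can serve as a quick stand-in for checking which shop codes a pattern would cover. This is an analogy for illustration, not the query's implementation; note that fnmatch also interprets [ ] character classes, which the wildcard query does not.

```python
from fnmatch import fnmatchcase

def wildcard_matches(term, pattern):
    """Case-sensitive whole-term wildcard match (* = any characters, ? = one character)."""
    return fnmatchcase(term, pattern)

shop_codes = ["sc00001", "sc00002", "sc1"]
matches = [code for code in shop_codes if wildcard_matches(code, "sc*1")]
print(matches)
```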
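Geo queries
Elasticsearch supports two kinds of geo data: geo_point, for latitude/longitude pairs, and geo_shape, for complex shapes such as points, lines, circles, and polygons.
Geo_point
Used to find all points within a given distance of another point, or to compute the distance between two points for sorting, scoring, aggregations, and so on.
Geo-shapes
Mostly used for filtering, for example to check whether two shapes overlap or whether one shape contains another.
There are 4 types of geo query:
- geo_bounding_box: finds documents with geo points that fall inside the specified rectangle
- geo_distance: finds documents with geo points within the specified distance of a central point
- geo_polygon: finds documents with geo points inside the specified polygon
- geo_shape: finds documents with:
- geo-shapes that intersect, are contained by, or do not intersect the specified shape
- geo-points that intersect the specified shape
A geo filter loads the coordinates of all documents into memory and then checks, for each one, whether the point falls within the specified area, so geo filters are comparatively expensive.
The best practice is to exclude as many documents as possible with cheap filters first, and only then hand the remainder to the geo filter.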
Geo-bounding box query
Define an index for shop information:
PUT /my_shop_info
{
"mappings": {
"properties": {
"pin": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
# Add 2 test documents
PUT /my_shop_info/_doc/1
{
"pin": {
"location": {
"lat": 40.12,
"lon": -71.34
}
}
}
PUT /my_shop_info/_doc/2
{
"pin": {
"location": {
"lat": 50.12,
"lon": -61.34
}
}
}
# Query the points inside the specified box
GET my_shop_info/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_bounding_box": {
"pin.location": {
"top_left": {
"lat": 40.73,
"lon": -74.1
},
"bottom_right": {
"lat": 40.01,
"lon": -71.12
}
}
}
}
}
}
}
# Returns
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_shop_info",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"pin" : {
"location" : {
"lat" : 40.12,
"lon" : -71.34
}
}
}
}
]
}
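The bounding-box test itself is a pair of range comparisons: the latitude has to lie between the bottom and top edges, and the longitude between the left and right edges. The sketch below reproduces the result for the two test documents. It is a simplified illustration with an invented function name, and it assumes the box does not cross the 180° meridian.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """True when the point lies inside the rectangle spanned by the two corners."""
    lat_ok = bottom_right["lat"] <= lat <= top_left["lat"]
    lon_ok = top_left["lon"] <= lon <= bottom_right["lon"]
    return lat_ok and lon_ok

top_left = {"lat": 40.73, "lon": -74.1}
bottom_right = {"lat": 40.01, "lon": -71.12}
shops = {
    1: {"lat": 40.12, "lon": -71.34},
    2: {"lat": 50.12, "lon": -61.34},
}
inside = [
    shop_id for shop_id, point in shops.items()
    if in_bounding_box(point["lat"], point["lon"], top_left, bottom_right)
]
print(inside)
```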
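Geo-distance query
Matches only documents whose geo point lies within a given distance of a central point. The following looks for points within 200km of lat 40, lon -70:
# Again using my_shop_info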
GET /my_shop_info/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "200km",
"pin.location": {
"lat": 40,
"lon": -70
}
}
}
}
}
}
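Which of the two test shops falls within the 200km radius can be estimated with the haversine great-circle formula. The sketch below is an approximation written for this example: it treats the earth as a sphere with a mean radius of 6371 km, which is an assumption, so the distances may differ slightly from what Elasticsearch computes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

center = (40, -70)
shops = {1: (40.12, -71.34), 2: (50.12, -61.34)}
within_200km = [
    shop_id for shop_id, (lat, lon) in shops.items()
    if haversine_km(center[0], center[1], lat, lon) <= 200
]
print(within_200km)
```

Shop 1 is a little over 100 km from the center, while shop 2 is more than 1000 km away, so only shop 1 would pass the 200km filter.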
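About the author:
Li Zengsheng (李增胜) is an Elasticsearch Certified Engineer and holds the PMP project management certification. He works at 汇通达网络股份有限公司 as technical manager of the transaction domain of the industrial trading platform, working on development and management in microservice architecture and search architecture. Technical interests: e-commerce, the industrial internet, and related fields.
Blog: https://www.jianshu.com/u/59dceda66b57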