Elasticsearch 查询语言(Query DSL)认识(一)
一、基本认识
查询子句的行为取决于
- query context
- filter context
也就是执行的是查询(query)还是过滤(filter)
query context 描述的是:被搜索的文档和查询子句的匹配程度
filter context 描述的是: 被搜索的文档和查询子句是否匹配
一个是匹配程度问题,一个是是否匹配的问题
二、实例
- 导入数据 bank account data download
- 将数据导入到elasticsearch
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'
这里有两个地方需要注意,1.host要改成符合自己的。2.早期版本中下载的数据可以能是'accounts.json?raw=true'
大概如下 curl -XPOST 'wbelk:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json?raw=true"
- 参数认识
为了便捷操作,可以安装一个kiabna sense
$./bin/kibana plugin --install elastic/sense
$./bin/kibana
sudo -i service restart kibana(或者用这个启动kibana)
match_all 搜索,直接返回所有文档
GET /bank/_search
{
"query": {
"match_all": {}
}
}
返回大致如下:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1000,
"max_score": 1,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "25",
"_score": 1,
"_source": {
"account_number": 25,
"balance": 40540,
"firstname": "Virginia",
"lastname": "Ayala",
"age": 39,
"gender": "F",
"address": "171 Putnam Avenue",
"employer": "Filodyne",
"email": "virginiaayala@filodyne.com",
"city": "Nicholson",
"state": "PA"
}
},
参数大致解释:
- took: 执行搜索耗时,毫秒为单位,也就是本文我1ms
- time_out: 搜索是否超时
- _shards: 多少分片被搜索,成功多少,失败多少
- hits: 搜索结果展示
- hits.total: 匹配条件的文档总数
- hits.hits: 返回结果展示,默认返回十个
- hits.max_score:最大匹配得分
- hits._score: 返回文档的匹配得分(得分越高,匹配程度越高,越靠前)
- _index _type _id 作为剥层定位到特定的文档
- _source 文档源
- 查询语言之 执行查询
- 只显示account_number 和 balance
POST /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1000,
"max_score": 1,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "25",
"_score": 1,
"_source": {
"account_number": 25,
"balance": 40540
}
},
{
"_index": "bank",
"_type": "account",
"_id": "44",
"_score": 1,
"_source": {
"account_number": 44,
"balance": 34487
}
},
{
"_index": "bank",
"_type": "account",
"_id": "99",
"_score": 1,
"_source": {
"account_number": 99,
"balance": 47159
}
},
- 返回accountu_number 为20的document
POST /bank/_search
{
"query": { "match": { "account_number": 20 } }
}
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 5.6587105,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "20",
"_score": 5.6587105,
"_source": {
"account_number": 20,
"balance": 16418,
"firstname": "Elinor",
"lastname": "Ratliff",
"age": 36,
"gender": "M",
"address": "282 Kings Place",
"employer": "Scentric",
"email": "elinorratliff@scentric.com",
"city": "Ribera",
"state": "WA"
}
}
]
}
}
- 返回地址中包含(term)mill的所有账户
POST /bank/_search
{
"query": { "match": { "address": "mill" } }
}
- 返回地址中包含term 'mill'或者 'lane'的所有账户
POST /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
- 匹配phrase 'mill lane'
POST /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
- 返回address包含'mill'和'lane'的所有账户 (AND)
POST /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
- 返回address包含'mill'或'lane'的所有账户 (OR)
POST /bank/_search
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
- 返回address既不包含'mill'也不包含'lane'的所有账户 (NO)
POST /bank/_search
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
- 返回age为40,并且state不是ID的所有账户 (组合)
POST /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
- 查询语言之 执行过滤
过滤不会进行相关度得分的计算
- 在所有账户中寻找balance 在29900到30000之间(闭区间)的所有账户
(先查询到所有的账户,然后进行过滤)
POST /bank/_search
{
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 29900,
"lte": 30000
}
}
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "243",
"_score": 1,
"_source": {
"account_number": 243,
"balance": 29902,
"firstname": "Evangelina",
"lastname": "Perez",
"age": 20,
"gender": "M",
"address": "787 Joval Court",
"employer": "Keengen",
"email": "evangelinaperez@keengen.com",
"city": "Mulberry",
"state": "SD"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "781",
"_score": 1,
"_source": {
"account_number": 781,
"balance": 29961,
"firstname": "Sanford",
"lastname": "Mullen",
"age": 26,
"gender": "F",
"address": "879 Dover Street",
"employer": "Zanity",
"email": "sanfordmullen@zanity.com",
"city": "Martinez",
"state": "TX"
}
},
...
根据返回结果我们可以看到filter得到的_score为1.不存在程度上的问题。是0和1的问题
三、query和filter效率
一般认为filter的速度快于query的速度
- filter不会计算相关度得分,效率高
- filter的结果可以缓存到内存中,方便再用