Data preparation
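Use edge n-grams to split every word further into its prefixes, then use the resulting n-grams to implement prefix search suggestions (search-as-you-type).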
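To make the splitting concrete, here is a minimal Python sketch of what an edge n-gram filter produces. It is only an approximation: whitespace splitting stands in for the standard tokenizer, and the function names `edge_ngrams` and `analyze` are made up for this illustration. It assumes that all grams of one word share that word's position, which is how the Lucene edge n-gram token filter is understood to behave.

```python
def edge_ngrams(word, min_gram=1, max_gram=50):
    """Return the prefixes of `word` with lengths from min_gram to max_gram."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


def analyze(text, min_gram=1, max_gram=50):
    """Rough stand-in for the analyzer chain: split, lowercase, edge n-gram.

    Returns (term, position) pairs; every gram keeps the position of the
    word it was cut from (assumption, see the lead-in).
    """
    tokens = []
    for position, word in enumerate(text.lower().split()):
        for gram in edge_ngrams(word, min_gram, max_gram):
            tokens.append((gram, position))
    return tokens


print(edge_ngrams("hello"))   # ['h', 'he', 'hel', 'hell', 'hello']
print(analyze("Hello win"))
```

With `min_gram=1` every single-letter prefix becomes a term, so even a one-character input can match; a larger `max_gram` only matters for words longer than that limit.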
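// Create the index with the edge n-gram settings and the title mapping in a single request
// (PUT /my_index cannot be issued a second time for an index that already exists, and title must be an analyzed text field, not keyword)
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram", // filter type
          "min_gram": 1,        // shortest gram: 1 character
          "max_gram": 50        // longest gram: 50 characters
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}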
// Index some documents
POST my_index/my_type
{ "title": "hello win" }
POST my_index/my_type
{ "title": "hello world" }
POST my_index/my_type
{ "title": "hello dog" }
// Test the analyzer
GET /my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "quick brown"
}
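// Assign the analyzer to the target field: edge n-grams at index time, the standard analyzer at search time
// (the index-time analyzer of an existing field cannot be changed, so this mapping has to be in place before any documents are indexed)
PUT /my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "autocomplete",
      "search_analyzer": "standard"
    }
  }
}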
// Query
GET my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": "hello win"
    }
  }
}
Note:
How the search works:
Edge n-grams indexed for the word hello (continuing up to the full word):
h
he
hel
Search input: hello w
hello --> hello, doc1
w --> w, doc1
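At search time there is no longer any need to take a prefix and scan the entire inverted index. The prefix is simply looked up in the inverted index as a term, and if the term is there, the document matches. This is an ordinary match, i.e. full-text search.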
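The lookup described above can be sketched in a few lines of Python. This is a toy model, not Elasticsearch's implementation: the inverted index is a plain dictionary, whitespace splitting replaces the standard tokenizer, and the document ids `doc1` to `doc3` as well as all function names are hypothetical. The phrase check mimics what match_phrase requires: every query word must be present as a term, at consecutive positions.

```python
from collections import defaultdict

MAX_GRAM = 50  # mirrors max_gram in the index settings


def index_terms(title):
    """Index-time analysis (approximation): lowercase words cut into edge n-grams."""
    for position, word in enumerate(title.lower().split()):
        for n in range(1, min(len(word), MAX_GRAM) + 1):
            yield word[:n], position


def build_index(docs):
    """Build term -> {doc_id: set of positions}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, title in docs.items():
        for term, position in index_terms(title):
            index[term][doc_id].add(position)
    return index


def phrase_search(index, query):
    """Search-time analysis keeps whole words; each is looked up directly as a term."""
    words = query.lower().split()
    if not words:
        return []
    hits = []
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Every following word must sit exactly one position further on.
            if all(start + offset in index.get(word, {}).get(doc_id, ())
                   for offset, word in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {"doc1": "hello win", "doc2": "hello world", "doc3": "hello dog"}
index = build_index(docs)
print(phrase_search(index, "hello w"))    # ['doc1', 'doc2']
print(phrase_search(index, "hello win"))  # ['doc1']
```

Each query word costs one dictionary lookup, which is the point of paying for the extra terms at index time: no term in the index has to be scanned or compared by prefix while searching.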