【Elasticsearch Search Suggestions】Implementing index-time search suggestions with the ngram tokenization mechanism

2023-11-28 11:30:40

Data preparation

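Use edge ngram to split every word further into its leading n-grams (its prefixes), and use those n-grams to implement prefix-based search suggestions. Because the splitting happens when documents are indexed, the extra work is done at index time rather than at query time.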
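The edge_ngram filter emits, for each word, every prefix whose length lies between min_gram and max_gram. The following is a minimal Python sketch of that behavior, not Elasticsearch's implementation: the function names are hypothetical, and splitting on runs of word characters is only an approximation of the standard tokenizer (it differs for punctuation, CJK text, and similar cases).

```python
import re

def edge_ngrams(word, min_gram=1, max_gram=50):
    """Leading n-grams of `word` with lengths min_gram..max_gram (edge_ngram)."""
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]

def autocomplete_analyze(text, min_gram=1, max_gram=50):
    """Rough model of the "autocomplete" analyzer: tokenize, lowercase, edge n-grams.

    Assumption: splitting on runs of word characters approximates the
    standard tokenizer. Every gram keeps the position of its source word.
    """
    tokens = []
    for position, word in enumerate(re.findall(r"\w+", text)):
        for gram in edge_ngrams(word.lower(), min_gram, max_gram):
            tokens.append((gram, position))
    return tokens

if __name__ == "__main__":
    for gram, position in autocomplete_analyze("quick brown"):
        print(position, gram)
```

For "quick brown" this yields ten grams, q through quick at position 0 and b through brown at position 1, which should correspond to the tokens the _analyze request returns.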

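//Create the index: the analysis settings and the field mapping go in ONE request.
//Analysis settings cannot be added to an open, existing index (a second PUT /my_index fails),
//and an existing keyword field cannot be changed to an analyzed field, so both are defined up front.
//(Elasticsearch 5.x/6.x syntax with the mapping type my_type, as in the rest of this post.)
PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type":     "edge_ngram",  //filter type
                    "min_gram": 1,   //shortest gram: 1 character
                    "max_gram": 50   //longest gram: 50 characters
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "title": {
                    "type":            "text",
                    "analyzer":        "autocomplete",  //edge n-grams at index time
                    "search_analyzer": "standard"       //whole words at search time
                }
            }
        }
    }
}

//Test the analyzer
GET /my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "quick brown"
}

//Index the data (after the mapping exists, so title is analyzed with autocomplete)
POST my_index/my_type
{
  "title":"hello win"
}
POST my_index/my_type
{
  "title":"hello world"
}
POST my_index/my_type
{
  "title":"hello dog"
}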
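Once the three titles are indexed with the autocomplete analyzer, each prefix becomes a term of its own in the inverted index. The toy model below makes that concrete; it is an illustration only, using plain dicts instead of Lucene's data structures, hypothetical document ids 1 to 3, and a regex word split approximating the standard tokenizer.

```python
import re
from collections import defaultdict

def autocomplete_terms(text, max_gram=50):
    """(gram, position) pairs; the regex split approximates the standard tokenizer."""
    out = []
    for pos, word in enumerate(re.findall(r"\w+", text.lower())):
        for n in range(1, min(max_gram, len(word)) + 1):
            out.append((word[:n], pos))
    return out

def build_index(docs):
    """term -> {doc_id: [positions]}: a toy stand-in for the inverted index."""
    index = defaultdict(dict)
    for doc_id, title in docs.items():
        for term, pos in autocomplete_terms(title):
            index[term].setdefault(doc_id, []).append(pos)
    return index

# Hypothetical doc ids 1..3 for the three titles indexed above.
DOCS = {1: "hello win", 2: "hello world", 3: "hello dog"}

if __name__ == "__main__":
    index = build_index(DOCS)
    for term in ("hello", "w", "wo", "d"):
        print(term, "->", sorted(index[term]))
```

In this model the prefix term w points to documents 1 and 2, while wo points only to document 2.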

//Query (the search text is analyzed with the standard search_analyzer)
GET my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": "hello win"
    }
  }
}

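Note: how the search works

At index time, the edge_ngram filter turns each word into all of its prefixes. For example, "hello" is stored as the terms:

h
he
hel
hell
hello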

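When the user has typed "hello w", the standard search analyzer turns it into the two terms hello and w, and each one is found directly in the inverted index:

hello --> hello, doc1, doc2, doc3
w --> w, doc1, doc2 (prefix of "win" and "world")

match_phrase additionally requires w to come right after hello, so doc1 ("hello win") and doc2 ("hello world") are returned.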
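The lookups and the position check that match_phrase performs can be modelled in a few lines. This is a simplified sketch, not how Lucene executes phrase queries; it assumes hypothetical ids 1 to 3, a regex word split in place of the standard tokenizer, and slop 0 (terms must be adjacent and in order).

```python
import re
from collections import defaultdict

def index_terms(text, max_gram=50):
    # Index-time model of "autocomplete": word split + lowercase + edge n-grams,
    # each gram keeping its word's position (regex split approximates "standard").
    for pos, word in enumerate(re.findall(r"\w+", text.lower())):
        for n in range(1, min(max_gram, len(word)) + 1):
            yield word[:n], pos

def search_terms(text):
    # Search-time model of "standard": whole lowercased words, no n-grams.
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    index = defaultdict(dict)  # term -> {doc_id: set of positions}
    for doc_id, title in docs.items():
        for term, pos in index_terms(title):
            index[term].setdefault(doc_id, set()).add(pos)
    return index

def match_phrase(index, query):
    """Doc ids where the query words occur at consecutive positions."""
    terms = search_terms(query)
    if not terms:
        return []
    # Each term is an exact dictionary lookup: no scan over other terms.
    postings = [index.get(t, {}) for t in terms]
    hits = []
    for doc_id, starts in postings[0].items():
        if any(all(start + i in postings[i].get(doc_id, ()) for i in range(1, len(terms)))
               for start in starts):
            hits.append(doc_id)
    return sorted(hits)

DOCS = {1: "hello win", 2: "hello world", 3: "hello dog"}  # hypothetical ids

if __name__ == "__main__":
    index = build_index(DOCS)
    print(match_phrase(index, "hello win"))
    print(match_phrase(index, "hello w"))
```

Because the search analyzer is standard, the query text itself is not split into n-grams; only the indexed side carries the prefixes, which is why search_analyzer is set separately from analyzer.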

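So at search time there is no need to take a prefix and scan the whole term dictionary for every term that starts with it. The prefix is itself an indexed term, so it is simply looked up in the inverted index, exactly as in an ordinary match (full-text) query. The price is paid at index time, in the form of extra terms and a larger index.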
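To contrast the two strategies, here is a toy comparison. It is only an illustration: a sorted Python list stands in for the term dictionary, the ids 1 to 3 are hypothetical, max_gram is ignored because the titles are short, and the "visited" counter merely counts how many terms each approach touches. A query-time prefix search must enumerate every whole-word term starting with the prefix and merge their postings; with index-time edge n-grams, the prefix is a single lookup.

```python
import bisect
import re
from collections import defaultdict

DOCS = {1: "hello win", 2: "hello world", 3: "hello dog"}  # hypothetical ids

def build_postings(docs, ngrams):
    """term -> set of doc ids; with ngrams=True every edge n-gram is a term."""
    postings = defaultdict(set)
    for doc_id, title in docs.items():
        for word in re.findall(r"\w+", title.lower()):
            terms = [word[:n] for n in range(1, len(word) + 1)] if ngrams else [word]
            for term in terms:
                postings[term].add(doc_id)
    return postings

def prefix_scan(word_postings, prefix):
    """Query-time prefix search: visit every whole-word term starting with
    `prefix` in sorted order and union their postings."""
    terms = sorted(word_postings)
    docs, visited = set(), 0
    i = bisect.bisect_left(terms, prefix)
    while i < len(terms) and terms[i].startswith(prefix):
        docs |= word_postings[terms[i]]
        visited += 1
        i += 1
    return docs, visited

def ngram_lookup(ngram_postings, prefix):
    """Index-time edge n-grams: the prefix is already a term, so one lookup."""
    return set(ngram_postings.get(prefix, set())), 1

if __name__ == "__main__":
    words = build_postings(DOCS, ngrams=False)
    grams = build_postings(DOCS, ngrams=True)
    print(prefix_scan(words, "w"))
    print(ngram_lookup(grams, "w"))
```

Both return documents 1 and 2 for "w", but the scan touches one term per matching word (win, world), a number that grows with the vocabulary, while the n-gram lookup always touches exactly one.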
