I. Analyzer
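An analyzer is made up of three kinds of components, applied in this order:
1. Character filters (character filter)
Zero or more. They transform the raw text before it is split, e.g. stripping HTML markup or converting "&" into "and".
2. Tokenizer (tokenizer)
Exactly one. It splits the text into individual tokens, e.g. splitting on whitespace.
3. Token filters (token filter)
Zero or more. They modify the resulting token stream, e.g. lowercasing, removing stop words, or adding synonyms.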
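The three-stage pipeline above can be sketched in plain Python to make the ordering concrete. This is only an illustration of the idea, not Elasticsearch code: the tag-stripping regex, the `STOPWORDS` set and the `SYNONYMS` map are hypothetical stand-ins for the real `html_strip`, `mapping`, `stop` and `synonym` components, and the synonym step simply appends tokens instead of placing them at the same position as the real filter does.

```python
import html
import re

# Hypothetical stop list and synonym map, for illustration only.
STOPWORDS = {"a", "an", "the", "in", "over"}
SYNONYMS = {"quick": ["fast"]}

def char_filters(text):
    # 1. Character filters work on the raw string before tokenizing.
    text = re.sub(r"<[^>]+>", " ", text)  # crude HTML tag stripping
    text = html.unescape(text)            # decode entities such as &amp;
    return text.replace("&", " and ")     # mapping: "&" -> "and"

def tokenizer(text):
    # 2. Exactly one tokenizer; this one splits on runs of whitespace.
    return text.split()

def token_filters(tokens):
    # 3. Token filters run in order over the token stream.
    out = []
    for tok in tokens:
        tok = tok.lower()                  # lowercase filter
        if tok in STOPWORDS:               # stop filter
            continue
        out.append(tok)
        out.extend(SYNONYMS.get(tok, []))  # synonym filter adds extra tokens
    return out

def analyze(text):
    return token_filters(tokenizer(char_filters(text)))

print(analyze("<p>Quick &amp; lazy dogs</p>"))
# ['quick', 'fast', 'and', 'lazy', 'dogs']
```

Because the character filters run first, the `&amp;` entity is already turned into the word "and" by the time the tokenizer sees the text.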
II. Built-in analyzers
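Standard analyzer (standard)
Splits text on word boundaries (Unicode Text Segmentation rules), removes most punctuation, and lowercases tokens. Stop-word removal is disabled by default.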
GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
Tokens: [2, running, quick, brown, foxes, leap, over, lazy, dogs, in, the, summer, evening]
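A rough Python approximation of the standard analyzer, for experimenting offline. The real tokenizer implements the Unicode word-boundary rules (UAX #29); the regex here is an assumption that only approximates them (runs of word characters, keeping an inner apostrophe as in "dog's"). The `max_token_length` and `stopwords` parameters mirror options the standard analyzer accepts (defaults 255 and none); the function name `standard_analyze` is hypothetical.

```python
import re

# Approximation of UAX #29 word boundaries (not the real algorithm):
# runs of word characters, allowing an apostrophe inside a word.
WORD = re.compile(r"\w+(?:'\w+)*")

def standard_analyze(text, max_token_length=255, stopwords=()):
    tokens = []
    for match in WORD.finditer(text):
        word = match.group()
        # Tokens longer than max_token_length are cut into pieces of that size.
        for start in range(0, len(word), max_token_length):
            tokens.append(word[start:start + max_token_length])
    # Lowercase first, then apply the (empty by default) stop list.
    stop = set(stopwords)
    return [t for t in (tok.lower() for tok in tokens) if t not in stop]

print(standard_analyze("The dog's bone", stopwords=["the"]))
# ["dog's", 'bone']
```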
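Simple analyzer (simple)
Splits on any non-letter character (digits, spaces, hyphens, apostrophes, etc.), discards those non-letter characters, and lowercases tokens.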
GET _analyze
{
  "analyzer": "simple",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
Tokens: [running, quick, brown, foxes, leap, over, lazy, dogs, in, the, summer, evening]
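The simple analyzer's rule (keep only runs of letters, then lowercase) fits in a one-line regex. This is a sketch: `[^\W\d_]` means "word character that is not a digit or underscore", which is close to, but not guaranteed identical to, the letter test Lucene uses. The name `simple_analyze` is hypothetical.

```python
import re

# Runs of letters only: word characters minus digits and underscore.
LETTERS = re.compile(r"[^\W\d_]+")

def simple_analyze(text):
    return [m.group().lower() for m in LETTERS.finditer(text)]

print(simple_analyze("R2D2 isn't here"))
# ['r', 'd', 'isn', 't', 'here']
```

Note how digits vanish entirely and the apostrophe splits "isn't" in two, unlike the standard analyzer.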
Whitespace analyzer (whitespace)
Splits on whitespace only. It does not lowercase and does not strip punctuation.
GET _analyze
{
  "analyzer": "whitespace",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
Tokens: [2, running, Quick, brown-foxes, leap, over, lazy, dogs, in, the, summer, evening.]
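A minimal Python sketch of the whitespace analyzer, mainly to show the practical consequence of leaving case and punctuation alone: an exact-term lookup for "quick" or "evening" finds nothing. `whitespace_analyze` is a hypothetical name, and Python's `str.split()` whitespace definition is assumed to be close enough to Lucene's for ASCII text.

```python
def whitespace_analyze(text):
    # Split on runs of whitespace only; case and punctuation are untouched.
    return text.split()

tokens = whitespace_analyze("Quick brown-foxes rest in the evening.")
print(tokens)
# ['Quick', 'brown-foxes', 'rest', 'in', 'the', 'evening.']
print("quick" in tokens, "evening" in tokens)
# False False
```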
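Pattern analyzer (pattern)
Splits on a regular expression, by default \W+ (one or more non-word characters), and lowercases tokens.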
GET _analyze
{
  "analyzer": "pattern",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
Tokens: [2, running, quick, brown, foxes, leap, over, lazy, dogs, in, the, summer, evening]
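The pattern analyzer can be sketched with `re.split`. The `pattern` and `lowercase` arguments mirror the analyzer's own options of the same names; the function name `pattern_analyze` is hypothetical. One assumption to keep in mind: Elasticsearch uses Java regular expressions, where `\W` is ASCII-only by default, while Python's `\W` is Unicode-aware, so results can differ on non-ASCII text.

```python
import re

def pattern_analyze(text, pattern=r"\W+", lowercase=True):
    # Split on the separator pattern. Empty strings (such as the one produced
    # by the trailing period in the sample sentence) are discarded.
    tokens = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in tokens] if lowercase else tokens

print(pattern_analyze("a,B,,c", pattern=","))
# ['a', 'b', 'c']
```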