Elasticsearch (5): Mapping and Common Field Types
Course reference: 《Elasticsearch核心技术与实战》 (Elasticsearch Core Technology and Practice)
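What Is a Mapping
- A Mapping is similar to a schema definition in a database. It serves the following purposes:
  - Defines the names of the fields in an index;
  - Defines the data type of each field, such as string, number, date, or boolean;
  - Configures how each field is handled by the inverted index (Analyzed or Not Analyzed, and which Analyzer);
- A Mapping maps JSON documents into the flat format that Lucene requires.
- A Mapping belongs to a Type of an index:
  - Every document belongs to a Type;
  - Each Type has one Mapping definition;
  - Starting with 7.0, type information no longer needs to be specified in the Mapping definition;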
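To make the "flat format" point above more concrete, here is a minimal Python sketch that flattens a nested JSON document into dotted field names. It only illustrates the idea and is not how Lucene or Elasticsearch implement it. The function name `flatten` and the sample document are made up for this example.

```python
def flatten(doc, prefix=""):
    """Flatten a nested dict into {dotted.field.name: value} pairs."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Inner objects become dotted paths such as "user.name".
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical sample document, not taken from the course.
doc = {"user": {"name": "Chan", "address": {"city": "Beijing"}}, "age": 19}
print(flatten(doc))
```

Each leaf value ends up under a single flat field name, which is the shape a per-field mapping definition can then describe.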
Field Data Types
- Simple types
  - Text
  - Date
  - Integer / Long / Floating point
  - Boolean
  - IPv4 & IPv6
  - Keyword
- Complex types (object and nested)
- Special types (geo information and others)
  - geo_point & geo_shape, percolator
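What Is Dynamic Mapping
- When a document is written to an index that does not exist, the index is created automatically;
- With Dynamic Mapping there is no need to define the Mapping by hand: Elasticsearch infers each field's type from the document's content;
- The inference is sometimes wrong, for example for geo-location data;
- When a field has the wrong type, some features stop working properly, such as Range queries;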
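Automatic Type Detection
| JSON type | Elasticsearch type |
| --- | --- |
| String | Date if it matches a date format; Float or Long if it matches a number (this option is disabled by default); otherwise Text with a keyword sub-field |
| Boolean | Boolean |
| Floating-point number | Float |
| Integer | Long |
| Object | Object |
| Array | Determined by the type of the first non-null value |
| Null | Ignored |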
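The rules in the table can be sketched in Python roughly as follows. This is a simplified model, not Elasticsearch's actual implementation: the date check is a single ISO 8601 style regular expression standing in for the real date detection formats, and the names `infer_type` and `numeric_detection` (as a function argument) are chosen for this example.

```python
import re

# Simplified stand-in for date detection (ISO 8601 style strings only).
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
)
INTEGER = re.compile(r"^-?\d+$")
DECIMAL = re.compile(r"^-?\d+\.\d+$")


def infer_type(value, numeric_detection=False):
    """Return the type the table suggests, or None if the value is ignored."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            item_type = infer_type(item, numeric_detection)
            if item_type is not None:
                return item_type
        return None
    if isinstance(value, str):
        if ISO_DATE.match(value):
            return "date"
        if numeric_detection and INTEGER.match(value):
            return "long"
        if numeric_detection and DECIMAL.match(value):
            return "float"
        return "text"  # the real dynamic mapping also adds a keyword sub-field
    raise TypeError(f"unsupported JSON value: {value!r}")


sample = {"firstName": "Chan", "loginDate": "2018-07-24T10:29:48.103Z",
          "uid": "123", "isVip": False, "isAdmin": "true", "age": 19, "heigh": 180}
for field, value in sample.items():
    print(field, "->", infer_type(value))
```

Note that the strings "123" and "true" stay text: a quoted value is only turned into a number when numeric detection is switched on, and quoted booleans are never converted.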
# Delete the index first, if it exists, so the mapping is generated from scratch
DELETE mapping_test

# Index a document
PUT mapping_test/_doc/1
{
  "firstName": "Chan",
  "loginDate": "2018-07-24T10:29:48.103Z",
  "uid": "123",
  "isVip": false,
  "isAdmin": "true",
  "age": 19,
  "heigh": 180
}

# View the dynamic mapping
GET mapping_test/_mapping
# Dynamic mapping returned:
{
  "mapping_test" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long" # "age":19 is mapped to long
        },
        "firstName" : {
          "type" : "text", # "firstName":"Chan" is mapped to text with a keyword sub-field
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "heigh" : {
          "type" : "long" # "heigh":180 is mapped to long
        },
        "isAdmin" : {
          "type" : "text", # "isAdmin":"true" is a string, so it is mapped to text with a keyword sub-field
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "isVip" : {
          "type" : "boolean" # "isVip":false is mapped to boolean
        },
        "loginDate" : {
          "type" : "date" # "loginDate":"2018-07-24T10:29:48.103Z" is mapped to date
        },
        "uid" : {
          "type" : "text", # "uid":"123" is mapped to text with a keyword sub-field; detecting numbers in strings (float or long) is disabled by default
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
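Can a Mapping's Field Types Be Changed
There are two cases:
- Adding new fields
  - When dynamic is true, the Mapping is updated as soon as a document with a new field is written;
  - When dynamic is false, the Mapping is not updated. Data in the new field is not indexed, but it still appears in _source;
  - When dynamic is strict, the document write fails;
- Existing fields: once data has been written, the field definition can no longer be modified
  - The inverted index built by Lucene cannot be modified once it has been generated
  - To change a field's type, you must rebuild the index with the Reindex API
  - If a field's data type were changed in place, the data already indexed could no longer be searched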
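As a rough sketch of the rebuild step mentioned above, the following Python snippet assembles the two requests involved: creating a new index with the corrected definition, then copying the data over with `_reindex`. It only builds and prints the request bodies and sends nothing. The index names `users_v1` and `users_v2`, the `mobile` field definition, and the helper `reindex_plan` are hypothetical.

```python
import json


def reindex_plan(old_index, new_index, new_properties):
    """Return the requests for a rebuild as (method, path, body) tuples."""
    return [
        # 1. Create the new index with the corrected field definitions.
        ("PUT", f"/{new_index}", {"mappings": {"properties": new_properties}}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
    ]


plan = reindex_plan("users_v1", "users_v2", {"mobile": {"type": "keyword"}})
for method, path, body in plan:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The printed method, path and body can be pasted into Kibana Dev Tools in the same form as the other requests in this post.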
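Controlling Dynamic Mappings
| dynamic setting | "true" | "false" | "strict" |
| --- | --- | --- | --- |
| Document indexable | YES | YES | NO |
| Field indexable | YES | NO | NO |
| Mapping updated | YES | NO | NO |
- When dynamic is set to false and a document with a new field is written, the document is indexed, but the new field is dropped from the index (it is not searchable and the Mapping is not updated)
- When dynamic is set to strict, the write fails with an error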
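The three settings in the table can be modelled with a small Python sketch. It is a toy model of the behaviour, not Elasticsearch code: the mapping is a plain dict, every new field gets a placeholder `text` type, and the names `index_document` and `StrictDynamicMappingError` are invented for this example.

```python
class StrictDynamicMappingError(Exception):
    """Raised when a new field arrives while dynamic is "strict"."""


def index_document(mapping, doc):
    """Apply the dynamic setting to one document.

    Returns the set of fields that end up searchable. The mapping dict is
    updated in place only when dynamic is True.
    """
    dynamic = mapping.get("dynamic", True)
    properties = mapping.setdefault("properties", {})
    new_fields = [field for field in doc if field not in properties]
    if new_fields and dynamic == "strict":
        raise StrictDynamicMappingError(
            f"dynamic introduction of {new_fields} is not allowed"
        )
    if dynamic is True:
        for field in new_fields:
            properties[field] = {"type": "text"}  # placeholder type for the sketch
    # With dynamic False, new fields are skipped here and live only in _source.
    return {field for field in doc if field in properties}


mapping = {}
print(index_document(mapping, {"newField": "someValue"}))      # field is added
mapping["dynamic"] = False
print(index_document(mapping, {"anotherField": "someValue"}))  # field is skipped
mapping["dynamic"] = "strict"
try:
    index_document(mapping, {"lastField": "value"})
except StrictDynamicMappingError as err:
    print("rejected:", err)
```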
# 1. The default Mapping supports dynamic: write a document that contains a new field
PUT dynamic_mapping_test/_doc/1
{
  "newField": "someValue"
}

# 2. The field can be searched, and the data also appears in _source
POST dynamic_mapping_test/_search
{
  "query": {
    "match": {
      "newField": "someValue"
    }
  }
}

# Response:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "dynamic_mapping_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "newField" : "someValue"
        }
      }
    ]
  }
}
# 3. Change dynamic to false
PUT dynamic_mapping_test/_mapping
{
  "dynamic": false
}

# 4. Add a document with anotherField
PUT dynamic_mapping_test/_doc/10
{
  "anotherField": "someValue"
}

# 5. The field cannot be searched, because dynamic has been set to false
POST dynamic_mapping_test/_search
{
  "query": {
    "match": {
      "anotherField": "someValue"
    }
  }
}

# Response:
{
  "took" : 657,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
# 6. Change dynamic to strict
PUT dynamic_mapping_test/_mapping
{
  "dynamic": "strict"
}

# 7. Writing a document with a new field now fails with HTTP code 400
PUT dynamic_mapping_test/_doc/12
{
  "lastField": "value"
}

# Response:
{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [lastField] within [_doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [lastField] within [_doc] is not allowed"
  },
  "status": 400
}
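How to Define a Mapping
PUT index_name
{
  "mappings": {
    "properties": {
      //define your mappings here
    }
  }
}
- You can write it entirely by hand, following the API reference;
- To reduce typing and the chance of mistakes, you can follow these steps:
  - Create a temporary index and write some sample data into it;
  - Use the Mapping API to fetch the dynamic Mapping definition of that temporary index;
  - Adjust the definition, then use it to create your real index;
  - Delete the temporary index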
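The "adjust the definition" step above can be sketched in Python: take the response of `GET <temp_index>/_mapping`, override the fields that were guessed wrong, and produce the body for creating the real index. This is a minimal sketch under the assumption that the response has the shape shown earlier in this post. The helper name `create_body_from_mapping` and the choice to turn `uid` into a keyword are illustrative only.

```python
import copy
import json


def create_body_from_mapping(mapping_response, temp_index, overrides):
    """Build a create-index body from a GET <index>/_mapping response."""
    properties = copy.deepcopy(mapping_response[temp_index]["mappings"]["properties"])
    properties.update(overrides)  # replace the definitions that were guessed wrong
    return {"mappings": {"properties": properties}}


# Abbreviated response for the temporary index.
response = {
    "mapping_test": {
        "mappings": {
            "properties": {
                "age": {"type": "long"},
                "uid": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
            }
        }
    }
}
body = create_body_from_mapping(response, "mapping_test", {"uid": {"type": "keyword"}})
print(json.dumps(body, indent=2))
```

The printed body is what would follow `PUT index_name` when creating the real index.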
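Some Mapping Parameters
- index: controls whether the field is indexed. The default is true. If it is set to false, the field cannot be searched.
- index_options: controls what the inverted index records. There are four levels:
  - docs: records doc id
  - freqs: records doc id / term frequencies
  - positions: records doc id / term frequencies / term position
  - offsets: records doc id / term frequencies / term position / character offsets
  - Text fields record positions by default; other types default to docs. The more that is recorded, the more storage space is used.
- null_value: makes Null values searchable by indexing a replacement value in their place. Text fields do not support null_value; Keyword does, as do the numeric, date, boolean and ip types.
- copy_to: serves certain search needs. copy_to copies a field's values into a target field, which gives an effect similar to _all. In ES 7, _all has been replaced by copy_to. The copy_to target field does not appear in _source.
- Elasticsearch does not provide a dedicated array type. However, any field can contain multiple values of the same type.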
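To make the four index_options levels listed above more tangible, here is a toy inverted index in Python that records more detail at each level. It is only a sketch of the idea: tokenization is a lowercase split on word characters rather than a real analyzer, and `build_postings` is a name made up for this example.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]


def build_postings(docs, index_options="positions"):
    """Build {term: {doc_id: details}} with the detail level of index_options."""
    level = LEVELS.index(index_options)
    postings = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            entry = postings.setdefault(match.group(), {}).setdefault(doc_id, {})
            if level >= 1:  # freqs and above: count occurrences
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:  # positions and above: token order, used by phrase queries
                entry.setdefault("positions", []).append(position)
            if level >= 3:  # offsets: character ranges in the original text
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


docs = {1: "fields fields", 2: "Mapping fields"}
for option in LEVELS:
    print(option, build_postings(docs, option)["fields"])
```

Each level keeps everything the previous one did and adds one more piece of information per term and document, which is why the storage cost grows with the level.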
# Set index to false
DELETE users
PUT users
{
  "mappings" : {
    "properties" : {
      "firstName" : {
        "type" : "text"
      },
      "lastName" : {
        "type" : "text"
      },
      "mobile" : {
        "type" : "text",
        "index": false
      }
    }
  }
}

# Insert data
PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming",
  "mobile": "12345678"
}

# Search
POST /users/_search
{
  "query": {
    "match": {
      "mobile": "12345678" # this field cannot be searched
    }
  }
}
# Search response:
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n \"match\" : {\n \"mobile\" : {\n \"query\" : \"12345678\",\n \"operator\" : \"OR\",\n \"prefix_length\" : 0,\n \"max_expansions\" : 50,\n \"fuzzy_transpositions\" : true,\n \"lenient\" : false,\n \"zero_terms_query\" : \"NONE\",\n \"auto_generate_synonyms_phrase_query\" : true,\n \"boost\" : 1.0\n }\n }\n}",
        "index_uuid": "1oB9dwY2TPq-9QjiaMaU7g",
        "index": "users"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "users",
        "node": "u-4S1mfbQiuA1Bqe-wfPJQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n \"match\" : {\n \"mobile\" : {\n \"query\" : \"12345678\",\n \"operator\" : \"OR\",\n \"prefix_length\" : 0,\n \"max_expansions\" : 50,\n \"fuzzy_transpositions\" : true,\n \"lenient\" : false,\n \"zero_terms_query\" : \"NONE\",\n \"auto_generate_synonyms_phrase_query\" : true,\n \"boost\" : 1.0\n }\n }\n}",
          "index_uuid": "1oB9dwY2TPq-9QjiaMaU7g",
          "index": "users",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot search on field [mobile] since it is not indexed." # cause of the error
          }
        }
      }
    ]
  },
  "status": 400
}
# Set null_value
DELETE users
PUT users
{
  "mappings" : {
    "properties" : {
      "firstName" : {
        "type" : "text"
      },
      "lastName" : {
        "type" : "text"
      },
      "mobile" : {
        "type" : "keyword",
        "null_value": "NULL"
      }
    }
  }
}

# Insert a document whose mobile is an explicit null
PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming",
  "mobile": null
}

# Insert a document without a mobile field
PUT users/_doc/2
{
  "firstName": "Ruan2",
  "lastName": "Yiming2"
}

# Search
GET users/_search
{
  "query": {
    "match": {
      "mobile": "NULL"
    }
  }
}
# Search response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "firstName" : "Ruan",
          "lastName" : "Yiming",
          "mobile" : null
        }
      }
    ]
  }
}
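The response above contains only document 1: the explicit null was replaced at index time, while document 2, which has no mobile field at all, was not affected. A rough Python model of that rule, with the invented helper `indexed_terms`, could look like this:

```python
def indexed_terms(doc, field, null_value=None):
    """Return the terms a keyword field would contribute to the index."""
    if field not in doc:
        return []  # missing field: nothing is indexed, null_value is not applied
    value = doc[field]
    values = value if isinstance(value, list) else [value]
    terms = []
    for item in values:
        if item is None:
            if null_value is not None:
                terms.append(null_value)  # explicit null is replaced in the index only
        else:
            terms.append(item)
    return terms


docs = {
    "1": {"firstName": "Ruan", "lastName": "Yiming", "mobile": None},
    "2": {"firstName": "Ruan2", "lastName": "Yiming2"},
}
hits = [
    doc_id
    for doc_id, doc in docs.items()
    if "NULL" in indexed_terms(doc, "mobile", null_value="NULL")
]
print(hits)
print(docs["1"]["mobile"])  # _source still holds the original null
```

As in the real response, the stored document keeps `null`; only the indexed value is replaced.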
# Set copy_to
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "lastName": {
        "type": "text",
        "copy_to": "fullName"
      }
    }
  }
}

# Insert data
PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming"
}

# Search, option 1
GET users/_search?q=fullName:(Ruan Yiming)

# Search, option 2
POST users/_search
{
  "query": {
    "match": {
      "fullName": {
        "query": "Ruan Yiming",
        "operator": "and"
      }
    }
  }
}
# Search response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "firstName" : "Ruan",
          "lastName" : "Yiming"
        }
      }
    ]
  }
}
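Note that fullName is searchable but absent from _source in the response above. A small Python model of that behaviour, using the invented helpers `index_with_copy_to` and `matches_all_terms` and a whitespace split in place of a real analyzer, might look like this:

```python
def index_with_copy_to(doc, properties):
    """Return the values indexed per field; the original doc (_source) is untouched."""
    indexed = {field: [value] for field, value in doc.items()}
    for field, value in doc.items():
        targets = properties.get(field, {}).get("copy_to", [])
        if isinstance(targets, str):
            targets = [targets]
        for target in targets:
            # The value is copied into the target field's index entry only.
            indexed.setdefault(target, []).append(value)
    return indexed


def matches_all_terms(indexed, field, query):
    """Rough stand-in for a match query with operator "and" on a text field."""
    tokens = {t.lower() for value in indexed.get(field, []) for t in value.split()}
    return all(term.lower() in tokens for term in query.split())


properties = {
    "firstName": {"type": "text", "copy_to": "fullName"},
    "lastName": {"type": "text", "copy_to": "fullName"},
}
source = {"firstName": "Ruan", "lastName": "Yiming"}
indexed = index_with_copy_to(source, properties)
print(indexed["fullName"])
print(matches_all_terms(indexed, "fullName", "Ruan Yiming"))
print("fullName" in source)
```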
# Array values
PUT users/_doc/1
{
  "name": "twobirds",
  "interests": ["reading", "music"]
}

GET users/_mapping

# Mapping returned:
{
  "users" : {
    "mappings" : {
      "properties" : {
        "firstName" : {
          "type" : "text",
          "copy_to" : [
            "fullName"
          ]
        },
        "fullName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "interests" : {
          "type" : "text", # array: the type follows the type of the values inside the array
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastName" : {
          "type" : "text",
          "copy_to" : [
            "fullName"
          ]
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}