Elasticsearch (5): Mapping and Common Field Types


Course link: 《Elasticsearch核心技术与实战》 (Elasticsearch Core Technology and Practice)


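What is a Mapping

  • A Mapping is similar to a schema definition in a database. It does the following:
    • defines the names of the fields in an index;
    • defines the data type of each field, such as string, number, date, or boolean;
    • configures how each field is handled in the inverted index (analyzed or not analyzed, and which analyzer to use).
  • A Mapping maps a JSON document onto the flat format that Lucene requires.
  • A Mapping belongs to a Type of an index:
    • every document belongs to a Type;
    • each Type has one Mapping definition;
    • starting with 7.0, a Mapping definition no longer needs to specify type information.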
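
To make the "flat format" point more concrete, here is a minimal Python sketch of how a nested JSON document can be turned into dotted field names. The function name flatten_document is hypothetical, and the code only illustrates the idea of object fields becoming dotted paths; it is not how Lucene or Elasticsearch implements it.

```python
def flatten_document(doc, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into object fields and prepend the parent path.
            flat.update(flatten_document(value, path))
        else:
            flat[path] = value
    return flat


doc = {"title": "Mapping", "author": {"first": "Ruan", "last": "Yiming"}}
print(flatten_document(doc))
```

The nested author object ends up as the two flat keys author.first and author.last.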


Field data types

  • Simple types
    • Text
    • Date
    • Integer/Long/Floating point
    • Boolean
    • IPv4 & IPv6
    • Keyword
  • Complex types
    • Object
    • Nested
  • Special types
    • geo_point & geo_shape (geographic data), percolator


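What is Dynamic Mapping

  • When a document is written and the index does not exist, the index is created automatically;
  • With Dynamic Mapping you do not have to define a Mapping by hand; Elasticsearch infers the field types from the document's content;
  • The inference is sometimes wrong, for example for geographic location data;
  • When a type is set incorrectly, some features stop working properly, such as Range queries.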


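Automatic type detection

JSON type        Elasticsearch type
String           Date if it matches a date format; Float or Long if it matches a number (this option is off by default); otherwise Text with a keyword sub-field
Boolean          Boolean
Floating point   Float
Integer          Long
Object           Object
Array            Determined by the type of the first non-null value
Null             Ignored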


# Write a document, then view the Mapping
PUT mapping_test/_doc/1
{
  "firstName":"Chan",
  "loginDate":"2018-07-24T10:29:48.103Z",
  "uid" : "123",
  "isVip" : false,
  "isAdmin": "true",
  "age":19,
  "heigh":180
}

# View the dynamic Mapping
GET mapping_test/_mapping

# Delete the index (run this only after you have inspected the Mapping)
DELETE mapping_test

# Response of the GET request above:
{
  "mapping_test" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"  # "age":19 is mapped to long
        },
        "firstName" : {
          "type" : "text",  # "firstName":"Chan" is mapped to text with a keyword sub-field
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "heigh" : {
          "type" : "long" # "heigh":180 is mapped to long
        },
        "isAdmin" : {
          "type" : "text", # "isAdmin": "true" is a string, so it is mapped to text with a keyword sub-field
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "isVip" : {
          "type" : "boolean" # "isVip" : false is mapped to boolean
        },
        "loginDate" : {
          "type" : "date" # "loginDate":"2018-07-24T10:29:48.103Z" is mapped to date
        },
        "uid" : {
          "type" : "text", # "uid" : "123" is mapped to text with a keyword sub-field, because detecting numbers in strings (Float or Long) is off by default
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
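
The detection rules and the response above can be approximated with a short Python sketch. The function name infer_type and the ISO_DATE regular expression are assumptions made for illustration: the regular expression only recognizes ISO 8601 style strings and is a simplified stand-in for Elasticsearch's real date detection, so treat this as a sketch of the rules in the table, not as the actual algorithm.

```python
import re

# Simplified stand-in for date detection (assumption: ISO 8601 style only).
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
)


def infer_type(value, numeric_detection=False):
    """Return the field type suggested by the detection table, or None if ignored."""
    if value is None:
        return None                      # null values are ignored
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        for item in value:               # the first non-null element decides
            if item is not None:
                return infer_type(item, numeric_detection)
        return None
    if isinstance(value, str):
        if ISO_DATE.match(value):
            return "date"
        if numeric_detection:            # off by default, as in the table
            if re.fullmatch(r"[+-]?\d+", value):
                return "long"
            if re.fullmatch(r"[+-]?\d+\.\d+", value):
                return "float"
        return "text"                    # text with a keyword sub-field
    raise TypeError(f"unsupported JSON value: {value!r}")


document = {
    "firstName": "Chan",
    "loginDate": "2018-07-24T10:29:48.103Z",
    "uid": "123",
    "isVip": False,
    "isAdmin": "true",
    "age": 19,
    "heigh": 180,
}
for field, value in document.items():
    print(field, "->", infer_type(value))
```

With numeric_detection left at its default, "uid" stays text, which matches the Mapping returned above.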


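Can the field type in a Mapping be changed

There are two cases:

  • Adding a new field
    • When dynamic is true, the Mapping is updated as soon as a document with a new field is written;
    • When dynamic is false, the Mapping is not updated and the new field's data is not indexed, but it still appears in _source;
    • When dynamic is strict, the document write fails;
  • For an existing field, once data has been written the field definition can no longer be modified
    • The inverted index built by Lucene cannot be changed once it has been generated
    • To change a field's type, create a new index with the right Mapping and rebuild the data with the Reindex API
    • If the data type of a field could be changed in place, the data that had already been indexed could no longer be searched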


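Controlling Dynamic Mappings

dynamic              true   false   strict
Document indexable   YES    YES     NO
Field indexable      YES    NO      NO
Mapping updated      YES    NO      NO


  • When dynamic is set to false and a document with a new field is written, the document is indexed, but the new field is not indexed or searchable (it is still kept in _source)
  • When dynamic is set to strict, writing such a document fails with an error
# 1. By default the Mapping is dynamic; write a document that contains a new field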
PUT dynamic_mapping_test/_doc/1
{
  "newField":"someValue"
}
# 2. The field can be searched, and the data also appears in _source
POST dynamic_mapping_test/_search
{
  "query":{
    "match":{
      "newField":"someValue"
    }
  }
}
# Response:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "dynamic_mapping_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "newField" : "someValue"
        }
      }
    ]
  }
}
# 3. Change dynamic to false
PUT dynamic_mapping_test/_mapping
{
  "dynamic": false
}
# 4. Write a document with a new field, anotherField
PUT dynamic_mapping_test/_doc/10
{
  "anotherField":"someValue"
}
# 5. The field cannot be searched, because dynamic has been set to false
POST dynamic_mapping_test/_search
{
  "query":{
    "match":{
      "anotherField":"someValue"
    }
  }
}
# Response:
{
  "took" : 657,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
# 6. Change dynamic to strict
PUT dynamic_mapping_test/_mapping
{
  "dynamic": "strict"
}

# 7. Writing the document fails with HTTP code 400
PUT dynamic_mapping_test/_doc/12
{
  "lastField":"value"
}
# Response:
{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [lastField] within [_doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [lastField] within [_doc] is not allowed"
  },
  "status": 400
}
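
The three dynamic settings walked through above can be summarized in a small in-memory simulation. ToyIndex and StrictDynamicMappingError are hypothetical names, the setting is passed as the strings "true", "false" and "strict", and search is a plain exact match on hashable values. It is only a sketch of the behavior in the table, assuming nothing about how Elasticsearch implements it.

```python
class StrictDynamicMappingError(Exception):
    """Raised when dynamic is "strict" and a document has an unmapped field."""


class ToyIndex:
    def __init__(self, dynamic="true"):
        self.dynamic = dynamic        # "true", "false" or "strict"
        self.mapped_fields = set()    # field names known to the mapping
        self.sources = {}             # doc id -> original document (_source)
        self.inverted = {}            # (field, value) -> set of doc ids

    def index(self, doc_id, source):
        new_fields = [f for f in source if f not in self.mapped_fields]
        if new_fields and self.dynamic == "strict":
            # strict: reject the whole document before storing anything
            raise StrictDynamicMappingError(
                f"dynamic introduction of {new_fields} is not allowed"
            )
        if self.dynamic == "true":
            self.mapped_fields.update(new_fields)   # the mapping is updated
        self.sources[doc_id] = dict(source)         # _source keeps every field
        for field, value in source.items():
            if field in self.mapped_fields:         # only mapped fields are indexed
                self.inverted.setdefault((field, value), set()).add(doc_id)

    def search(self, field, value):
        return sorted(self.inverted.get((field, value), set()))


index = ToyIndex()
index.index("1", {"newField": "someValue"})
print(index.search("newField", "someValue"))       # the field is searchable

index.dynamic = "false"
index.index("10", {"anotherField": "someValue"})
print(index.search("anotherField", "someValue"))   # stored, but not searchable
print(index.sources["10"])                         # still present in _source
```

Switching index.dynamic to "strict" and indexing a document with another unknown field raises StrictDynamicMappingError, which mirrors the HTTP 400 response above.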


How to define a Mapping

PUT  index_name
{
    "mappings":{
        "properties":{
            //define your mappings here
        }
    }
}
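  • You can write it entirely by hand with the help of the API reference;
  • To reduce typing and the chance of mistakes, you can follow these steps:
    • create a temporary index and write some sample data into it;
    • fetch the dynamic Mapping definition of that temporary index through the Mapping API;
    • adjust it, then use the result to create your real index;
    • delete the temporary index.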
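
The "adjust it" step can be scripted. The sketch below assumes you already hold the JSON returned by the Mapping API as a Python dict, in the response shape shown earlier in this post. The function name build_index_body, the index name temp_index and the sample override are hypothetical, and no request is sent to Elasticsearch; the function only prepares the body for the create-index call.

```python
import copy


def build_index_body(mapping_response, temp_index, overrides):
    """Turn a GET <temp_index>/_mapping response into a create-index body."""
    properties = copy.deepcopy(
        mapping_response[temp_index]["mappings"]["properties"]
    )
    # Replace the dynamically inferred definitions that need correcting.
    for field, definition in overrides.items():
        properties[field] = definition
    return {"mappings": {"properties": properties}}


# Hypothetical response from the temporary index: uid was inferred as text.
response = {
    "temp_index": {
        "mappings": {
            "properties": {
                "uid": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "age": {"type": "long"},
            }
        }
    }
}

body = build_index_body(response, "temp_index", {"uid": {"type": "keyword"}})
print(body)
```

The resulting dict can then be sent as the body of PUT index_name, after which the temporary index is deleted.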


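Some Mapping parameters

  • index controls whether the field is indexed; the default is true. If it is set to false, the field cannot be searched.
  • index_options controls what is recorded in the inverted index. There are four levels:
    • docs records the doc id
    • freqs records the doc id / term frequencies
    • positions records the doc id / term frequencies / term positions
    • offsets records the doc id / term frequencies / term positions / character offsets
  • Text fields record positions by default; other types default to docs. The more that is recorded, the more storage is used.
  • null_value makes it possible to search for null values. It cannot be set on Text fields; use a Keyword field (or another non-text type such as a number, date, or boolean).
  • copy_to serves certain search needs: it copies a field's values into a target field, similar to what _all used to do. _all was replaced by copy_to in ES 7. The copy_to target field does not appear in _source.
  • Elasticsearch has no dedicated array type, but any field can hold multiple values of the same type.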
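
To show what the four index_options levels add on top of each other, here is a toy inverted index in Python. build_postings and LEVELS are hypothetical names, and whitespace splitting with lowercasing stands in for a real analyzer. The data layout is chosen for readability and does not reflect Lucene's on-disk format.

```python
import re

LEVELS = ("docs", "freqs", "positions", "offsets")


def build_postings(docs, index_options="positions"):
    """Build a toy inverted index that records only what the level asks for."""
    if index_options not in LEVELS:
        raise ValueError(f"unknown index_options: {index_options}")
    depth = LEVELS.index(index_options)
    postings = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\S+", text)):
            term = match.group().lower()
            # "docs" level: the term only remembers which documents contain it.
            entry = postings.setdefault(term, {}).setdefault(doc_id, {})
            if depth >= 1:    # freqs: how often the term occurs in the document
                entry["freq"] = entry.get("freq", 0) + 1
            if depth >= 2:    # positions: the token's ordinal position
                entry.setdefault("positions", []).append(position)
            if depth >= 3:    # offsets: start and end character offsets
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


docs = {1: "quick brown quick"}
for level in LEVELS:
    print(level, build_postings(docs, level)["quick"])
```

Each level stores strictly more per term than the previous one, which is why richer settings cost more storage.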
# 1. Set index to false
DELETE users
PUT users
{
    "mappings" : {
      "properties" : {
        "firstName" : {
          "type" : "text"
        },
        "lastName" : {
          "type" : "text"
        },
        "mobile" : {
          "type" : "text",
          "index": false
        }
      }
    }
}
# Index a document
PUT users/_doc/1
{
  "firstName":"Ruan",
  "lastName": "Yiming",
  "mobile": "12345678"
}
# Search
POST /users/_search
{
  "query": {
    "match": {
      "mobile":"12345678" # this field cannot be searched
    }
  }
}
# Search response:
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n  \"match\" : {\n    \"mobile\" : {\n      \"query\" : \"12345678\",\n      \"operator\" : \"OR\",\n      \"prefix_length\" : 0,\n      \"max_expansions\" : 50,\n      \"fuzzy_transpositions\" : true,\n      \"lenient\" : false,\n      \"zero_terms_query\" : \"NONE\",\n      \"auto_generate_synonyms_phrase_query\" : true,\n      \"boost\" : 1.0\n    }\n  }\n}",
        "index_uuid": "1oB9dwY2TPq-9QjiaMaU7g",
        "index": "users"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "users",
        "node": "u-4S1mfbQiuA1Bqe-wfPJQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"match\" : {\n    \"mobile\" : {\n      \"query\" : \"12345678\",\n      \"operator\" : \"OR\",\n      \"prefix_length\" : 0,\n      \"max_expansions\" : 50,\n      \"fuzzy_transpositions\" : true,\n      \"lenient\" : false,\n      \"zero_terms_query\" : \"NONE\",\n      \"auto_generate_synonyms_phrase_query\" : true,\n      \"boost\" : 1.0\n    }\n  }\n}",
          "index_uuid": "1oB9dwY2TPq-9QjiaMaU7g",
          "index": "users",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot search on field [mobile] since it is not indexed." # cause of the error
          }
        }
      }
    ]
  },
  "status": 400
}
# Set null_value
DELETE users
PUT users
{
    "mappings" : {
      "properties" : {
        "firstName" : {
          "type" : "text"
        },
        "lastName" : {
          "type" : "text"
        },
        "mobile" : {
          "type" : "keyword",
          "null_value": "NULL"
        }

      }
    }
}
# Index a document whose mobile field is null
PUT users/_doc/1
{
  "firstName":"Ruan",
  "lastName": "Yiming",
  "mobile": null
}
# Index a second document that has no mobile field
PUT users/_doc/2
{
  "firstName":"Ruan2",
  "lastName": "Yiming2"

}
# Search
GET users/_search
{
  "query": {
    "match": {
      "mobile":"NULL"
    }
  }
}
# Search response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "firstName" : "Ruan",
          "lastName" : "Yiming",
          "mobile" : null
        }
      }
    ]
  }
}
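
Only document 1 is returned above: an explicit null is indexed as the null_value, a missing field is not indexed at all, and _source is left unchanged. A minimal Python sketch of that rule follows; the function name indexed_terms is hypothetical, and the code describes the observed behavior, not Elasticsearch internals.

```python
def indexed_terms(source, field, null_value=None):
    """Return the terms that would be indexed for `field` in this document."""
    if field not in source:
        return []                     # missing field: nothing is indexed
    value = source[field]
    if value is None:
        # An explicit null is replaced by null_value in the index only.
        return [null_value] if null_value is not None else []
    return [value]


doc1 = {"firstName": "Ruan", "lastName": "Yiming", "mobile": None}
doc2 = {"firstName": "Ruan2", "lastName": "Yiming2"}

print(indexed_terms(doc1, "mobile", null_value="NULL"))
print(indexed_terms(doc2, "mobile", null_value="NULL"))
print(doc1["mobile"])   # _source still holds the original null
```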
# Set copy_to
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName":{
        "type": "text",
        "copy_to": "fullName"
      },
      "lastName":{
        "type": "text",
        "copy_to": "fullName"
      }
    }
  }
}
# Index a document
PUT users/_doc/1
{
  "firstName":"Ruan",
  "lastName": "Yiming"
}
# Search, method 1
GET users/_search?q=fullName:(Ruan Yiming)
# Search, method 2
POST users/_search
{
  "query": {
    "match": {
       "fullName":{
        "query": "Ruan Yiming",
        "operator": "and"
      }
    }
  }
}
# Search response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "firstName" : "Ruan",
          "lastName" : "Yiming"
        }
      }
    ]
  }
}
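
The response above finds the document through fullName even though _source contains no such field. The following Python sketch mimics that: values are copied into the target field at index time only. index_fields is a hypothetical name, the simplified mapping dict keeps just the copy_to lists, and lowercasing plus whitespace splitting stands in for the analyzer.

```python
def index_fields(source, mapping):
    """Return field -> indexed tokens, applying copy_to; `source` is not modified."""
    indexed = {}
    for field, value in source.items():
        tokens = value.lower().split()            # stand-in for text analysis
        indexed.setdefault(field, []).extend(tokens)
        for target in mapping.get(field, {}).get("copy_to", []):
            indexed.setdefault(target, []).extend(tokens)
    return indexed


mapping = {
    "firstName": {"copy_to": ["fullName"]},
    "lastName": {"copy_to": ["fullName"]},
}
source = {"firstName": "Ruan", "lastName": "Yiming"}

indexed = index_fields(source, mapping)
query_terms = "Ruan Yiming".lower().split()
# operator "and": every query term must be present in the target field
print(all(term in indexed["fullName"] for term in query_terms))
print("fullName" in source)
```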
# Array type
PUT users/_doc/1
{
  "name":"twobirds",
  "interests":["reading","music"]
}
GET users/_mapping
# Mapping response:
{
  "users" : {
    "mappings" : {
      "properties" : {
        "firstName" : {
          "type" : "text",
          "copy_to" : [
            "fullName"
          ]
        },
        "fullName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "interests" : {
          "type" : "text", # array field: the type follows the type of the values in the array
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastName" : {
          "type" : "text",
          "copy_to" : [
            "fullName"
          ]
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
