I. Reindex (rebuilding an index)

POST _reindex
{
  "source": {
    "index": "ecgdata"
  },
  "dest": {
    "index": "ecg"
  }
}

II. Index template

# Create the index template my_index_template and define its mappings.
# Requirements:
# - every index whose name starts with index_ uses these mappings
# - 3 primary shards, 1 replica
# - fields whose names start with int_ are mapped as integer
# - string fields use the english analyzer and keep a keyword sub-field
#   (a keyword field does not accept an analyzer, so the field is mapped as
#   text here; this is one reading of the requirement)
PUT _index_template/my_index_template
{
  "index_patterns": [
    "index_*"
  ],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    },
    "mappings": {
      "dynamic_templates": [
        {
          "int_fields_as_integer": {
            "match": "int_*",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "strings_as_english_text": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "analyzer": "english",
              "fields": {
                "keyword": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      ]
    }
  }
}
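
Before creating a real index, it can help to check which template would apply to a given index name. The request below is a minimal sketch using the simulate index API (available from Elasticsearch 7.9); the index name index_test is a hypothetical example chosen only because it matches the index_* pattern.

```console
# Show the settings and mappings that an index named index_test would receive.
# index_test is a hypothetical name; no index is created by this request.
POST _index_template/_simulate_index/index_test
```

The response lists the resolved settings and mappings, plus any overlapping templates that matched but were outranked.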

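III. Component template

A component template is a finer-grained building block for index templates. One index template can be composed of several component templates.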

# Note: despite its name, this component template only carries settings
PUT _component_template/mappings
{
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}

# Index template built from the component template above
PUT _index_template/tem
{
  "index_patterns": ["te*", "bar*"],
  "priority": 500,
  "composed_of": ["mappings"]
}
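
To illustrate composing an index template from more than one component, here is a sketch that adds a second component template holding only mappings. The names my_mappings and created_at are hypothetical and not part of the original notes.

```console
# Hypothetical second component template that holds only mappings
PUT _component_template/my_mappings
{
  "template": {
    "mappings": {
      "properties": {
        "created_at": { "type": "date" }
      }
    }
  }
}

# Combine both components; entries later in composed_of take precedence
# over earlier ones when they define the same setting or field
PUT _index_template/tem
{
  "index_patterns": ["te*", "bar*"],
  "priority": 500,
  "composed_of": ["mappings", "my_mappings"]
}
```

Splitting settings and mappings this way lets several index templates reuse the same pieces.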

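IV. Update_by_query

_update_by_query modifies the documents of the original index in place. If you are not sure during the exam that the request is correct, reindex first, run update_by_query against the new index, and repeat it on the exam index only once the result is right.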

POST ecgdata/_update_by_query
{
  "script": {
    "source": "ctx._source.price -= 1000",
    "lang": "painless"
  },
  "query": {
    "term": {
      "price": "4999"
    }
  }
}
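
The "try it on a copy first" workflow described above could look roughly like this. The scratch index name ecgdata_test is a hypothetical choice, and refresh=true is added so each step can see the result of the previous one immediately.

```console
# 1. Copy the data into a scratch index (ecgdata_test is a hypothetical name)
POST _reindex?refresh=true
{
  "source": { "index": "ecgdata" },
  "dest": { "index": "ecgdata_test" }
}

# 2. Run the update against the copy, not the exam index
POST ecgdata_test/_update_by_query?refresh=true
{
  "script": {
    "source": "ctx._source.price -= 1000",
    "lang": "painless"
  },
  "query": {
    "term": { "price": "4999" }
  }
}

# 3. Check: no document in the copy should still match price 4999
GET ecgdata_test/_count
{
  "query": {
    "term": { "price": "4999" }
  }
}
```

If the count is what you expect, run step 2 against ecgdata and delete the scratch index.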

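V. Pipeline

To use a pipeline in the exam, the cluster needs a node with the ingest role (pre-processing). Nodes have this role by default; if node.roles is set explicitly in elasticsearch.yml, the list must include ingest.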
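
A quick way to check the roles without opening elasticsearch.yml is the cat nodes API. This is a small sketch; the column selection is just one convenient choice.

```console
# List each node with its roles; an "i" in node.role means the node has the ingest role
GET _cat/nodes?v=true&h=name,node.role
```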

1. pipeline + reindex

# The index index_pipeline has an array field tags. Some array elements have
# leading spaces and some have trailing spaces. Define an ingest pipeline that
# strips those spaces, then reindex index_pipeline through that pipeline to
# produce a new index, index_pipeline_new.
POST index_pipeline/_bulk
{"index":{"_id":1}}
{"tags":["ping pang", "basket ball", " foot ball "]}
{"index":{"_id":2}}
{"tags":[" ping pang ", "gof ball"]}

# foreach runs the trim processor on every element of tags
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "foreach": {
        "field": "tags",
        "processor": {
          "trim": {
            "field": "_ingest._value"
          }
        }
      }
    }
  ]
}

# Reindex through the pipeline into the new index
POST _reindex
{
  "source": {
    "index": "index_pipeline"
  },
  "dest": {
    "index": "index_pipeline_new",
    "pipeline": "my-pipeline"
  }
}
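
A pipeline can be tried out before any data is reindexed. The sketch below uses the simulate pipeline API with one sample document taken from the bulk request above, then inspects the new index once the reindex has run.

```console
# Dry run: send a sample document through my-pipeline without indexing it
POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    { "_source": { "tags": [" ping pang ", "gof ball"] } }
  ]
}

# After the reindex, inspect the documents in the new index
GET index_pipeline_new/_search
```

In the simulate response, the tags elements should come back without leading or trailing spaces.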

2. pipeline + update_by_query

# Every document in the existing index task2 has the three fields value01,
# value02 and value03. Add a new field named newadd to all documents in this
# index; its value is the three fields joined together (string concatenation).
PUT task2/_doc/1
{
  "value01": "asd",
  "value02": "sdg",
  "value03": "wer"
}
PUT task2/_doc/2
{
  "value01": "asdrr",
  "value02": "sdgrr",
  "value03": "werrr"
}

# Note: this overwrites the my-pipeline defined in the previous example
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "set": {
        "field": "newadd",
        "value": "{{{value01}}}{{{value02}}}{{{value03}}}"
      }
    }
  ]
}

# Apply the pipeline to every existing document in task2
POST task2/_update_by_query?pipeline=my-pipeline
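
To confirm the update took effect, a minimal check is to fetch one of the sample documents again.

```console
# Fetch a document that was updated through the pipeline
GET task2/_doc/1
```

For document 1 the field written by the set processor should hold the three values joined together, that is "asdsdgwer".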

VI. Enrich processor

Follow the order of the steps in the official documentation and copy-paste them.

# Source index
PUT /users/_doc/1?refresh=wait_for
{
  "email": "mardy.brown@asciidocsmith.com",
  "first_name": "Mardy",
  "last_name": "Brown",
  "city": "New Orleans",
  "county": "Orleans",
  "state": "LA",
  "zip": 70116,
  "web": "mardy.asciidocsmith.com"
}

# Create the enrich policy
# - indices: name of the source index
# - match_field: field of the source index used to match incoming documents
# - enrich_fields: fields copied from the source index into the target document
PUT /_enrich/policy/users-policy
{
  "match": {
    "indices": "users",
    "match_field": "email",
    "enrich_fields": ["first_name", "last_name", "city", "zip", "state"]
  }
}

# Execute the enrich policy
POST /_enrich/policy/users-policy/_execute

# Define a pipeline that uses the enrich policy
# - field: field of the incoming (target index) document to match on
# - target_field: name of the field that will hold the enrich data
PUT /_ingest/pipeline/user_lookup
{
  "processors" : [
    {
      "enrich" : {
        "description": "Add 'user' data based on 'email'",
        "policy_name": "users-policy",
        "field" : "email",
        "target_field": "user",
        "max_matches": "1"
      }
    }
  ]
}

# Index a document through the pipeline
PUT /my-index-000001/_doc/my_id?pipeline=user_lookup
{
  "email": "mardy.brown@asciidocsmith.com"
}
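
As a final check, the enriched document can be read back. The second request is a reminder sketch: the enrich index is a snapshot, so the policy has to be executed again whenever the source index changes.

```console
# The document should now contain a "user" object holding the enrich fields
GET /my-index-000001/_doc/my_id

# Re-run after adding or changing documents in the users index
POST /_enrich/policy/users-policy/_execute
```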

 
