· For more content, download and read the complete 《Elastic Stack实战手册》
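Author: 李增胜
Painless is a simple, secure scripting language designed specifically for Elasticsearch. It can be used anywhere scripts are supported in Elasticsearch. Painless offers the following advantages:
- Performance: Painless scripts run several times faster than the alternative scripting languages.
- Safety: a fine-grained allowlist, with method- and field-level granularity, avoids unnecessary security risks.
- Optional typing: variables and parameters can use explicit types or the dynamic def type.
- Syntax: extends a subset of Java's syntax with additional scripting-language features.
- Optimizations: the language is designed specifically for Elasticsearch scripting.
Common keywords:
if, else, while, do, for, in, continue, break, return,
new, try, catch, throw, this, instanceof.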
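To see a few of these keywords and both typing styles side by side, here is a minimal sketch in Python. It only assembles a request body and does not contact a cluster. The Painless source inside it is illustrative, `params.prices` and `build_execute_body` are hypothetical names, and sending the body to the `_scripts/painless/_execute` endpoint is an assumption about how you might try the script out.

```python
import json

# Illustrative Painless source (not from the original text):
# `double` is an explicit type, `def` is the dynamic type,
# and for / if / continue / return are keywords from the list above.
PAINLESS_SOURCE = """
double total = 0;
for (def price : params.prices) {
  if (price == null) {
    continue;
  }
  total += price;
}
return total;
"""


def build_execute_body(prices):
    """Return a JSON-serializable body that carries the script and its params."""
    return {"script": {"source": PAINLESS_SOURCE, "params": {"prices": prices}}}


# Two publicPrice values from the sample data plus one missing price
body = build_execute_body([8188.88, None, 6188.88])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted as the body of a console request; the price list travels in `params`, so the script text stays the same for any input.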
Common examples
First, we create some test data: product information.
# Add test data
POST my_goods/_bulk
{"index":{"_id":1}}
{"goodsName":"苹果 51英寸 4K超高清","skuCode":"skuCode1","brandName":"苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":8188.88,"groupPrice":null,"boxPrice":null,"boostValue":1.8}
{"index":{"_id":2}}
{"goodsName":"苹果 55英寸 3K超高清","skuCode":"skuCode2","brandName":"苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00002","publicPrice":6188.88,"groupPrice":null,"boxPrice":null,"boostValue":1.0}
{"index":{"_id":3}}
{"goodsName":"苹果UA55RU7520JXXZ 53英寸 4K高清","skuCode":"skuCode3","brandName":"美国苹果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":8388.88,"groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":4}}
{"goodsName":"山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清","skuCode":"skuCode4","brandName":"山东苹果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":8488.88,"groupPrice":[{"level":"level1","boxLevelPrice":"2488.88"},{"level":"level2","boxLevelPrice":"3488.88"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":4488.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5488.88}],"boostValue":1.2}
Inline script
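An inline script is a small amount of code that runs together with the rest of a DSL request. The examples below show concrete cases.
Adding a field
Suppose we want to add a new field whose value depends on an existing field. Below, we add a new brand name formed by appending “新品” ("new product") to the existing brand name; a script can implement this.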
POST my_goods/_update_by_query
{
"script": {
"source": "ctx._source.new_brandName = ctx._source.brandName + '新品'"
}
}
# Query the result
GET my_goods/_search
# Response (some unrelated fields omitted)
"hits" : [
{
"_index" : "my_goods",
"_source" : {
"shopCode" : "sc00001",
"new_brandName" : "苹果新品",
"brandName" : "苹果",
"closeUserCode" : [
"0"
]
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"shopCode" : "sc00002",
"new_brandName" : "苹果新品",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"groupPrice" : null,
"boxPrice" : null,
"channelType" : "cloudPlatform",
"boostValue" : 1.0,
"publicPrice" : "6188.88",
"goodsName" : "苹果 55英寸 3K超高清",
"skuCode" : "skuCode2"
}
},
....
]
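# The new_brandName field added by the script is now present
The source above holds our Painless script code. A Painless script like this, a small amount of code embedded in the DSL, is called an inline script.
Removing a field
When an existing field needs to be removed, a script can do that as well: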
POST my_goods/_update_by_query
{
"script": {
"source": "ctx._source.remove('new_brandName')"
}
}
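As a rough local model of what the add and remove scripts do to each document's `_source`, the following Python sketch applies the same two operations to plain dictionaries. It does not talk to Elasticsearch; the two documents are reduced copies of the test data, and the helper names `add_new_brand` and `remove_new_brand` are hypothetical.

```python
# _source of two sample products, reduced to the field the scripts touch
docs = [
    {"_id": "1", "_source": {"brandName": "苹果"}},
    {"_id": "3", "_source": {"brandName": "美国苹果"}},
]


def add_new_brand(source):
    # Mirrors: ctx._source.new_brandName = ctx._source.brandName + '新品'
    source["new_brandName"] = source["brandName"] + "新品"


def remove_new_brand(source):
    # Mirrors: ctx._source.remove('new_brandName')
    # Like Map.remove, this is a no-op when the key is absent.
    source.pop("new_brandName", None)


for doc in docs:
    add_new_brand(doc["_source"])
added = [doc["_source"]["new_brandName"] for doc in docs]

for doc in docs:
    remove_new_brand(doc["_source"])
remaining = [sorted(doc["_source"]) for doc in docs]

print(added)      # ['苹果新品', '美国苹果新品']
print(remaining)  # [['brandName'], ['brandName']]
```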
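Changing a field value
When changing a field value, we pass the multiplier in through params. This has a real advantage: when the source string of a script is identical, Elasticsearch treats it as the same script and caches the compiled form, so later executions reuse it instead of compiling again, which speeds up processing.
# Less efficient: the multiplier is hard-coded to double the price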
POST my_goods/_update/1
{
"script": {
"source": "ctx._source.publicPrice = ctx._source.publicPrice * 2",
"lang": "painless"
}
}
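# More efficient: uses params to double the price of the product with ID 1 (run either this request or the previous one, not both, to get the result shown below)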
POST my_goods/_update/1
{
"script": {
"source": "ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent",
"lang": "painless",
"params": {
"promote_percent": 2
}
}
}
# Query
GET my_goods/_doc/1
# Response
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 4,
"_primary_term" : 1,
"found" : true,
"_source" : {
"goodsName" : "苹果 51英寸 4K超高清",
"skuCode" : "skuCode1",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"channelType" : "cloudPlatform",
"shopCode" : "sc00001",
"publicPrice" : 16377.76,
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.8
}
}
# As shown, the price was 8188.88 before the update and is 16377.76 after the script ran
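The caching argument can be pictured with a toy model. The Python sketch below is not how Elasticsearch implements its script cache (the real cache also has size and compilation-rate limits, among other things); it only assumes, as the text states, that compiled scripts are looked up by their source text. `ToyScriptCache` is a hypothetical name.

```python
class ToyScriptCache:
    """Toy model: compiled scripts are cached by their exact source text."""

    def __init__(self):
        self._cache = {}
        self.compilations = 0

    def compile(self, source):
        """Return the cached entry for source, compiling only on a miss."""
        if source not in self._cache:
            self.compilations += 1  # cache miss: pay the compile cost once
            self._cache[source] = ("compiled", source)
        return self._cache[source]


PARAM_SOURCE = (
    "ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent"
)

with_params = ToyScriptCache()
for factor in (2, 3):
    # The factor travels in params, so the source text never changes
    with_params.compile(PARAM_SOURCE)

hard_coded = ToyScriptCache()
for factor in (2, 3):
    # The factor is baked into the source, so each value is a new script
    hard_coded.compile(
        f"ctx._source.publicPrice = ctx._source.publicPrice * {factor}"
    )

print(with_params.compilations, hard_coded.compilations)  # prints: 1 2
```

In this model the parameterized script compiles once regardless of how many multipliers are used, while every hard-coded multiplier adds another compilation.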
In Elasticsearch, the following counts as a single script, whatever value is passed in params:
"source": "ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent"
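The following two, by contrast, are treated as 2 different scripts. Each distinct source string has to be compiled on its own, so they perform somewhat worse than the params version above: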
"source": "ctx._source.publicPrice = ctx._source.publicPrice * 2"
"source": "ctx._source.publicPrice = ctx._source.publicPrice * 3"
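Sorting
# Enable fielddata on goodsName so that scripts can access it through doc
PUT my_goods/_mapping
{
"properties": {
"goodsName":{
"type":"text",
"fielddata": true
}
}
}
# Search and sort by the length of doc['goodsName'].value multiplied by a factor of 1.1
# Note: with fielddata on an analyzed text field, doc['goodsName'].value is an indexed term rather than the full original name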
POST my_goods/_search
{
"query": {
"match": {
"brandName": "苹果"
}
},
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "doc['goodsName'].value.length() * params.factor",
"params": {
"factor": 1.1
}
},
"order": "asc"
}
}
}
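The sort key itself is simple arithmetic: a string length multiplied by `params.factor`. The Python sketch below reproduces that calculation locally on the test data. It assumes the script sees the complete goodsName (as it would for a keyword field); with fielddata on an analyzed text field the value seen by the script can be a single term, so the real ordering may differ. `sort_key` is a hypothetical helper.

```python
# goodsName values from the test data, keyed by _id
goods_names = {
    "1": "苹果 51英寸 4K超高清",
    "2": "苹果 55英寸 3K超高清",
    "3": "苹果UA55RU7520JXXZ 53英寸 4K高清",
    "4": "山东苹果UA55RU7520JXXZ 苹果54英寸 5K超高清",
}


def sort_key(name, factor):
    # Mirrors: doc['goodsName'].value.length() * params.factor
    return len(name) * factor


lengths = {_id: len(name) for _id, name in goods_names.items()}

# Ascending order, matching "order": "asc" in the request
ordered = sorted(goods_names, key=lambda _id: sort_key(goods_names[_id], 1.1))

print(lengths)  # {'1': 13, '2': 13, '3': 26, '4': 31}
print(ordered)
```

Documents 1 and 2 have equal keys under this assumption, so their relative order is a tie; documents 3 and 4 sort after them.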
Stored script
A script that is stored first and then referenced by later DSL requests is called a stored script.
# Define a stored script named promote_price
PUT _scripts/promote_price
{
"script": {
"source": "ctx._source.publicPrice = ctx._source.publicPrice * params.value",
"lang": "painless"
}
}
As shown above, we defined a script named promote_price. It multiplies the sale price (publicPrice) by a factor, and that factor is passed in when the script is called.
POST my_goods/_update_by_query
{
"script": {
"id": "promote_price",
"params": {
"value": 2
}
}
}
Run the stored script and the prices are multiplied by 2.
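If these requests are driven from a program rather than the console, the define-once, call-many pattern might look like the Python sketch below. It uses only the standard library and builds the requests without sending them. `BASE_URL` (a local, unsecured cluster) and the helpers `json_request` and `call_promote_price` are assumptions introduced for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster


def json_request(method, path, body):
    """Build (but do not send) a JSON request for the given endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Stored once: the script source lives under the id promote_price
store = json_request("PUT", "_scripts/promote_price", {
    "script": {
        "lang": "painless",
        "source": "ctx._source.publicPrice = ctx._source.publicPrice * params.value",
    }
})


def call_promote_price(value):
    """Reference the stored script by id; only the parameter changes."""
    return json_request("POST", "my_goods/_update_by_query", {
        "script": {"id": "promote_price", "params": {"value": value}}
    })


double = call_promote_price(2)
triple = call_promote_price(3)
print(store.get_method(), store.full_url)
print(double.get_method(), double.full_url, double.data.decode("utf-8"))
```

Passing one of these objects to `urllib.request.urlopen` would send it. Note that the calling requests carry no script source at all, only the id and the params.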
Accessing fields in Source
When Painless accesses field values in Source, the syntax depends on the context the script runs in. Common Painless contexts include update, update_by_query, sort, and ingest pipeline.
Context | Field access |
---|---|
update | ctx._source.field_name |
ingest node | ctx.field_name |
The examples below use ctx._source and ctx, respectively, to work with field values.
update
# The earlier examples already used ctx._source.field_name to update data
POST my_goods/_update/1
{
"script": {
"source": "ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent",
"lang": "painless",
"params": {
"promote_percent": 2
}
}
}
ingest node
Updating a field value in an ingest pipeline
# Define the pipeline
PUT _ingest/pipeline/add_my_goods_newField
{
"processors": [
{
"script": {
"lang": "painless",
"source": "ctx.skuCode_brandName = ctx.skuCode + ctx.brandName"
}
}
]
}
# Run the pipeline
POST my_goods/_update_by_query?pipeline=add_my_goods_newField
{
}
# Query the result
GET my_goods/_search
# Response
"hits" : [
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"shopCode" : "sc00002",
"brandName" : "苹果",
"closeUserCode" : [
"0"
],
"skuCode_brandName" : "skuCode2苹果",
"channelType" : "cloudPlatform",
"publicPrice" : 12377.76,
"goodsName_length" : 13,
"groupPrice" : null,
"boxPrice" : null,
"boostValue" : 1.0,
"goodsName" : "苹果 55英寸 3K超高清",
"skuCode" : "skuCode2"
}
},
{
"_index" : "my_goods",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"shopCode" : "sc00001",
"brandName" : "美国苹果",
"closeUserCode" : [
"0"
],
"skuCode_brandName" : "skuCode3美国苹果",
"channelType" : "cloudPlatform",
"publicPrice" : 16777.76,
"goodsName_length" : 26,
"groupPrice" : null,
"boxPrice" : [
{
"boxType" : "box1",
"boxUserCode" : [
"htd003",
"uc004"
],
"boxPriceDetail" : 4388.88
},
{
"boxType" : "box2",
"boxUserCode" : [
"uc005",
"uc0010"
],
"boxPriceDetail" : 5388.88
}
],
"boostValue" : 1.2,
"goodsName" : "苹果UA55RU7520JXXZ 53英寸 4K高清",
"skuCode" : "skuCode3"
}
},
....
]
As shown, skuCode_brandName was built by concatenating skuCode and brandName, so accessing fields through ctx.field works.
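The difference between the two rows of the context table comes down to where the document fields sit inside ctx. The Python sketch below models that with two dictionaries; it is a simplified picture (the real contexts carry more metadata), and the function names are hypothetical. The field values come from the sample document with _id 2.

```python
# Field values taken from the sample document with _id 2
source_fields = {"skuCode": "skuCode2", "brandName": "苹果", "publicPrice": 6188.88}

# Update-style context: the document body sits under the "_source" key
update_ctx = {"_index": "my_goods", "_id": "2", "_source": dict(source_fields)}

# Ingest-style context: the document fields sit directly on ctx
ingest_ctx = {"_index": "my_goods", "_id": "2", **source_fields}


def update_script(ctx, params):
    # Mirrors: ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent
    ctx["_source"]["publicPrice"] = ctx["_source"]["publicPrice"] * params["promote_percent"]


def ingest_script(ctx):
    # Mirrors: ctx.skuCode_brandName = ctx.skuCode + ctx.brandName
    ctx["skuCode_brandName"] = ctx["skuCode"] + ctx["brandName"]


update_script(update_ctx, {"promote_percent": 2})
ingest_script(ingest_ctx)

print(update_ctx["_source"]["publicPrice"])  # 12377.76
print(ingest_ctx["skuCode_brandName"])       # skuCode2苹果
```

Both values agree with the search response shown for document 2.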
Painless Debug
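Elasticsearch provides a way to debug scripts, which makes script debugging convenient during development.
# Define user information; shop_id holds the IDs of the shops the user has opened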
PUT /user_info/_doc/1?refresh
{
"first": "Michael",
"last": "Jordan",
"shop_id": [
100,
102,
103
],
"time": "2021-05-09"
}
PUT /user_info/_doc/2?refresh
{
"first": "Michael2",
"last": "Jordan2",
"shop_id": [
110,
112,
113,
114,
115
],
"time": "2021-05-08"
}
# View the mapping
GET user_info/_mapping
# Response
{
"user_info" : {
"mappings" : {
"properties" : {
"first" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"last" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"shop_id" : {
"type" : "long"
},
"time" : {
"type" : "date"
}
}
}
}
}
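The mapping contains several field types, including long, date, keyword, and text. Which methods does each type offer? One way to find out is to read the official documentation; another is to inspect the value through debugging. Let's try it with _explain: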
POST /user_info/_explain/1
{
"query": {
"script": {
"script": "Debug.explain(doc.shop_id)"
}
}
}
# Response:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs",
"to_string": "[100, 102, 103]",
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs",
"script_stack": [
"Debug.explain(doc.shop_id)",
" ^---- HERE"
],
"script": "Debug.explain(doc.shop_id)",
"lang": "painless",
"position": {
"offset": 17,
"start": 0,
"end": 26
}
}
],
"type": "script_exception",
"reason": "runtime error",
"painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs",
"to_string": "[100, 102, 103]",
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs",
"script_stack": [
"Debug.explain(doc.shop_id)",
" ^---- HERE"
],
"script": "Debug.explain(doc.shop_id)",
"lang": "painless",
"position": {
"offset": 17,
"start": 0,
"end": 26
},
"caused_by": {
"type": "painless_explain_error",
"reason": null
}
},
"status": 400
}
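The parts of a response like this that matter for debugging are painless_class, java_class, and to_string. As a minimal sketch, the Python below pulls them out of a trimmed copy of the response above; `RESPONSE_TEXT` keeps only the fields it uses, and `summarize_explain` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json

# Trimmed copy of the error response shown above (only the fields used here)
RESPONSE_TEXT = """
{
  "error": {
    "type": "script_exception",
    "reason": "runtime error",
    "painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs",
    "to_string": "[100, 102, 103]",
    "java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs",
    "caused_by": {"type": "painless_explain_error", "reason": null}
  },
  "status": 400
}
"""


def summarize_explain(response):
    """Pull the Debug.explain details out of a parsed error response."""
    error = response["error"]
    if error.get("caused_by", {}).get("type") != "painless_explain_error":
        raise ValueError("not a Debug.explain response")
    return {
        "painless_class": error["painless_class"],
        "java_class": error["java_class"],
        "value": error["to_string"],
    }


summary = summarize_explain(json.loads(RESPONSE_TEXT))
for key, value in summary.items():
    print(f"{key}: {value}")
```

Checking `caused_by.type` first separates this deliberate debug output from a genuine script failure, which would lack the class fields.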
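The response is a runtime error. That is expected: Debug.explain deliberately throws a painless_explain_error so that it can report details about the value it was given. So how do we make use of it?
Look closely at the response: doc.shop_id is backed by the following class: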
"painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs"
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs"
Starting from the Painless API reference, https://www.elastic.co/guide/en/elasticsearch/painless/7.10/painless-api-reference.html,
we can locate the API documentation for ScriptDocValues.Longs: https://www.elastic.co/guide/en/elasticsearch/painless/7.10/painless-api-reference-shared-org-elasticsearch-index-fielddata.html#painless-api-reference-shared-ScriptDocValues-Longs
ScriptDocValues.Longs
- List asList()
- int getLength()
- Collection asCollection()
- Long get(int)
- .......
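From the data we know that shop_id stores a list. Suppose we want to score each document by the number of entries in that list.
We adjust the script accordingly: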
GET user_info/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"lang": "painless",
"source": """
return doc['shop_id'].getLength();
"""
}
}
}
}
}
# Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 5.0,
"hits" : [
{
"_index" : "user_info",
"_type" : "_doc",
"_id" : "2",
"_score" : 5.0,
"_source" : {
"first" : "Michael2",
"last" : "Jordan2",
"shop_id" : [
110,
112,
113,
114,
115
],
"time" : "2021-05-08"
}
},
{
"_index" : "user_info",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0,
"_source" : {
"first" : "Michael",
"last" : "Jordan",
"shop_id" : [
100,
102,
103
],
"time" : "2021-05-09"
}
}
]
}
}
As shown, the highest score is "max_score" : 5.0. This is because script_score replaced the scoring with the number of shop IDs; document 2 has 5 IDs in total, so its score is 5.
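The scores in the response can be checked by hand. The Python sketch below recomputes them locally from the two user_info documents. It assumes the underlying query score is 1.0, so the final score equals the script's return value; `script_score` here is a hypothetical stand-in for the Painless script.

```python
# shop_id lists of the two user_info documents
users = [
    {"_id": "1", "shop_id": [100, 102, 103]},
    {"_id": "2", "shop_id": [110, 112, 113, 114, 115]},
]


def script_score(doc):
    # Mirrors: return doc['shop_id'].getLength();
    return float(len(doc["shop_id"]))


# Highest score first, as in the search response
ranked = sorted(users, key=script_score, reverse=True)
for doc in ranked:
    print(doc["_id"], script_score(doc))
# prints:
# 2 5.0
# 1 3.0
```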
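This example walked through Painless Debug in a practical scenario: trigger the debug output, read the error information, and consult the official documentation for the reported class to find the methods you need.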