ELK: Logstash 2.2 mutate Plugin (Translation + Practice)

2022-05-30 22:36:00

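Contents

Syntax
Test data
Configuration options

The mutate plugin performs transformations on fields, including renaming, removing, replacing, and modifying them. It is one of the most commonly used plugins.

For example:

You have already used a Grok expression to split a Tomcat log line into fields, and you want to convert the status code, byte count, or response time to integers.

You have already used a regular expression to split the log into fields, but the field values are in mixed case, which is not much use for full-text search in Elasticsearch. You can use this plugin to convert the field contents to lowercase.

Migrated to: http://www.bdata-cap.com/newsinfo/1712678.html

Syntax

The plugin's options must be wrapped in a mutate block, as shown below: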

mutate {}
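
For context, here is a minimal sketch of where the block sits in a pipeline configuration. It assumes the usual Logstash layout, in which mutate is one of the plugins inside the filter section; the comment only marks where the options would go.

```conf
filter {
        mutate {
                # one or more options, e.g. convert, gsub, rename
        }
}
```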

The available configuration options are listed in the table below:

Setting	Input type	Required	Default
add_field	hash	No	{}
add_tag	array	No	[]
convert	hash	No
gsub	array	No
join	hash	No
lowercase	array	No
merge	hash	No
periodic_flush	boolean	No	false
remove_field	array	No	[]
remove_tag	array	No	[]
rename	hash	No
replace	hash	No
split	hash	No
strip	array	No
update	hash	No
uppercase	array	No

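Of these, add_field, remove_field, add_tag, and remove_tag are available in every Logstash filter plugin. They take effect only after the filter has been applied successfully. Although Logstash calls these plugins filters, they do far more than filter.

Tags are markers: while processing fields, you set a tag when you expect to do further processing later. Logstash keeps a built-in tags array that holds the tags produced along the way, whether generated by Logstash itself or added by you. For example, if you parse a log line with grok and the match fails, Logstash adds a _grokparsefailure tag on its own. In the output stage you can then skip the lines that failed to parse.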
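
A minimal sketch of that last point, assuming stdout is the only output and that events which failed grok should simply be skipped. It uses Logstash's conditional syntax to test the tags array:

```conf
output {
        # only print events that grok parsed successfully
        if "_grokparsefailure" not in [tags] {
                stdout {
                        codec => rubydebug
                }
        }
}
```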

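Field options, by contrast, operate on fields; for example, you may want to create a new field from existing ones. More on this later.

You will also notice that every option in the table above is either a verb or a verb-object phrase. As you may have guessed, each option is essentially a Ruby function, and whatever follows the "=>" is its list of arguments (you would probably design it the same way). The first argument is always the field the function should act on; everything from the second argument onward is the function's own parameters.

What is a field? Suppose you parse a Tomcat log: once a line of the access log has been split, the client IP, byte count, response time, and so on are each stored in a named variable. That variable is a field.

The individual options are described below.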

Test data

Suppose we have the following Tomcat access log:

192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/goLogin" "" 8080 200 1692 23 "http://10.1.8.193:8080/goMain" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"

192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/js/common/jquery-1.10.2.min.js" "" 8080 304 - 67 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"

192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/css/common/login.css" "" 8080 304 - 75 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"

192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/js/system/login.js" "" 8080 304 - 53 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"

It is produced by the following Tomcat configuration:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"

               prefix="localhost_access_log." suffix=".txt"

               pattern="%h %l %u %t %m &quot;%U&quot; &quot;%q&quot; %p %s %b %D &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot;" />

If this log is parsed with the following Grok expression:

%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}

the result is as follows:

          "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

         "@version" => "1",

       "@timestamp" => "2016-05-17T08:26:07.794Z",

             "host" => "vcyber",

         "clientip" => "192.168.6.25",

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/goLogin\"",

    "request_query" => "\"\"",

             "port" => "8080",

       "statusCode" => "200",

            "bytes" => "1692",

          "reqTime" => "23",

          "referer" => "\"http://10.1.8.193:8080/goMain\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""

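Note the data type of each field after the split. The port, statusCode, bytes, and reqTime fields should be (or had better be) numbers, but for now they are left as strings. Conversion is covered later, and all of the examples below build on this baseline.

Configuration options

add_field

Value type is hash, that is, key-value pairs, for example add_field => {"field1"=>"value1","field2"=>"value2"}.
Default value is an empty object, {}

Adds new fields.

Example: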

input {
        stdin {
        }
}

filter {
        grok {
                match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
        }
        mutate {
                # add a SayHi field whose value is built from clientip
                add_field=>{
                         "SayHi"=>"Hello , %{clientip}"
                }
        }
}

output{
        stdout{
                codec=>rubydebug
        }
}

Note the mutate block. Parsing the Tomcat access log above with this configuration gives the following result:

          "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

         "@version" => "1",

       "@timestamp" => "2016-05-17T04:52:02.031Z",

             "host" => "vcyber",

         "clientip" => "192.168.6.25",

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/goLogin\"",

    "request_query" => "\"\"",

             "port" => "8080",

       "statusCode" => "200",

            "bytes" => "1692",

          "reqTime" => "23",

          "referer" => "\"http://10.1.8.193:8080/goMain\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

            "SayHi" => "Hello , 192.168.6.25"

You can see the new SayHi field. Its name is hard-coded here, but it can also be dynamic. If you change

"SayHi"=>"Hello , %{clientip}"

to:

"another_%{clientip}"=>"Hello , %{clientip}"

you will see the following result:

                 "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

                "@version" => "1",

              "@timestamp" => "2016-05-17T06:38:04.427Z",

                    "host" => "vcyber",

                "clientip" => "192.168.6.25",

                  "identd" => "-",

                    "auth" => "-",

               "timestamp" => "24/Apr/2016:01:25:53 +0800",

             "http_method" => "GET",

                 "request" => "\"/goLogin\"",

           "request_query" => "\"\"",

                    "port" => "8080",

              "statusCode" => "200",

                   "bytes" => "1692",

                 "reqTime" => "23",

                 "referer" => "\"http://10.1.8.193:8080/goMain\"",

               "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

    "another_192.168.6.25" => "Hello , 192.168.6.25"

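The example is contrived, but it shows that the values of existing fields can be used to build both the name and the value of a new field. The example above adds a single field; you can also add several at once:

add_field=>{
        "another_%{clientip}"=>"Hello , %{clientip}"
        "another_%{http_method}"=>"Hello, %{http_method}"
}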

add_tag

Value type is array
Default value is an empty array, []

Adds new tags.

Example:

mutate {
        add_tag=>[
                "foo_%{clientip}"
        ]
}

You will see the following result:

          "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

         "@version" => "1",

       "@timestamp" => "2016-05-17T06:48:43.278Z",

             "host" => "vcyber",

         "clientip" => "192.168.6.25",

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/goLogin\"",

    "request_query" => "\"\"",

             "port" => "8080",

       "statusCode" => "200",

            "bytes" => "1692",

          "reqTime" => "23",

          "referer" => "\"http://10.1.8.193:8080/goMain\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

             "tags" => [

        [0] "foo_192.168.6.25"

    ]

As with add_field, several tags can be added at once.

Note that add_tag takes an array [], not a hash {}.
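
As a sketch of adding several tags in one array: the second tag name, tomcat_access, is made up for illustration and does not appear in the original configuration.

```conf
mutate {
        add_tag => [
                "foo_%{clientip}",
                "tomcat_access"
        ]
}
```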

convert

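Value type is hash
No default value

Converts a field's data type.

When converting to boolean, the accepted values are:

true, t, yes, y, and 1
false, f, no, n, and 0

The other valid target types are integer, float, and string.

Example:

mutate {
        # array form, items read in pairs (commented out here)
        #convert=>["reqTime","integer","statusCode","integer","bytes","integer"]
        # hash form
        convert=>{"port"=>"integer"}
}

convert can be written in two ways: as an array, where items are read in pairs, or as a hash. The result is as follows: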

          "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

         "@version" => "1",

       "@timestamp" => "2016-05-17T09:06:25.360Z",

             "host" => "vcyber",

         "clientip" => "192.168.6.25",

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/goLogin\"",

    "request_query" => "\"\"",

             "port" => 8080,

       "statusCode" => "200",

            "bytes" => "1692",

          "reqTime" => "23",

          "referer" => "\"http://10.1.8.193:8080/goMain\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""

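Note that the port field no longer has double quotes.

The value types of the mutate plugin's options are deliberately simple: each is either a hash (key-value pairs) or an array. Take convert=>["reqTime","integer","statusCode","integer"]: items are read in pairs, the first being the field and the second the target data type, with no nested or composite types. The author's intent appears to be simplicity; complex data types may look convenient, but they come at a cost. Simple is fine as long as the convention is agreed on, and many Logstash plugins and their options work this way.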
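
The hash form can also list several fields at once. The sketch below converts the other numeric-looking fields from the test data. It leaves bytes out on purpose, since that field may hold "-" rather than a number; the gsub section deals with that case.

```conf
mutate {
        convert => {
                "port" => "integer"
                "statusCode" => "integer"
                "reqTime" => "integer"
        }
}
```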

gsub

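Value type is array
No default value

String substitution. Both regular expressions and plain strings can be used. It applies only to strings; if the field is not a string, nothing happens and no error is raised.

The value is an array read in groups of three: the field name, the string (or regular expression) to match, and the replacement string.

Example: when parsing Tomcat logs, the byte size of a resource may be "-". It therefore needs to be replaced with 0 and then converted to a number with convert.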
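
One caveat, offered as my understanding rather than something this article verifies: within a single mutate block the operations run in a fixed internal order, not in the order they are written, and convert is processed before gsub. If the substitution has to happen before the conversion, putting the two steps in separate mutate blocks makes the order explicit, because filters run in the order they appear. A sketch under that assumption, with the pattern anchored so that only a value consisting of a lone "-" is replaced:

```conf
filter {
        # step 1: turn a bare "-" into "0" while bytes is still a string
        mutate {
                gsub => ["bytes", "^-$", "0"]
        }
        # step 2: convert the cleaned-up string to a number
        mutate {
                convert => { "bytes" => "integer" }
        }
}
```

The full configuration used in this article follows; it keeps both options in a single mutate block.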

input {
        stdin {
        }
}

filter {
        grok {
                match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
        }
        mutate {
                # replace "-" in bytes with "0"
                gsub=>["bytes","-","0"]
                # array form of convert: field/type pairs
                convert=>["port","integer","reqTime","integer","statusCode","integer","bytes","integer"]
        }
}

output{
        stdout{
                codec=>rubydebug
        }
}

The result is as follows:

          "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/js/common/jquery-1.10.2.min.js\" \"\" 8080 304 - 67 \"http://10.1.8.193:8080/goLogin\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

         "@version" => "1",

       "@timestamp" => "2016-05-17T09:17:21.745Z",

             "host" => "vcyber",

         "clientip" => "192.168.6.25",

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/js/common/jquery-1.10.2.min.js\"",

    "request_query" => "\"\"",

             "port" => 8080,

       "statusCode" => 304,

            "bytes" => 0,

          "reqTime" => 67,

          "referer" => "\"http://10.1.8.193:8080/goLogin\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""

join

Value type is hash
No default value

Joins an array with a separator character. Does nothing if the field is not an array.

Example:

filter {

  mutate {

    join =>{"fieldname"=>","}}}
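
To try join against the test data, one option is to split clientip first and then join the pieces back with a different separator. This is only a sketch; the two steps sit in separate mutate blocks so that their order does not depend on mutate's internal ordering.

```conf
filter {
        # "192.168.6.25" becomes ["192", "168", "6", "25"]
        mutate {
                split => { "clientip" => "." }
        }
        # the array is joined back into one string, using "-" as the separator
        mutate {
                join => { "clientip" => "-" }
        }
}
```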

lowercase and uppercase

Value type is array
No default value

Converts a string to lowercase or uppercase.

Example:

filter {

  mutate {

    lowercase =>["fieldname"]}}

Example:

filter {

  mutate {

    uppercase =>["fieldname"]}}
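
Applied to the test data, a sketch might lowercase the HTTP method so that GET and get are indexed the same way. The field name http_method comes from the Grok expression above.

```conf
filter {
        mutate {
                lowercase => ["http_method"]
        }
}
```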

merge

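Value type is hash
No default value

Merges two array or hash fields. There are three cases, and a successful merge yields an array:

An array and a string can be merged
A string and a string can be merged
An array and a hash cannot be merged

Example: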

mutate {
        # three copies of clientip to experiment with
        add_field=>{"arr_clientip"=>"%{clientip}"}
        add_field=>{"arrmstr_clientip"=>"%{clientip}"}
        add_field=>{"arrmarr_clientip"=>"%{clientip}"}
        #merge=>{"merge_clientip"=>"clientip"}
}

mutate {
        # turn each copy into an array
        split=>{"arr_clientip"=>"."}
        split=>{"arrmstr_clientip"=>"."}
        split=>{"arrmarr_clientip"=>"."}
}

mutate {
        # array + string
        merge=>{"arrmstr_clientip"=>"clientip"}
        # array + array
        merge=>{"arrmarr_clientip"=>"arr_clientip"}
}

The value of the field named after => is merged into the field named before it.

The result is as follows:

             "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

            "@version" => "1",

          "@timestamp" => "2016-05-18T02:53:35.671Z",

                "host" => "vcyber",

            "clientip" => "192.168.6.25",

              "identd" => "-",

                "auth" => "-",

           "timestamp" => "24/Apr/2016:01:25:53 +0800",

         "http_method" => "GET",

             "request" => "\"/goLogin\"",

       "request_query" => "\"\"",

                "port" => "8080",

          "statusCode" => "200",

               "bytes" => "1692",

             "reqTime" => "23",

             "referer" => "\"http://10.1.8.193:8080/goMain\"",

           "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

        "arr_clientip" => [

        [0] "192",

        [1] "168",

        [2] "6",

        [3] "25"

],

    "arrmstr_clientip" => [

        [0] "192",

        [1] "168",

        [2] "6",

        [3] "25",

        [4] "192.168.6.25"

],

    "arrmarr_clientip" => [

        [0] "192",

        [1] "168",

        [2] "6",

        [3] "25",

        [4] "192",

        [5] "168",

        [6] "6",

        [7] "25"

    ]

periodic_flush

Value type is boolean
Default value is false

Calls the filter's flush method at regular intervals. Optional.

remove_field

Value type is array
Default value is an empty array, []

Removes fields.

Example: remove the message field.

mutate {
        remove_field=>["message"]
}

The result is as follows:

         "@version" => "1",

       "@timestamp" => "2016-05-18T02:04:16.879Z",

             "host" => "vcyber",

         "clientip" => "192.168.6.25",

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/goLogin\"",

    "request_query" => "\"\"",

             "port" => "8080",

       "statusCode" => "200",

            "bytes" => "1692",

          "reqTime" => "23",

          "referer" => "\"http://10.1.8.193:8080/goMain\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""

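The message field is gone. It holds the original log line, so keeping it means the log is stored twice: once unsplit and once split.

Several fields can, of course, be removed at once.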
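
For instance, a sketch that drops message together with two fields that only ever hold "-" in the test data:

```conf
mutate {
        remove_field => ["message", "identd", "auth"]
}
```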

remove_tag

Value type is array
Default value is []

Removes tags.

Example:

filter {

  mutate {

    remove_tag =>["foo_%{somefield}"]}}

Several tags can also be removed at once:

filter {

  mutate {

    remove_tag =>["foo_%{somefield}","sad_unwanted_tag"]}}

rename

Value type is hash
No default value

Renames one or more fields.

Example:

input {
        stdin {
        }
}

filter {
        grok {
                match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
        }
        mutate {
                # rename clientip to host
                rename=>{"clientip"=>"host"}
        }
}

output{
        stdout{
                codec=>rubydebug
        }
}

The result is as follows:

          "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

         "@version" => "1",

       "@timestamp" => "2016-05-17T09:29:44.018Z",

             "host" => "192.168.6.25",

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/goLogin\"",

    "request_query" => "\"\"",

             "port" => "8080",

       "statusCode" => "200",

            "bytes" => "1692",

          "reqTime" => "23",

          "referer" => "\"http://10.1.8.193:8080/goMain\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""

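In the Grok expression the client IP is named clientip, but mutate can rename it to host. As the output shows, this overwrites the original host value (vcyber).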
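
Since rename takes a hash, several fields can be renamed in one go. In the sketch below the new names status_code and req_time are invented for illustration; they are chosen so as not to collide with any existing field.

```conf
mutate {
        rename => {
                "statusCode" => "status_code"
                "reqTime" => "req_time"
        }
}
```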

replace

Value type is hash
No default value

Replaces the value of the specified field with a new value.

Example:

input {
        stdin {
        }
}

filter {
        grok {
                match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
        }
        mutate {
                # overwrite message with a value built from clientip
                replace=>{"message"=>"%{clientip}: My new Message."}
        }
}

output{
        stdout{
                codec=>rubydebug
        }
}

The result is as follows:

          "message" => "192.168.6.25: My new Message.",

         "@version" => "1",

       "@timestamp" => "2016-05-18T01:55:34.566Z",

             "host" => "vcyber",

         "clientip" => "192.168.6.25",

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/goLogin\"",

    "request_query" => "\"\"",

             "port" => "8080",

       "statusCode" => "200",

            "bytes" => "1692",

          "reqTime" => "23",

          "referer" => "\"http://10.1.8.193:8080/goMain\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""

The value of the message field has changed.

split

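Value type is hash
No default value

Splits a string into an array using a separator string or character. Applies only to strings.

Example: split the client IP on periods into an array.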

mutate {
        split=>{"clientip"=>"."}
}

The result is as follows:

          "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",

         "@version" => "1",

       "@timestamp" => "2016-05-18T01:58:40.687Z",

             "host" => "vcyber",

         "clientip" => [

        [0] "192",

        [1] "168",

        [2] "6",

        [3] "25"

],

           "identd" => "-",

             "auth" => "-",

        "timestamp" => "24/Apr/2016:01:25:53 +0800",

      "http_method" => "GET",

          "request" => "\"/goLogin\"",

    "request_query" => "\"\"",

             "port" => "8080",

       "statusCode" => "200",

            "bytes" => "1692",

          "reqTime" => "23",

          "referer" => "\"http://10.1.8.193:8080/goMain\"",

        "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""

strip

Value type is array
No default value

Strips whitespace from the beginning and end of a field.

Example:

filter {

  mutate {

     strip =>["field1","field2"]}}

update

Value type is hash
No default value

Update an existing field with a new value. If the field does not exist, then no action will be taken.

Example:

filter {

  mutate {

    update =>{"sample"=>"My new message"}}}
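
As far as I know, the new value can reference other fields, in the same way as the replace example earlier. A sketch, assuming a field named sample already exists on the event (if it does not, update leaves the event untouched):

```conf
filter {
        mutate {
                update => { "sample" => "Request from %{clientip}" }
        }
}
```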
