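Kibana's built-in Grok Debugger
When parsing logs, the approach is: first work out what format the log lines are in, then decide which filter plugin, or combination of plugins, that format calls for.
Official documentation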
https://www.elastic.co/guide/en/logstash/7.12/filter-plugins.html
Online Grok pattern tester
https://grokdebug.herokuapp.com/
Location of Logstash's grok patterns file
[root@es-web1 ~]# ll /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns
-rw-r--r-- 1 root root 5514 Apr 21 03:50 /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns
Example: a non-JSON log
192.168.7.10 - - [24/May/2021:15:50:47 +0800] "GET /shijiange HTTP/1.1" 404 571 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
Extracting the fields with a Grok pattern
%{IP:clientip} - - \[(?<requesttime>[^ ]+ \+\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"
Result
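To apply this pattern in Logstash, it goes inside a grok filter. The following is a minimal sketch, assuming the raw log line arrives in the message field; lines that do not match are tagged _grokparsefailure by default.

```
filter {
  grok {
    # Single quotes let the pattern contain double quotes without escaping.
    match => {
      "message" => '%{IP:clientip} - - \[(?<requesttime>[^ ]+ \+\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"'
    }
  }
}
```

For the sample line above, this should yield the fields clientip, requesttime, requesttype, requesturl, status, size and ua, all as strings.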
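Common Grok patterns: descriptions and examples
Most Linux users have used regular expressions to find files or to search file contents. Grok likewise uses regular expressions to pick out the relevant pieces of data in a log line.
There are two ways to use regular expressions:
Write the regular expression directly
Use a Grok pattern that maps to a regular expression
In my view, rewriting a regular expression every time is painful, so why not define a pattern once and reuse it?
Note: Grok patterns work much like macro definitions in C.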
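The two approaches can be compared side by side. This is a sketch only, assuming a hypothetical log line that contains text such as status=404; the field name status is chosen for illustration.

```
# Way 1: write the regex inline with a named capture group.
filter {
  grok {
    match => { "message" => "status=(?<status>[+-]?[0-9]+)" }
  }
}

# Way 2 (use one or the other, not both): reference the predefined INT
# pattern. Grok expands %{INT:status} into a named capture around INT's
# regex, which is where the macro analogy comes from.
# filter {
#   grok {
#     match => { "message" => "status=%{INT:status}" }
#   }
# }
```

Both variants capture the same text; the second stays readable as patterns grow and picks up any fix made to the shared definition.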
To learn Grok's default patterns, first locate the file that defines them:
# Path on Windows
[your Logstash install path]\vendor\bundle\jruby\x.x\gems\logstash-patterns-core-x.x.x\patterns\grok-patterns
The commonly used patterns are described below:
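Common patterns
USERNAME or USER
A user name: a string of digits, upper- and lowercase letters, and the special characters . _ -
Examples: 1234, Bob, Alex.Wong
EMAILLOCALPART
The local part of an email address: the first character is a letter, and the rest are digits, letters, and the special characters _ . + - = :. Note that purely numeric QQ mailbox accounts, common in China, do not match unless the regex is modified.
Examples: stone, Gary_Lu, abc-123
EMAILADDRESS
An email address
Examples: stone@abc.com, Gary_Lu@gmail.com, abc-123@163.com
HTTPDUSER
An Apache server user; either an EMAILADDRESS or a USERNAME
INT
An integer: zero, positive, or negative
Examples: 0, -123, 43987
BASE10NUM or NUMBER
A decimal number, integer or fractional
Examples: 0, 18, 5.23
BASE16NUM
A hexadecimal integer
Examples: 0x0045fa2d, -0x3F8709
BASE16FLOAT
A hexadecimal number, integer or fractional
WORD
A word: a string of digits, letters, and underscores
Examples: String, 3529345, ILoveYou
NOTSPACE
A string containing no whitespace
SPACE
A run of whitespace characters (possibly empty)
QUOTEDSTRING or QS
A quoted string
Examples: "This is an apple", 'What is your name?'
UUID
A standard UUID
Example: 550E8400-E29B-11D4-A716-446655440000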
MAC
A MAC address, in Cisco format, the common colon-separated format, or Windows format
IP
An IP address, IPv4 or IPv6
Examples: 127.0.0.1, FE80:0000:0000:0000:AAAA:0000:00C2:0002
HOSTNAME
A host name
IPORHOST
An IP address or a host name
HOSTPORT
A host name (or IP address) plus a port
Examples: 127.0.0.1:3306, api.stozen.net:8000
PATH
A file path in Unix or Windows format
Examples: /usr/local/nginx/sbin/nginx, c:\windows\system32\clr.exe
URIPROTO
A URI scheme
Examples: http, ftp
URIHOST
A URI host
Examples: www.stozen.net, 10.0.0.1:22
URIPATH
A URI path
Examples: //www.stozen.net/abc/, /api.php
URIPARAM
The query string (GET parameters) of a URI
Example: ?a=1&b=2&c=3
URIPATHPARAM
A URI path plus query string
Example: //www.stozen.net/abc/api.php?a=1&b=2&c=3
URI
A complete URI
Example: http://www.stozen.net/abc/api.php?a=1&b=2&c=3
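Date and time patterns
MONTH
Month name
Examples: Jan, January
MONTHNUM
Month number
Examples: 03, 9, 12
MONTHDAY
Day of the month
Examples: 03, 9, 31
DAY
Day-of-week name
Examples: Mon, Monday
YEAR
Year number
HOUR
Hour
MINUTE
Minute
SECOND
Second
TIME
Time of day
Example: 00:01:23
DATE_US
US date format (month first)
Examples: 10-15-1982, 10/15/1982
DATE_EU
European date format (day first)
Examples: 15-10-1982, 15/10/1982, 15.10.1982
ISO8601_TIMEZONE
ISO8601 time zone offset
Examples: +10:23, -1023
TIMESTAMP_ISO8601
ISO8601 timestamp
Example: 2016-07-03T00:34:06+08:00
DATE
A date: either a US date %{DATE_US} or a European date %{DATE_EU}
DATESTAMP
A full date plus time
Example: 07-03-2016 00:34:06
HTTPDATE
The default HTTP log date format
Example: 03/Jul/2016:00:36:53 +0800
Log patterns
LOGLEVEL
Log level
Examples: Alert, alert, ALERT, Error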
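3. Creating your own Grok patterns
In real business systems, more and more log formats turn up, and Grok's default patterns cannot cover all of them (national ID numbers or mobile phone numbers, for example), so we need to add patterns of our own.
Pattern        Regular expression                          Description
DATE_CHS       %{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}     Date format customary in China
ZIPCODE_CHS    [1-9]\d{5}                                  Chinese postal code
GAME_ACCOUNT   [a-zA-Z][a-zA-Z0-9_]{4,15}                  Game account: a letter followed by 4 to 15 letters, digits, or underscores
There are many more; adapt them to your own business needs.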
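One way to make such custom patterns available is the grok filter's pattern_definitions option. This is a minimal sketch, assuming a hypothetical log line of the form "2021-05-24 100000 player_01"; the field names date, zipcode and account are illustrative.

```
filter {
  grok {
    # Custom patterns, visible only to this grok filter.
    pattern_definitions => {
      "DATE_CHS"     => "%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}"
      "ZIPCODE_CHS"  => "[1-9]\d{5}"
      "GAME_ACCOUNT" => "[a-zA-Z][a-zA-Z0-9_]{4,15}"
    }
    match => {
      "message" => "%{DATE_CHS:date} %{ZIPCODE_CHS:zipcode} %{GAME_ACCOUNT:account}"
    }
  }
}
```

The same definitions can instead live in a file, one NAME and regex pair per line, loaded with the patterns_dir option. That suits patterns shared across several pipelines.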
Grok's built-in pattern definitions
USERNAME [a-zA-Z0-9_-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
#QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"])*"|(?:'(?:\\.|[^\\'])*')|(?:`(?:\\.|[^\\`])*`)))
QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"]+)*"|(?:'(?:\\.|[^\\']+)*')|(?:`(?:\\.|[^\\`]+)*`)))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
HOST %{HOSTNAME}
IPORHOST (?:%{HOSTNAME}|%{IP})
HOSTPORT (?:%{IPORHOST=~/\./}:%{POSINT})
# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (?:/(?:[\w_%!$@:.,-]+|\\.)*)+
LINUXTTY (?:/dev/pts/%{NONNEGINT})
BSDTTY (?:/dev/tty[pq][a-z0-9])
TTY (?:%{BSDTTY}|%{LINUXTTY})
WINPATH (?:[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=#%_-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~#%&/=:;_-]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
# Years?
YEAR [0-9]+
# Time: HH:MM:SS
#TIME \d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?
# I'm still on the fence about using grok to perform the time match,
# since it's probably slower.
# TIME %{POSINT<24}:%{POSINT<60}(?::%{POSINT<60}(?:\.%{POSINT})?)?
HOUR (?:2[0123]|[01][0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG (?:[\w._/%-]+)
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{POSINT:facility}.%{POSINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT:ZONE}
# Shortcuts
QS %{QUOTEDSTRING}
# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMBINEDAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" %{QS:agent}
# Log Levels
LOGLEVEL ([D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL)
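These building blocks combine into ready-made log formats. As a sketch, the non-JSON access log line from the earlier example has the shape COMBINEDAPACHELOG expects, so it could also be parsed as follows. This assumes the line is in the message field and that the legacy field names from the definition above are in use; ECS-compatibility mode produces different names.

```
filter {
  grok {
    # Captures clientip, ident, auth, timestamp, verb, request,
    # httpversion, response, bytes, referrer and agent.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # timestamp is captured in HTTPDATE form, e.g. 24/May/2021:15:50:47 +0800
    match  => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
```

The date filter then replaces the event's ingest time with the time recorded in the log line.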
Example: a JSON-format log
{"@timestamp":"2021-08-28T21:17:31+08:00","host":"172.31.2.107","clientip":"172.31.0.1","size":0,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"172.31.2.107","url":"/web/index.html","domain":"172.31.2.107","xff":"-","referer":"-","status":"304"}
Processing with the json filter
input {
  redis {
    data_type => "list"
    key => "qq-m44-nginx-log"
    host => "172.31.2.106"
    port => "6379"
    db => "3"
    password => "123456"
    codec => json
  }
}
# Filter
filter {
  # Decode the JSON string held in "message" into top-level fields.
  json {
    source => "message"
    remove_field => ["message","@version","path","beat","input","log","offset","prospector","source","tags"]
  }
  # Takes effect only when the event has a "timestamp" field in this format;
  # the sample log above already carries an ISO8601 @timestamp.
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  if [fields][app] == "nginx-errorlog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-errorlog-%{+YYYY.MM.dd}"
    }
  }
  if [fields][app] == "nginx-accesslog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-accesslog-%{+YYYY.MM.dd}"
    }
  }
}
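The terminal output that follows is in rubydebug form, which the elasticsearch outputs alone would not print. Presumably a stdout output was also enabled while testing; a minimal sketch under that assumption:

```
output {
  # Print every event to the terminal in rubydebug form for inspection.
  stdout { codec => rubydebug }
}
```

This block can sit alongside the elasticsearch outputs during debugging and be removed afterwards.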
After a request to nginx, the terminal output looks like this:
{
    "agent" => {
        "name" => "es-web1.example.local",
        "type" => "filebeat",
        "ephemeral_id" => "2a8806fd-48de-46e0-bdde-502aa74b4c83",
        "version" => "7.12.1",
        "hostname" => "es-web1.example.local",
        "id" => "51f9df27-4170-4844-ba12-c719de1f4410"
    },
    "domain" => "172.31.2.107",
    "status" => "304",
    "upstreamtime" => "-",
    "size" => 0,
    "xff" => "-",
    "ecs" => {
        "version" => "1.8.0"
    },
    "@timestamp" => 2021-08-29T05:31:29.000Z,
    "clientip" => "172.31.0.1",
    "referer" => "-",
    "responsetime" => 0.0,
    "upstreamhost" => "-",
    "http_host" => "172.31.2.107",
    "url" => "/web/index.html",
    "host" => "172.31.2.107",
    "fields" => {
        "group" => "n125",
        "app" => "nginx-accesslog"
    }
}