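GoAccess is an open-source (MIT-licensed), real-time web log analyzer with an interactive viewer that runs in a terminal on *nix systems or in your web browser.
It gives system administrators fast, valuable HTTP statistics, presented as a visual report served online. GoAccess parses the specified web log file and outputs the results to the X terminal. Its features are as follows: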
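- General statistics: This panel summarizes several key metrics, such as the number of valid and invalid requests, the time taken to analyze the data, unique visitors, requested files, static files (CSS, ICO, JPG, etc.), HTTP referrers, 404 errors, the size of the parsed log file, and the bandwidth consumed.
- Unique visitors: This panel shows hits, unique visitors, and cumulative bandwidth by date. HTTP requests with the same IP, the same date, and the same user agent are counted as a single unique visitor. Web crawlers are included by default. Optionally, pass --date-spec=hr to switch from per-date to per-hour granularity, e.g. 05/Jun/2016:16. This is helpful if you want to track daily traffic at the hour level.
- Requested files: This panel shows the most requested files on your server, along with hits, unique visitors, percentage, cumulative bandwidth, protocol used, and request method.
- Requested static files: Lists the most frequently requested static file types, such as JPG, CSS, SWF, JS, GIF, and PNG, with the same metrics as the previous panel. Additional static file extensions can be added in the configuration file.
- 404 or not found: Shows the same metrics as the previous panels, but its data covers all pages that were not found, i.e. requests that returned a 404 status code.
- Hosts: This panel shows detailed information about the hosts themselves. It is well suited to spotting aggressive crawlers and identifying who is eating your bandwidth. Expanding the panel shows more information, such as the host's reverse DNS lookup result and its country and city. If the -a argument is enabled, selecting an IP address and pressing ENTER displays its list of user agents.
- Operating systems: This panel reports the operating systems used by the visiting hosts. GoAccess tries to provide as much detail as possible for each operating system.
- Browsers: This panel reports the browsers used by the visiting hosts. GoAccess tries to provide as much detail as possible for each browser.
- Time distribution: This panel reports by hour, so it displays 24 data points, one for each hour of the day. Use --hour-spec=min to report at ten-minute granularity instead, with times displayed in the form 16:4. This is helpful for spotting peak traffic periods on your server.
- Virtual hosts: This panel shows the different virtual hosts parsed from the access log. It is displayed only when %v is used in the log format.
- Referrer URLs: If the host in question reached your site via another resource, or was linked or redirected to it from another host, the referring URLs are shown in this panel. The panel is disabled by default; see the --ignore-panel setting in the configuration file to enable it.
- Referring sites: This panel shows only the host part rather than the full URL.
- Keyphrases: Reports the keyphrases used on Google Search, Google Cache, and Google Translate that led to your web server. At present, only Google searches over HTTP are supported. The panel is disabled by default; see the --ignore-panel setting in the configuration file to enable it.
- Geolocation: Determines the geographic location of an IP address. Statistics are grouped by continent and country. Requires GoAccess to be built with geolocation support.
- HTTP status codes: The status codes of HTTP requests, in numeric form.
- Remote user (HTTP authentication): Identifies who requested a document, as determined by HTTP authentication. If the document is not password protected, this field is shown as "-". This panel is enabled only when %e is included in the log format variable.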
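The unique-visitor rule described above (same IP, same date, same user agent) can be approximated with a few lines of shell and awk. This is only a sketch of the counting rule, not GoAccess's own implementation; the sample log lines are made up, and the function name `count_unique_visitors` and its `date`/`hr` argument are hypothetical names chosen to mirror --date-spec.

```shell
#!/bin/sh
# Made-up sample lines in the common "combined" access log layout.
sample_log='192.0.2.1 - - [05/Jun/2016:16:01:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
192.0.2.1 - - [05/Jun/2016:16:05:10 +0000] "GET /a.css HTTP/1.1" 200 128 "-" "Mozilla/5.0"
192.0.2.1 - - [05/Jun/2016:17:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.29.0"
192.0.2.2 - - [06/Jun/2016:09:30:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"'

# count_unique_visitors [date|hr]
# Reads log lines on stdin and prints "<bucket> <unique visitors>" per bucket.
# "date" buckets by day (05/Jun/2016); "hr" buckets by hour (05/Jun/2016:16).
count_unique_visitors() {
    case ${1:-date} in
        hr) width=14 ;;
        *)  width=11 ;;
    esac
    awk -F'"' -v width="$width" '
    {
        # Splitting on double quotes: $1 is the part before the request,
        # $6 is the user agent.
        split($1, head, " ")                 # head[1] = IP, head[4] = "[date:time"
        bucket = substr(head[4], 2, width)   # skip the leading "["
        key = bucket SUBSEP head[1] SUBSEP $6
        if (!(key in seen)) { seen[key] = 1; visitors[bucket]++ }
    }
    END { for (b in visitors) print b, visitors[b] }' | sort
}

# Per-day count for the sample data:
#   05/Jun/2016 2
#   06/Jun/2016 1
printf '%s\n' "$sample_log" | count_unique_visitors date
```

Calling the function with `hr` instead of `date` buckets the same lines by hour, which is the same idea as switching GoAccess to --date-spec=hr.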
Using GoAccess
Installing GoAccess
[root@VM_0_26_centos logs]# yum install goaccess
Loaded plugins: fastestmirror, langpacks
Repository epel is listed more than once in the configuration
epel | 4.7 kB 00:00:00
extras | 2.9 kB 00:00:00
nux-dextop | 2.9 kB 00:00:00
os | 3.6 kB 00:00:00
rpmfusion-free-updates | 3.7 kB 00:00:00
rpmfusion-nonfree-updates | 3.7 kB 00:00:00
updates | 2.9 kB 00:00:00
zabbix | 2.9 kB 00:00:00
zabbix-non-supported | 951 B 00:00:00
(1/2): epel/7/x86_64/updateinfo | 1.0 MB 00:00:00
(2/2): epel/7/x86_64/primary_db | 6.9 MB 00:00:02
Loading mirror speeds from cached hostfile
* nux-dextop: mirror.li.nux.ro
* rpmfusion-free-updates: mirrors.ustc.edu.cn
* rpmfusion-nonfree-updates: mirrors.ustc.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package goaccess.x86_64 0:1.3-1.el7 will be installed
--> Processing Dependency: libtokyocabinet.so.9()(64bit) for package: goaccess-1.3-1.el7.x86_64
--> Running transaction check
---> Package tokyocabinet.x86_64 0:1.4.48-3.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===========================================================================================
Package Arch Version Repository Size
===========================================================================================
Installing:
goaccess x86_64 1.3-1.el7 epel 240 k
Installing for dependencies:
tokyocabinet x86_64 1.4.48-3.el7 os 459 k
Transaction Summary
===========================================================================================
Install 1 Package (+1 Dependent package)
Total download size: 699 k
Installed size: 2.0 M
Is this ok [y/d/N]: y
Downloading packages:
(1/2): goaccess-1.3-1.el7.x86_64.rpm | 240 kB 00:00:00
(2/2): tokyocabinet-1.4.48-3.el7.x86_64.rpm | 459 kB 00:00:00
-------------------------------------------------------------------------------------------
Total 1.3 MB/s | 699 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : tokyocabinet-1.4.48-3.el7.x86_64 1/2
Installing : goaccess-1.3-1.el7.x86_64 2/2
Verifying : tokyocabinet-1.4.48-3.el7.x86_64 1/2
Verifying : goaccess-1.3-1.el7.x86_64 2/2
Installed:
goaccess.x86_64 0:1.3-1.el7
Dependency Installed:
tokyocabinet.x86_64 0:1.4.48-3.el7
Viewing usage information
[root@VM_0_26_centos logs]# goaccess -help
GoAccess - 1.3
Usage: goaccess [filename] [ options ... ] [-c][-M][-H][-S][-q][-d][...]
The following options can also be supplied to the command:
Log & Date Format Options
--date-format=<dateformat> - Specify log date format. e.g., %d/%b/%Y
--log-format=<logformat> - Specify log format. Inner quotes need to be
escaped, or use single quotes.
--time-format=<timeformat> - Specify log time format. e.g., %H:%M:%S
User Interface Options
-c --config-dialog - Prompt log/date/time configuration window.
-i --hl-header - Color highlight active panel.
-m --with-mouse - Enable mouse support on main dashboard.
--color=<fg:bg[attrs, PANEL]> - Specify custom colors. See manpage for more
details and options.
--color-scheme=<1|2|3> - Schemes: 1 => Grey, 2 => Green, 3 => Monokai.
--html-custom-css=<path.css> - Specify a custom CSS file in the HTML report.
--html-custom-js=<path.js> - Specify a custom JS file in the HTML report.
--html-prefs=<json_obj> - Set default HTML report preferences.
--html-report-title=<title> - Set HTML report page title and header.
--json-pretty-print - Format JSON output w/ tabs & newlines.
--max-items - Maximum number of items to show per panel.
See man page for limits.
--no-color - Disable colored output.
--no-column-names - Don't write column names in term output.
--no-csv-summary - Disable summary metrics on the CSV output.
--no-html-last-updated - Hide HTML last updated field.
--no-parsing-spinner - Disable progress metrics and parsing spinner.
--no-progress - Disable progress metrics.
--no-tab-scroll - Disable scrolling through panels on TAB.
Server Options
--addr=<addr> - Specify IP address to bind server to.
--daemonize - Run as daemon (if --real-time-html enabled).
--fifo-in=<path> - Path to read named pipe (FIFO).
--fifo-out=<path> - Path to write named pipe (FIFO).
--origin=<addr> - Ensure clients send the specified origin header
upon the WebSocket handshake.
--pid-file=<path> - Write PID to a file when --daemonize is used.
--port=<port> - Specify the port to use.
--real-time-html - Enable real-time HTML output.
--ssl-cert=<cert.crt> - Path to TLS/SSL certificate.
--ssl-key=<priv.key> - Path to TLS/SSL private key.
--ws-url=<url> - URL to which the WebSocket server responds.
File Options
- - The log file to parse is read from stdin.
-f --log-file=<filename> - Path to input log file.
-S --log-size=<number> - Specify the log size, useful when piping in logs.
-l --debug-file=<filename> - Send all debug messages to the specified
file.
-p --config-file=<filename> - Custom configuration file.
--invalid-requests=<filename> - Log invalid requests to the specified file.
--no-global-config - Don't load global configuration file.
Parse Options
-a --agent-list - Enable a list of user-agents by host.
-b --browsers-file=<path> - Use additional custom list of browsers.
-d --with-output-resolver - Enable IP resolver on HTML|JSON output.
-e --exclude-ip=<IP> - Exclude one or multiple IPv4/6. Allows IP
ranges e.g. 192.168.0.1-192.168.0.10
-H --http-protocol=<yes|no> - Set/unset HTTP request protocol if found.
-M --http-method=<yes|no> - Set/unset HTTP request method if found.
-o --output=file.html|json|csv - Output either an HTML, JSON or a CSV file.
-q --no-query-string - Ignore request's query string. Removing the
query string can greatly decrease memory
consumption.
-r --no-term-resolver - Disable IP resolver on terminal output.
--444-as-404 - Treat non-standard status code 444 as 404.
--4xx-to-unique-count - Add 4xx client errors to the unique visitors
count.
--anonymize-ip - Anonymize IP addresses before outputting to report.
--all-static-files - Include static files with a query string.
--crawlers-only - Parse and display only crawlers.
--date-spec=<date|hr> - Date specificity. Possible values: `date`
(default), or `hr`.
--double-decode - Decode double-encoded values.
--enable-panel=<PANEL> - Enable parsing/displaying the given panel.
--hide-referer=<NEEDLE> - Hide a referer but still count it. Wild cards
are allowed. i.e., *.bing.com
--hour-spec=<hr|min> - Hour specificity. Possible values: `hr`
(default), or `min` (tenth of a min).
--ignore-crawlers - Ignore crawlers.
--ignore-panel=<PANEL> - Ignore parsing/displaying the given panel.
--ignore-referer=<NEEDLE> - Ignore a referer from being counted. Wild cards
are allowed. i.e., *.bing.com
--ignore-statics=<req|panel> - Ignore static requests.
req => Ignore from valid requests.
panel => Ignore from valid requests and panels.
--ignore-status=<CODE> - Ignore parsing the given status code.
--num-tests=<number> - Number of lines to test. >= 0 (10 default)
--process-and-exit - Parse log and exit without outputting data.
--real-os - Display real OS names. e.g, Windows XP, Snow
Leopard.
--sort-panel=PANEL,METRIC,ORDER - Sort panel on initial load. For example:
--sort-panel=VISITORS,BY_HITS,ASC. See
manpage for a list of panels/fields.
--static-file=<extension> - Add static file extension. e.g.: .mp3.
Extensions are case sensitive.
GeoIP Options
-g --std-geoip - Standard GeoIP database for less memory
consumption.
--geoip-database=<path> - Specify path to GeoIP database file. i.e.,
GeoLiteCity.dat, GeoIPv6.dat ...
Other Options
-h --help - This help.
-V --version - Display version information and exit.
-s --storage - Display current storage method. e.g., B+
Tree, Hash.
--dcf - Display the path of the default config
file when `-p` is not used.
Examples can be found by running `man goaccess`.
For more details visit: http://goaccess.io
GoAccess Copyright (C) 2009-2017 by Gerardo Orellana
Obtaining the Nginx log format
The format conversion script is available at https://github.com/stockrt/nginx2goaccess/blob/master/nginx2goaccess.sh; its contents are as follows:
[root@VM_0_26_centos logs]# cat nginx2goaccess.sh
#!/bin/bash
#
# Convert from this:
# http://nginx.org/en/docs/http/ngx_http_log_module.html
# To this:
# https://goaccess.io/man
#
# Conversion table:
# $time_local %d:%t %^
# $host %v
# $http_host %v
# $remote_addr %h
# $request_time %T
# $request_method %m
# $request_uri %U
# $server_protocol %H
# $request %r
# $status %s
# $body_bytes_sent %b
# $bytes_sent %b
# $http_referer %R
# $http_user_agent %u
#
# Samples:
#
# log_format combined '$remote_addr - $remote_user [$time_local] '
# '"$request" $status $body_bytes_sent '
# '"$http_referer" "$http_user_agent"';
# ./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
#
# log_format compression '$remote_addr - $remote_user [$time_local] '
# '"$request" $status $bytes_sent '
# '"$http_referer" "$http_user_agent" "$gzip_ratio"';
# ./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent" "$gzip_ratio"'
#
# log_format main
# '$remote_addr\t$time_local\t$host\t$request\t$http_referer\t$http_x_mobile_group\t'
# 'Local:\t$status\t$body_bytes_sent\t$request_time\t'
# 'Proxy:\t$upstream_cache_status\t$upstream_status\t$upstream_response_length\t$upstream_response_time\t'
# 'Agent:\t$http_user_agent\t'
# 'Fwd:\t$http_x_forwarded_for';
# ./nginx2goaccess.sh '$remote_addr\t$time_local\t$host\t$request\t$http_referer\t$http_x_mobile_group\tLocal:\t$status\t$body_bytes_sent\t$request_time\tProxy:\t$upstream_cache_status\t$upstream_status\t$upstream_response_length\t$upstream_response_time\tAgent:\t$http_user_agent\tFwd:\t$http_x_forwarded_for'
#
# log_format main
# '${time_local}\t${remote_addr}\t${host}\t${request_method}\t${request_uri}\t${server_protocol}\t'
# '${http_referer}\t${http_x_mobile_group}\t'
# 'Local:\t${status}\t*${connection}\t${body_bytes_sent}\t${request_time}\t'
# 'Proxy:\t${upstream_status}\t${upstream_cache_status}\t'
# '${upstream_response_length}\t${upstream_response_time}\t${uri}${log_args}\t'
# 'Agent:\t${http_user_agent}\t'
# 'Fwd:\t${http_x_forwarded_for}';
# ./nginx2goaccess.sh '${time_local}\t${remote_addr}\t${host}\t${request_method}\t${request_uri}\t${server_protocol}\t${http_referer}\t${http_x_mobile_group}\tLocal:\t${status}\t*${connection}\t${body_bytes_sent}\t${request_time}\tProxy:\t${upstream_status}\t${upstream_cache_status}\t${upstream_response_length}\t${upstream_response_time}\t${uri}${log_args}\tAgent:\t${http_user_agent}\tFwd:\t${http_x_forwarded_for}'
#
# Author: Rogério Carvalho Schneider <stockrt@gmail.com>
# Params
log_format="$1"
# Usage
if [[ -z "$log_format" ]]; then
    echo "Usage: $0 '<log_format>'"
    exit 1
fi
# Variables map
conversion_table="time_local,%d:%t_%^
host,%v
http_host,%v
remote_addr,%h
request_time,%T
request_method,%m
request_uri,%U
server_protocol,%H
request,%r
status,%s
body_bytes_sent,%b
bytes_sent,%b
http_referer,%R
http_user_agent,%u"
# Conversion
for item in $conversion_table; do
    nginx_var=${item%%,*}
    goaccess_var=${item##*,}
    goaccess_var=${goaccess_var//_/ }
    log_format=${log_format//\$\{$nginx_var\}/$goaccess_var}
    log_format=${log_format//\$$nginx_var/$goaccess_var}
done
log_format=$(echo "$log_format" | sed 's/${[a-z_]*}/%^/g')
log_format=$(echo "$log_format" | sed 's/$[a-z_]*/%^/g')
# Config output
echo "
- Generated goaccess config:
time-format %T
date-format %d/%b/%Y
log_format $log_format
"
# EOF
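Note: the log_format in the nginx configuration file is shown below. The format passed to the conversion script in the next step must match the one actually in use.
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $upstream_addr $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';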
Generating the log format
[root@VM_0_26_centos logs]# sh nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $upstream_addr $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
- Generated goaccess config:
time-format %T
date-format %d/%b/%Y
log_format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"
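To see what the generated format string means, it can help to line it up against a single log line. The sketch below pulls the fields that correspond to %h, %d, %t, %r, %s, %b, %R and %u out of one line with awk. The log line is invented to match the nginx log_format used in this post, and `parse_line` is a hypothetical helper written for illustration; GoAccess does its own parsing and does not work this way.

```shell
#!/bin/sh
# Made-up log line in the layout:
# $remote_addr - $remote_user [$time_local] "$request" $status $upstream_addr
# $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"
sample_line='203.0.113.7 - - [05/Jun/2016:16:42:10 +0800] "GET /index.html HTTP/1.1" 200 10.0.0.5:8080 1024 "-" "Mozilla/5.0" "-"'

# parse_line: reads one log line on stdin and prints "<specifier>=<value>" pairs.
# Fields mapped to %^ (remote user, time zone, upstream address,
# X-Forwarded-For) are skipped, just as GoAccess ignores them.
parse_line() {
    awk -F'"' '
    {
        split($1, head, " ")            # IP, "-", user, "[date:time", "zone]"
        split($3, mid, " ")             # status, upstream address, bytes sent
        stamp = substr(head[4], 2)      # drop the leading "["
        print "%h=" head[1]
        print "%d=" substr(stamp, 1, 11)
        print "%t=" substr(stamp, 13)
        print "%r=" $2
        print "%s=" mid[1]
        print "%b=" mid[3]
        print "%R=" $4
        print "%u=" $6
    }'
}

printf '%s\n' "$sample_line" | parse_line
```

For the sample line this prints the client address, the date 05/Jun/2016, the time 16:42:10, the request line, the status, the byte count, the referrer and the user agent, one per line.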
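Setting the log format
Copy the generated settings into the configuration file. GoAccess expects the key log-format (with a hyphen), so the log_format line printed by the script has to be renamed:
[root@VM_0_26_centos logs]# cat /etc/goaccess/goaccess.conf
time-format %T
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"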
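Before running GoAccess it can be worth checking that the configuration file defines all three format keys under the names GoAccess reads: time-format, date-format and log-format (hyphenated, unlike the log_format spelling printed by the conversion script). The following is a small sketch; `check_conf` is a hypothetical helper, and the demo writes a throwaway file rather than touching a real configuration.

```shell
#!/bin/sh
# check_conf FILE
# Succeeds only if FILE has a line starting with each required key.
check_conf() {
    conf=$1
    for key in time-format date-format log-format; do
        if ! grep -Eq "^${key}[[:space:]]" "$conf"; then
            echo "missing key: $key" >&2
            return 1
        fi
    done
    echo "ok: $conf"
}

# Demo against a temporary file holding the settings from this post.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
time-format %T
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"
EOF
check_conf "$demo_conf"
rm -f "$demo_conf"
```

A file that still contains log_format instead of log-format makes the function report the missing key and return a non-zero status.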
Generating the analysis report
[root@VM_0_26_centos logs]# goaccess -f ./nginx_access.log -p ./nginxlog.conf -o day-report.html
[root@VM_0_26_centos logs]# ls
day-report.html nginx_access.log nginx2goaccess.sh nginxlog.conf
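If the report is generated regularly, the command above can be wrapped in a small function that checks its inputs first. This is a sketch under stated assumptions: `generate_report` and the `GOACCESS_BIN` variable are hypothetical names introduced here (the variable only exists so the binary can be swapped out for a dry run), while -f, -p and -o are the options shown in the help output earlier.

```shell
#!/bin/sh
# Hypothetical override: which binary to run (defaults to goaccess).
GOACCESS_BIN=${GOACCESS_BIN:-goaccess}

# generate_report LOG CONF OUT
# Fails early if the log or the config file is unreadable, then runs
# goaccess with the same options as the command in this post.
generate_report() {
    log=$1
    conf=$2
    out=$3
    [ -r "$log" ]  || { echo "cannot read log: $log" >&2; return 1; }
    [ -r "$conf" ] || { echo "cannot read config: $conf" >&2; return 1; }
    "$GOACCESS_BIN" -f "$log" -p "$conf" -o "$out"
}

# Run only when goaccess and the log file from this post are both present.
if command -v "$GOACCESS_BIN" >/dev/null 2>&1 && [ -r ./nginx_access.log ]; then
    generate_report ./nginx_access.log ./nginxlog.conf day-report.html
fi
```

Setting GOACCESS_BIN=echo before calling the function prints the arguments that would be passed, without producing a report.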
Viewing the report
Open day-report.html in a browser; the result looks as follows: