Monitoring Spring Cloud Services with Prometheus + Grafana

2021-09-19 11:21:48

Background:

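Our project uses Spring Cloud, and we need to monitor a number of service metrics so that we always know how each service is running. Most of the material I found recommended the stack from Netflix, a leading microservices practitioner: its open-source Atlas plus Grafana. I followed several of those guides and tried it. Every success story turned out to be on Spring Boot 1.x, and I found no account of anyone getting Atlas working on Spring Boot 2.x.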

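During this research I also found that Spring Boot 2.x has adopted a third-party metrics facade, Micrometer (micrometer.io). You can think of it the same way as a logging facade such as SLF4J. It already covers the mainstream TSDB implementations, including Atlas, Datadog, Ganglia, Graphite, Influx, JMX, New Relic, Prometheus, SignalFx, StatsD and Wavefront. The project will also eventually use Kubernetes for container orchestration, and the container-monitoring stack recommended there is Prometheus, with Grafana for visualization. For consistency, I switched to Prometheus + Grafana.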

0. Environment:

Ubuntu 16.04

Spring Boot 2.0.0.RELEASE

A first look at Prometheus:

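Prometheus is an open-source monitoring and alerting solution from SoundCloud. Coding began in 2012. Since it was open-sourced on GitHub in 2015, it has attracted 9k+ stars and has been adopted by many large companies. In 2016 Prometheus became the second project hosted by the CNCF (Cloud Native Computing Foundation), after Kubernetes (k8s).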

As a new-generation open-source solution, many of its ideas closely match Google's SRE approach to operations.

Main features

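Multi-dimensional data model (a time series is identified by its metric name plus a set of key/value labels).
Flexible query language (PromQL).
No reliance on distributed storage; both local and remote storage are supported.
Data is collected over HTTP using a pull model, which is simple and easy to understand.
Monitoring targets can be found through service discovery or configured statically.
Supports several metric types and is well suited to graphing.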
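To make the multi-dimensional data model concrete, here is a minimal stdlib-only Python sketch. It shows how a series is identified by its metric name plus its label set, and how one sample is written in Prometheus's text exposition format. The function name render_sample and the metric and label values are illustrative assumptions, not part of any Prometheus library.

```python
def render_sample(name: str, labels: dict, value: float) -> str:
    """Render one sample in the Prometheus text exposition format.

    A series is identified by its metric name plus its set of label pairs;
    labels are sorted here so the same set always renders identically.
    """
    if not labels:
        return f"{name} {value}"
    parts = []
    for key in sorted(labels):
        # Escape backslash, double quote and newline inside label values.
        escaped = (str(labels[key]).replace("\\", "\\\\")
                   .replace('"', '\\"').replace("\n", "\\n"))
        parts.append(f'{key}="{escaped}"')
    return f"{name}{{{','.join(parts)}}} {value}"


# Same metric name, different label values: two distinct time series.
print(render_sample("http_requests_total", {"method": "GET", "uri": "/hello"}, 3.0))
print(render_sample("http_requests_total", {"method": "POST", "uri": "/hello"}, 1.0))
```

Changing any label value (here method) produces a separate series. That is what makes label-based filtering and aggregation in PromQL possible.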

Core components

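Prometheus Server: scrapes and stores time-series data, and also provides querying and alert-rule configuration management.
Client libraries: integrate applications with the Prometheus Server by instrumenting code and exposing (reporting) its data.
Push gateway: an aggregation point for metrics from batch and short-lived jobs, mainly used for reporting business data and the like.
Exporters of various kinds that report data, for example node_exporter for machine metrics and the MongoDB exporter for MongoDB statistics.
Alertmanager: manages alert notifications.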
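To show the push gateway's role in practice, here is a minimal stdlib-only Python sketch. It builds an HTTP PUT of a batch job's metrics to a Pushgateway but does not send it. The URL layout /metrics/job/<job>/<label>/<value> and the default port 9091 come from the Pushgateway itself. The function name, the job name nightly_batch, the metric batch_records_processed and the instance label are hypothetical. Label values containing "/" would need the Pushgateway's special encoding, which this sketch does not handle.

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def build_push_request(gateway: str, job: str, body: str,
                       grouping: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT to a Pushgateway.

    Path layout: /metrics/job/<job>{/<label>/<value>}; a PUT replaces every
    metric previously pushed under the same grouping key.
    """
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for label, value in (grouping or {}).items():
        path += "/" + urllib.parse.quote(label, safe="") + "/" + urllib.parse.quote(value, safe="")
    if not body.endswith("\n"):
        body += "\n"  # the text format expects the last line to end with a line feed
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical batch job reporting how many records it processed.
req = build_push_request("http://localhost:9091", "nightly_batch",
                         "batch_records_processed 1234", {"instance": "worker-1"})
print(req.get_method(), req.full_url)
# To actually push against a running Pushgateway: urllib.request.urlopen(req)
```

Prometheus then scrapes the Pushgateway like any other target. This is why the gateway suits short-lived jobs that may exit before a scrape would reach them.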

Architecture

As the architecture diagram shows, the main modules of Prometheus are the Server, Exporters, Pushgateway, PromQL, Alertmanager and the Web UI.

Roughly, it works like this:

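The Prometheus server periodically scrapes data from statically configured targets or from targets found through service discovery.
When newly scraped data exceeds the configured in-memory buffer, Prometheus persists it to disk (or, if remote storage is configured, to the remote store).
Prometheus can be configured with rules that query the data on a schedule; when a condition is met, the alert is pushed to the configured Alertmanager.
On receiving alerts, Alertmanager aggregates, deduplicates and suppresses noise according to its configuration, then sends the notifications.
Data can be queried and aggregated through the API, the Prometheus console, or Grafana.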
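For querying through the API, here is a minimal stdlib-only Python sketch against Prometheus's instant-query endpoint, GET /api/v1/query. It builds the URL and turns a vector response into (labels, value) pairs. The helper names are my own. The sample response is hand-written in the documented shape, not captured from a server. The job and instance labels reuse the admin-service target configured later in this article.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def instant_query_url(base: str, promql: str) -> str:
    """URL for Prometheus's instant-query endpoint (GET /api/v1/query)."""
    return base.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got " + data["resultType"])
    # Each "value" is [unix_timestamp, "string value"]; convert the string to float.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_instant_query(base: str, promql: str) -> List[Tuple[Dict[str, str], float]]:
    """Send the query to a running Prometheus server and parse the answer."""
    with urllib.request.urlopen(instant_query_url(base, promql)) as resp:
        return parse_vector(json.load(resp))


# Illustrative response, hand-written in the documented shape.
sample = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"__name__": "up", "job": "admin-service", "instance": "localhost:8762"},
     "value": [1531000000.0, "1"]},
]}}
print(instant_query_url("http://localhost:9090", 'up{job="admin-service"}'))
print(parse_vector(sample))
```

The up series is recorded by Prometheus itself for every target: 1 if the last scrape succeeded, 0 if it failed. This makes it a convenient first query.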

Caveats

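Prometheus values are float64 time series; if your data needs other value types, Prometheus cannot handle it.
Prometheus is not suited to auditing or billing. It collects data at fixed intervals and focuses on the system's current state and trends, so it can tolerate a few missed samples. Auditing and billing, by contrast, must record every single request and keep the data long-term. Prometheus cannot do that, and a dedicated auditing system is needed instead.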
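To illustrate why interval sampling suits trends but not per-request accounting, here is a small Python sketch. It computes a counter's total increase from (timestamp, value) scrapes. A drop in value is treated as a counter reset, for example after an application restart. This is a simplified version of the reset handling in PromQL's increase(), without the extrapolation Prometheus applies. The sample numbers are invented.

```python
from typing import List, Tuple


def counter_increase(samples: List[Tuple[float, float]]) -> float:
    """Total increase of a counter across (timestamp, value) scrapes.

    If the value drops, the counter is assumed to have restarted from zero,
    so the new value itself is counted as the increase for that interval.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Scrapes every 15 s; the app restarts between t=15 and t=30.
scrapes = [(0, 100.0), (15, 130.0), (30, 5.0), (45, 20.0)]
print(counter_increase(scrapes))
```

The result says roughly how many requests arrived over 45 seconds. It cannot say when each one happened. Any requests counted after the t=15 scrape but before the restart are lost entirely. That is acceptable for monitoring but not for billing.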

To learn more, see Prometheus实战 ("Prometheus in Practice").

Changes to the Spring Cloud application's configuration:

Add the required dependencies to pom.xml:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>1.0.5</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.0.5</version>
    <exclusions>
        <exclusion>
            <!-- micrometer-registry-prometheus pulls in micrometer-core 1.0.1 by default, which does not match the component versions used here, so exclude it -->
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Edit application.yml:

# Service governance (actuator) settings

management:
  security: # Turn security off only in a development environment.
    enabled: false
  metrics:
    export:
      prometheus:
        enabled: true
        step: 1m
        descriptions: true
    web:
      server:
        auto-time-requests: true
  endpoints:
    web:
      exposure:
        include: health,info,env,prometheus,metrics,httptrace,threaddump,heapdump

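To make the monitoring test easier to deploy, the actuator's security options are disabled here. Note that management.security.enabled is a Spring Boot 1.x property. Spring Boot 2.x dropped it, so on 2.x it has no effect, and actuator endpoint access is controlled through Spring Security instead. For more on the metrics settings, see the official documentation.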

Regarding endpoints: in this example, "prometheus" must be included in the exposure list. Otherwise there is no URI for the Prometheus job configured later to scrape.

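Because Prometheus actively scrapes (pulls from) its targets, a Spring Cloud application does not need to be configured with the Prometheus server's address or port. As I understand it, this is what sets it apart from Atlas, Ganglia, Graphite, Influx, JMX, StatsD, Wavefront and the others.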

Next, start your application and open http://localhost:PORT/actuator/prometheus in a browser. If everything is working, you will see a page of key/value pairs returned as plain text.
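To check that output programmatically, here is a minimal stdlib-only Python sketch of a parser for the text exposition format. It covers only the common case: one sample per line, # HELP and # TYPE comments skipped, escaped label values decoded. It is not a complete implementation of the format. The Java client behind Micrometer typically writes a trailing comma after the last label, and the parser tolerates that. The sample text is made up for illustration. The metric names your application actually exposes depend on the Micrometer version.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?\s*$'
)
# One label pair inside the braces; the value may contain escaped characters.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError("unparseable line: " + line)
        labels = {}
        for key, raw in LABEL_RE.findall(m.group("labels") or ""):
            # Undo the format's escapes: \\ -> \, \" -> ", \n -> newline.
            labels[key] = re.sub(r'\\(.)', lambda e: "\n" if e.group(1) == "n" else e.group(1), raw)
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


# Made-up text in the shape of /actuator/prometheus output. Against a running app:
# text = urllib.request.urlopen("http://localhost:8762/actuator/prometheus").read().decode()
text = """# TYPE jvm_threads_live gauge
jvm_threads_live 32.0
http_server_requests_seconds_count{method="GET",uri="/hello",} 3.0
"""
for sample in parse_exposition(text):
    print(sample)
```

float() accepts the NaN and +Inf spellings the format uses, so those values parse without special cases.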

Prometheus configuration:

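Installation:
Download a release for your operating system from the official website; this example uses prometheus-2.3.1.linux-amd64.tar.gz. After downloading, extract it, then find and edit prometheus.yml: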

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'admin-service'

    # metrics_path defaults to '/metrics'
    # metrics_path: /actuator/metrics
    metrics_path: /actuator/prometheus
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:8762']

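Note: in the downloaded configuration file, metrics_path appears only in a comment, and Prometheus defaults it to /metrics. This example uses Spring Boot 2.0.0.RELEASE, which exposes actuator endpoints, including the metrics output, under /actuator by default.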

So the admin-service job must set "metrics_path: /actuator/prometheus" on an uncommented line.
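The effect of metrics_path can be sketched in a few lines of Python. The function below derives the URL Prometheus would scrape for each static target, applying the defaults stated in the prometheus.yml comments (scheme http, metrics_path /metrics). The job is written as a Python dict mirroring the YAML, since the standard library has no YAML parser. The helper name is my own, and the sketch ignores other scrape settings such as params.

```python
from typing import List


def scrape_urls(job: dict) -> List[str]:
    """URLs Prometheus would scrape for a job's static_configs targets.

    Applies the defaults noted in prometheus.yml: scheme 'http' and
    metrics_path '/metrics', unless the job sets its own values.
    """
    scheme = job.get("scheme", "http")
    path = job.get("metrics_path", "/metrics")
    return [
        f"{scheme}://{target}{path}"
        for group in job.get("static_configs", [])
        for target in group.get("targets", [])
    ]


# The admin-service job from prometheus.yml, written as a Python dict.
admin = {
    "job_name": "admin-service",
    "metrics_path": "/actuator/prometheus",
    "static_configs": [{"targets": ["localhost:8762"]}],
}
# The same job with metrics_path left commented out.
admin_default = {k: v for k, v in admin.items() if k != "metrics_path"}

print(scrape_urls(admin))          # ['http://localhost:8762/actuator/prometheus']
print(scrape_urls(admin_default))  # ['http://localhost:8762/metrics']
```

Without the override, Prometheus would request /metrics, a path the Spring Boot 2.x application does not serve, and the target would show as down.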

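Starting Prometheus:
Open a terminal, change into the Prometheus directory and run ./prometheus --config.file=prometheus.yml (Prometheus 2.x uses double-dash flags). Once it has started, http://localhost:9090/targets lists the targets you are monitoring.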
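The targets page has an API counterpart, GET /api/v1/targets, which is handy for scripted checks. Below is a minimal stdlib-only Python sketch that lists active targets not reported as up, with their scrape URL and last error. The helper names are my own. The payload is hand-written in the API's shape. Its error text approximates what Prometheus reports for a 404, for example when metrics_path was left at /metrics.

```python
import json
import urllib.request
from typing import List, Tuple


def fetch_targets(base: str = "http://localhost:9090") -> dict:
    """GET /api/v1/targets from a running Prometheus server."""
    with urllib.request.urlopen(base.rstrip("/") + "/api/v1/targets") as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> List[Tuple[str, str, str]]:
    """Return (job, scrapeUrl, lastError) for every active target not reported as 'up'."""
    return [
        (t["labels"].get("job", ""), t["scrapeUrl"], t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t["health"] != "up"
    ]


# Hand-written example payload in the API's shape (not captured from a server).
sample = {"status": "success", "data": {"activeTargets": [
    {"labels": {"job": "prometheus", "instance": "localhost:9090"},
     "scrapeUrl": "http://localhost:9090/metrics", "health": "up", "lastError": ""},
    {"labels": {"job": "admin-service", "instance": "localhost:8762"},
     "scrapeUrl": "http://localhost:8762/metrics", "health": "down",
     "lastError": "server returned HTTP status 404 Not Found"},
], "droppedTargets": []}}

for job, url, err in unhealthy_targets(sample):
    print(f"{job}: {url} -> {err}")
```

Against a live server, unhealthy_targets(fetch_targets()) gives the same summary.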

To see which metrics are available out of the box, look in the drop-down list on the Graph page.

Select a metric and click Execute to see a basic chart of its data.

Enhanced configuration:

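Later on, we will want to add monitoring for different business types on the same service, or apply the same job to different hosts. With static_configs, every such change means editing prometheus.yml and making Prometheus reload its configuration, by restarting it or at least sending it a SIGHUP.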

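After some research, I found that file_sd_configs can be used instead of static_configs. Prometheus watches the listed files and picks up changes to the targets without a restart: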

(1). In prometheus.yml, comment out static_configs and add file_sd_configs:

    #static_configs:
    #- targets: ['localhost:8762']
    file_sd_configs:
    - files:
      - /YOUR_HOME/softs/prometheus/configs/admin/*.json

(2). In the /YOUR_HOME/softs/prometheus/configs/admin/ directory, create a JSON file with any name. This example creates base.json with the following content:

[
 {
 "targets": ["localhost:8762"]
 }
]
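If the target list is generated by a script, for example at deploy time, the file must never be seen half-written. Below is a minimal stdlib-only Python sketch of one common approach, which is my own design choice rather than anything the original setup requires. It writes the same JSON shape as base.json to a temporary file in the same directory, then renames it over the real file. The temporary name ends in .tmp, so it does not match the *.json glob. The optional labels map (a standard file_sd field), the second instance localhost:8763 and the env label are hypothetical. The demo writes to a throwaway directory standing in for /YOUR_HOME/softs/prometheus/configs/admin/.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


def write_file_sd(path: str, targets: List[str], labels: Optional[Dict[str, str]] = None) -> None:
    """Write a file_sd target list, replacing any existing file atomically.

    Shape matches base.json: a list of {"targets": [...], "labels": {...}} groups.
    """
    group: dict = {"targets": targets}
    if labels:
        group["labels"] = labels
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one filesystem;
    # the .tmp suffix keeps it from matching the *.json glob.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump([group], f, indent=2)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; let Prometheus's user read it
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise


demo_dir = tempfile.mkdtemp()  # stand-in for the real configs/admin directory
demo_file = os.path.join(demo_dir, "base.json")
write_file_sd(demo_file, ["localhost:8762", "localhost:8763"], {"env": "dev"})
with open(demo_file) as f:
    print(f.read())
```

Prometheus attaches the labels in each group to every series scraped from that group's targets. This is a simple way to tag instances by environment or business type.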

For more configuration details, see the official documentation: file_sd_config

Grafana, a better monitoring UI:

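Installation:
Likewise, download Grafana from the official website for your operating system. This example uses grafana_5.1.3_amd64.deb, which is installed with dpkg:
1. Install the package (as a non-root user, via sudo): $ sudo dpkg -i grafana_5.1.3_amd64.deb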

2. Start the service: $ sudo systemctl start grafana-server

3. Open http://localhost:3000 to reach the login page; the default username and password are both admin.

Configuration:
Use "Add data source" to add Prometheus as a data source (in this setup it runs at http://localhost:9090).
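The same data source can also be registered through Grafana's HTTP API with POST /api/datasources, which helps when provisioning several environments. Here is a minimal stdlib-only Python sketch that builds that request without sending it. It assumes the default admin/admin credentials and the URLs used in this article. The helper name and the display name "Prometheus" are my own choices.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, prometheus_url: str,
                             user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build (not send) a POST /api/datasources call registering Prometheus in Grafana."""
    body = {
        "name": "Prometheus",   # display name in Grafana (any name works)
        "type": "prometheus",   # Grafana's built-in Prometheus data source type
        "url": prometheus_url,
        "access": "proxy",      # the Grafana server, not the browser, queries Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
    )


req = build_datasource_request("http://localhost:3000", "http://localhost:9090")
print(req.get_method(), req.full_url)
# To send it against a running Grafana: urllib.request.urlopen(req)
```

Remember to change the admin password after the first login; the hard-coded default is only for a local test setup.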

Next, create a new dashboard.

Add a Graph panel.

Then choose Edit.

Then, on the Metrics tab, set the data source and the metric. For example, add a threads (thread count) metric. Once done, you get a nice-looking chart.

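This article is reposted from Juejin (掘金): "Prometheus+Grafana实现SpringCloud服务监控" (Monitoring Spring Cloud Services with Prometheus + Grafana).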
