Setting Up SkyWalking + Elasticsearch on Windows: A Hands-On Walkthrough

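Preface
My company has recently been running an internal technical stretch program. I had read articles about SkyWalking before but had never set it up locally, so I took this opportunity to try it out as an introductory exercise.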

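What is SkyWalking

SkyWalking is an open-source APM (Application Performance Monitoring) tool that originated in China, designed for microservice, cloud-native, and container-based architectures.
It provides an all-in-one solution covering distributed tracing, application and service dependency analysis, service mesh telemetry analysis, metric aggregation, and visualization.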

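Main features

  • Metric visualization
  • Application dependency topology
  • Distributed call tracing
  • Metric computation and analysis
  • Trace log queries
  • Service and application alerting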

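The architecture diagram from the official website
[Figure: official SkyWalking architecture diagram]
It is fairly abstract, so I drew my own diagram once I understood it.
[Figure: the author's simplified architecture diagram]
It is not pretty, but it is clear. A SkyWalking deployment really comes down to four parts:
1-Inject the agent (probe) into the application
2-Push the application's monitoring data to the OAP service
3-Data arriving at the OAP service is processed and analyzed, then written to storage
4-The visualization UI presents the data for analysis

That covers the overall background; for details, see the official SkyWalking documentation.
Now let's get it running on Windows!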

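Downloading and starting Elasticsearch

This walkthrough needs a data store, specifically a storage middleware that both the application service and SkyWalking support, so I chose Elasticsearch.

Downloading Elasticsearch

Download the Windows build. The latest version at the time of writing is 7.14.1; I like to use the latest, so this walkthrough uses it too. (Elasticsearch has plenty of version-compatibility issues, so if you do not share my fixation on the newest release, pick whichever version suits you.)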

Starting Elasticsearch

Open PowerShell and run bin\elasticsearch.bat (on Linux or macOS the equivalent is bin/elasticsearch).
Once the output shows no errors, open http://localhost:9200 in a browser.
That takes care of storage.
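The browser check can also be scripted. The following is a minimal sketch, assuming Elasticsearch listens on http://localhost:9200 without authentication as in this walkthrough; the function names are illustrative and not part of any Elasticsearch client.

```python
import json
import urllib.request


def parse_es_info(body: str) -> dict:
    """Pick the cluster name and version out of the JSON served at the root endpoint."""
    info = json.loads(body)
    return {
        "cluster_name": info.get("cluster_name"),
        "version": info.get("version", {}).get("number"),
    }


def check_elasticsearch(url: str = "http://localhost:9200", timeout: float = 5.0) -> dict:
    """Fetch the root endpoint and return the parsed summary (raises if unreachable)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_es_info(resp.read().decode("utf-8"))
```

Calling check_elasticsearch() while the node is running should return the cluster name and the version number; an exception means the node is not reachable on that address.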

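Downloading and starting SkyWalking

Downloading SkyWalking

Download it from a mirror site.
As before, I prefer the latest version, which is 8.7.0 at the time of writing; for other versions, see the historical releases download page.

After downloading, extract the archive (files are omitted because there are too many; only directories are shown)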

├─agent
│ ├─activations
│ ├─bootstrap-plugins
│ ├─config
│ ├─logs
│ ├─optional-plugins
│ ├─optional-reporter-plugins
│ └─plugins
├─bin
├─config
│ ├─envoy-metrics-rules
│ ├─fetcher-prom-rules
│ ├─lal
│ ├─log-mal-rules
│ ├─meter-analyzer-config
│ ├─oal
│ ├─otel-oc-rules
│ ├─ui-initialized-templates
│ └─zabbix-rules
├─config-examples
├─licenses
│ └─ui-licenses
├─oap-libs
├─tools
│ └─profile-exporter
└─webapp

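Directory layout

bin holds the startup scripts, including oapService.sh and webappService.sh (plus the matching .bat scripts used on Windows)
config holds the OAP service configuration, including application.yml
agent is the SkyWalking agent; it is attached to the business application and collects the monitoring data
webapp holds the configuration for the SkyWalking front-end UI service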

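Starting SkyWalking

Starting the SkyWalking OAP service

Before starting, adjust the configuration file.
In the config directory, edit application.yml; the main change is the storage selector, which is set to elasticsearch7 here.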

cluster:
  selector: ${SW_CLUSTER:standalone}
  standalone:
  ...
storage:
  selector: ${SW_STORAGE:elasticsearch7}
  elasticsearch7:
    nameSpace: ${SW_NAMESPACE:"my-application"}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
    connectTimeout: ${SW_STORAGE_ES_CONNECT_TIMEOUT:500}
    socketTimeout: ${SW_STORAGE_ES_SOCKET_TIMEOUT:30000}
    trustStorePath: ${SW_STORAGE_ES_SSL_JKS_PATH:""}
    trustStorePass: ${SW_STORAGE_ES_SSL_JKS_PASS:""}
    dayStep: ${SW_STORAGE_DAY_STEP:1} # Represent the number of days in the one minute/hour/day index.
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:1} # Shard number of new indexes
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:1} # Replicas number of new indexes
    # Super data set has been defined in the codes, such as trace segments.The following 3 config would be improve es performance when storage super size data in es.
    superDatasetDayStep: ${SW_SUPERDATASET_STORAGE_DAY_STEP:-1} # Represent the number of days in the super size dataset record index, the default value is the same as dayStep when the value is less than 0
    superDatasetIndexShardsFactor: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR:5} #  This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.
    superDatasetIndexReplicasNumber: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER:0} # Represent the replicas number in the super size dataset record index, the default value is 0.
    indexTemplateOrder: ${SW_STORAGE_ES_INDEX_TEMPLATE_ORDER:0} # the order of index template
    user: ${SW_ES_USER:""}
    password: ${SW_ES_PASSWORD:""}
    secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets management file in the properties format includes the username, password, which are managed by 3rd party tool.
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
    # flush the bulk every 10 seconds whatever the number of requests
    # INT(flushInterval * 2/3) would be used for index refresh period.
    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
    profileTaskQueryMaxSize: ${SW_STORAGE_ES_QUERY_PROFILE_TASK_SIZE:200}
    oapAnalyzer: ${SW_STORAGE_ES_OAP_ANALYZER:"{\"analyzer\":{\"oap_analyzer\":{\"type\":\"stop\"}}}"} # the oap analyzer.
    oapLogAnalyzer: ${SW_STORAGE_ES_OAP_LOG_ANALYZER:"{\"analyzer\":{\"oap_log_analyzer\":{\"type\":\"standard\"}}}"} # the oap log analyzer. It could be customized by the ES analyzer configuration to support more language log formats, such as Chinese log, Japanese log and etc.
    advanced: ${SW_STORAGE_ES_ADVANCED:""}
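Most values in this file use the ${NAME:default} form: an environment variable named NAME overrides the value, and the text after the first colon is used when the variable is not set. The sketch below only illustrates that lookup rule; it is not SkyWalking's own parser, and the resolve function is a hypothetical helper. It does not handle defaults that contain a closing brace (such as the oapAnalyzer line) and it leaves YAML quoting untouched.

```python
import os
import re

# ${NAME:default} - the default may be empty and may itself contain colons
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def resolve(value: str, env=None) -> str:
    """Replace each ${NAME:default} with env[NAME] if present, else with the default."""
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)
```

For example, resolve("${SW_STORAGE:elasticsearch7}", {}) yields elasticsearch7, while passing {"SW_STORAGE": "h2"} yields h2. In PowerShell the variable can be set for the current session with $env:SW_STORAGE = "elasticsearch7" before running the startup script, which avoids editing the file at all.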

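Open PowerShell and switch to the bin directory of SkyWalking
Run .\oapService.bat
The screenshot below shows a successful start
[Screenshot: oapService console output after a successful start]

Starting the SkyWalking webapp

Likewise, adjust the configuration before starting; the file is webapp.yml in the webapp directory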

server:
  port: 8080
spring:
  cloud:
    gateway:
      routes:
        - id: oap-route
          uri: lb://oap-service
          predicates:
            - Path=/graphql/**
    discovery:
      client:
        simple:
          instances:
            oap-service:
              - uri: http://127.0.0.1:12800
            # - uri: http://<oap-host-1>:<oap-port1>
            # - uri: http://<oap-host-2>:<oap-port2>
  mvc:
    throw-exception-if-no-handler-found: true
  web:
    resources:
      add-mappings: true
management:
  server:
    base-path: /manage

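Open another PowerShell window, again in the bin directory
Run .\webappService.bat
The screenshot below shows a successful start
[Screenshot: webappService console output after a successful start]
Open http://localhost:8080/ (the SkyWalking UI was just configured to listen on port 8080, so make sure the application services started later do not use the same port)
No application has been started yet, so the page shows no registered services at this point.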
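With three processes running, a quick port check helps confirm that everything is listening and that no port clashes with the application started later. This is a minimal sketch: 9200, 12800, and 8080 come from the configuration shown in this walkthrough, 11800 is the backend address that appears in agent.config, and the helper names are illustrative.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports used in this walkthrough
EXPECTED_PORTS = {
    "Elasticsearch HTTP": 9200,
    "OAP gRPC (agent reporting)": 11800,
    "OAP HTTP (UI queries)": 12800,
    "SkyWalking UI": 8080,
}


def port_report(host: str = "127.0.0.1", ports=EXPECTED_PORTS) -> dict:
    """Map each component name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Printing port_report() after all services are up should show True for every entry; a False entry points at the component that failed to start.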

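Injecting the SkyWalking agent into the application service

Copy the agent directory from the SkyWalking package into the sample application (a demo application is used directly here)
and edit agent/config/agent.config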

# The service name in UI
agent.service_name=${SW_AGENT_NAME:skyWalking-demo}
# Backend service addresses.
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}
# Logging file_name
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}
# Logging level
logging.level=${SW_LOGGING_LEVEL:DEBUG}
# Mount the specific folders of the plugins. Plugins in mounted folders would work.
plugin.mount=${SW_MOUNT_FOLDERS:plugins,activations}

Sample application directory layout

[Screenshot: directory tree of the sample application]

Configuration file

server:
  port: 8500

spring:
  swagger:
    enabled: true
    title: elasticsearch-study\u7CFB\u7EDF
    description: skywalking-demo\u7CFB\u7EDF
    version: v1.0
    host: http://localhost:8500/swagger-ui.html
    terms-of-service-url: http://qrainly.top/
    contact:
      name: bj
  auto:
    openurl: true
  web:
    loginurl: http://localhost:8500/swagger-ui.html
    googleexcute: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe
  elasticsearch:
    rest:
      uris: localhost:9200
      connection-timeout: 10s
      #username:
      #password:
logging:
  level:
    org.springframework.data.elasticsearch.core: debug

# -javaagent:D:\v_liuwen\code\skywalking-demo\agent\agent\skywalking-agent.jar

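The rest of the code will be posted later

Add the agent injection parameter as a JVM option when launching the project's main class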

-javaagent:D:\v_liuwen\code\skywalking-demo\agent\skywalking-agent.jar
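The same parameter works outside an IDE. The sketch below builds the command line for one instance, assuming the demo is packaged as an executable Spring Boot jar; the jar name and the helper function are hypothetical. It relies on two documented behaviours: the SkyWalking Java agent lets a system property prefixed with skywalking. override the matching agent.config key, and Spring Boot accepts --server.port as a command-line argument.

```python
def build_launch_command(agent_jar: str, app_jar: str, port: int, service_name: str) -> list:
    """Build a java command that attaches the SkyWalking agent to one application instance."""
    return [
        "java",
        f"-javaagent:{agent_jar}",                          # must come before -jar
        f"-Dskywalking.agent.service_name={service_name}",  # overrides agent.service_name
        "-jar",
        app_jar,                                            # hypothetical jar name
        f"--server.port={port}",                            # Spring Boot port override
    ]


if __name__ == "__main__":
    agent = r"D:\v_liuwen\code\skywalking-demo\agent\skywalking-agent.jar"
    for port in (8500, 8501):
        print(" ".join(build_launch_command(agent, "skywalking-demo.jar", port, "skyWalking-demo")))
```

Passing the resulting list to subprocess.Popen starts the instance; using the same service name with different ports makes both processes show up as instances of one service.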

Start two instances of the sample service locally, one on port 8500 and the other on port 8501

After calling the [Query all data] endpoint several times, check the SkyWalking UI
[Screenshot: SkyWalking UI listing the registered service]
The service is now registered with SkyWalking.
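Instead of clicking the endpoint by hand, a short loop can generate the traffic. This is a sketch under assumptions: the full URL http://localhost:8500/all is a guess based on the /all endpoint and port 8500 used in this walkthrough (a controller prefix may apply in the real demo), and call_endpoint is an illustrative helper.

```python
import urllib.error
import urllib.request


def call_endpoint(url: str, times: int = 10, timeout: float = 5.0) -> list:
    """Call the URL repeatedly; return the HTTP status codes (None if the call failed to connect)."""
    statuses = []
    for _ in range(times):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                statuses.append(resp.status)
        except urllib.error.HTTPError as exc:
            statuses.append(exc.code)      # server answered with an error status
        except (urllib.error.URLError, OSError):
            statuses.append(None)          # connection refused, timeout, etc.
    return statuses
```

For example, call_endpoint("http://localhost:8500/all", times=20) sends twenty requests; the agent reports them asynchronously, so the UI may take a short while to reflect them.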

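Dashboard

[Screenshot: SkyWalking dashboard]

Topology

[Screenshot: SkyWalking topology view]
The topology view shows the dependencies between services

Trace

[Screenshot: SkyWalking trace view]
The full call chain of the /all endpoint called just now is displayed, which makes it easy to analyze the trace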

Profiling

This module requires creating a profiling task first, so it is not demonstrated here.

Log

I only ran a single service locally with no cross-service calls, so nothing was logged here

Alarm

Alarms are driven by a configuration file:
config/alarm-settings.yml under the SkyWalking directory

rules:
  # Rule unique name, must be ended with `_rule`.
  service_resp_time_rule:
    metrics-name: service_resp_time
    op: ">"
    threshold: 20
    period: 1
    count: 3
    silence-period: 1
    message: Response time of service {name} is more than 20ms in 3 minutes of last 10 minutes.
  service_sla_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_sla
    op: "<"
    threshold: 8000
    # The length of time to evaluate the metrics
    period: 10
    # How many times after the metrics match the condition, will trigger alarm
    count: 2
    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
    silence-period: 3
    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
  service_resp_time_percentile_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_percentile
    op: ">"
    threshold: 1000,1000,1000,1000,1000
    period: 10
    count: 3
    silence-period: 5
    message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
  service_instance_resp_time_rule:
    metrics-name: service_instance_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 2
    silence-period: 5
    message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
  database_access_resp_time_rule:
    metrics-name: database_access_resp_time
    threshold: 1000
    op: ">"
    period: 10
    count: 2
    message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
  endpoint_relation_resp_time_rule:
    metrics-name: endpoint_relation_resp_time
    threshold: 1000
    op: ">"
    period: 10
    count: 2
    message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes

webhooks:
  - http://localhost:8031/skywalking/alarm/pushData
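The webhooks entry makes the OAP server POST triggered alarms as a JSON array to that URL. Below is a minimal receiver sketch for local testing, using only the Python standard library. The field names (scope, name, ruleName, alarmMessage) follow the webhook payload described in the SkyWalking alarm documentation; the handler and function names are illustrative, and the handler accepts any path rather than checking for /skywalking/alarm/pushData.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alarms(body: bytes) -> list:
    """Turn the JSON array posted by the OAP server into one readable line per alarm."""
    alarms = json.loads(body.decode("utf-8"))
    return [
        f"[{a.get('scope')}] {a.get('name')}: {a.get('alarmMessage')} (rule={a.get('ruleName')})"
        for a in alarms
    ]


class AlarmHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        for line in summarize_alarms(self.rfile.read(length)):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8031) -> None:
    """Listen on the port named in the webhooks entry until interrupted."""
    HTTPServer(("127.0.0.1", port), AlarmHandler).serve_forever()
```

Running serve() before triggering an alarm prints each alarm as it arrives; the same handler is the place to forward the message to a chat tool.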

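To trigger an alarm, I deliberately delayed an endpoint call by pausing at a breakpoint
[Screenshot: alarm shown in the SkyWalking UI]
Alarms can also be configured to push directly to DingTalk and other platforms

That is all for this walkthrough; I will share more when I find new things to try.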

References
https://www.fangzhipeng.com/architecture/2020/06/12/skywalking-test.html
https://www.jianshu.com/p/055e4223d054