Elasticsearch Cluster Online Rolling Upgrade in Practice

Upgrading from 7.7.0 to 7.10.0 for Elasticsearch and Open Distro
shasum -a 512 -c elasticsearch-7.10.0-x86_64.rpm.sha512
elasticsearch-7.10.0-x86_64.rpm: OK

References:
https://www.elastic.co/guide/en/elasticsearch/reference/7.10/rolling-upgrades.html
https://opendistro.github.io/for-elasticsearch-docs/docs/upgrade/rolling/

Target Open Distro plugin versions
opendistro-anomaly-detection    1.12.0.0
opendistro-job-scheduler        1.12.0.0
opendistro-knn                  1.12.0.0 # not needed
opendistro_alerting             1.12.0.2
opendistro_index_management     1.12.0.1
opendistro_performance_analyzer 1.12.0.0 # not needed
opendistro_security             1.12.0.0 # not needed
opendistro_sql                  1.12.0.0
opendistro-reports-scheduler    1.12.0.0 # not needed

#==========Preparation==============

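1. List the nodes that are not master-eligible (the empty "nodes" result below means every node in this cluster is master-eligible)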
curl http://10.21.42.90:9200/_nodes/_all,master:false| jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    91  100    91    0     0  23125      0 --:--:-- --:--:-- --:--:-- 30333
{
  "_nodes": {
    "total": 0,
    "successful": 0,
    "failed": 0
  },
  "cluster_name": "YooZoo-Ops-ELK",
  "nodes": {}
}

2. List the master-eligible nodes
curl http://10.21.42.90:9200/_nodes/master:true

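Notes:
It is strongly recommended to divide the cluster's nodes into the following two groups when upgrading, and to upgrade the groups in this order:
Nodes that are not master-eligible. You can retrieve a list of these nodes with GET /_nodes/_all,master:false or by finding all nodes configured with node.master: false.
Master-eligible nodes, which are the remaining nodes. You can retrieve a list of these nodes with GET /_nodes/master:true.
You may upgrade the nodes within each group in any order.

Upgrading in this order ensures that the master-ineligible nodes always run a version at least as new as the master-eligible nodes. A newer node can always join a cluster with an older master, but an older node cannot always join a cluster with a newer master.
By upgrading the master-eligible nodes last, you ensure that all master-ineligible nodes can join the cluster whether or not the master-eligible nodes have been upgraded yet.
If you upgrade any master-eligible node before the master-ineligible nodes, there is a risk that the older nodes will leave the cluster and be unable to rejoin until they have been upgraded.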
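
The ordering rule above can be sketched as a small shell filter. This is only an illustration: the function name upgrade_order is made up for this article, and it assumes the input comes from GET /_cat/nodes?h=name,node.role, where the letter "m" in the role string marks a master-eligible node.

```shell
# upgrade_order: read "name node.role" lines on stdin and print the node names
# in a safe upgrade order: nodes without the "m" (master-eligible) role first,
# master-eligible nodes last. Order within each group is kept as received.
upgrade_order() {
  awk '
    NF < 2    { next }                      # skip blank or malformed lines
    $2 !~ /m/ { early[++ne] = $1; next }    # not master-eligible: upgrade first
              { late[++nl] = $1 }           # master-eligible: upgrade last
    END {
      for (i = 1; i <= ne; i++) print early[i]
      for (i = 1; i <= nl; i++) print late[i]
    }'
}

# Example (host taken from this article):
#   curl -s "http://10.21.42.90:9200/_cat/nodes?h=name,node.role" | upgrade_order
```

On the three-node cluster in this article every node carries the "m" role, so the filter simply returns the nodes in the order the API lists them.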



3. Check node status
curl "http://10.21.42.92:9200/_cat/nodes?v"
ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.21.42.90           63          41   0    0.09    0.10     0.15 dilmrt    *      node-001
10.21.42.91           25          45   0    0.01    0.07     0.13 dilmrt    -      node-002
10.21.42.92           46          44   0    0.28    0.21     0.17 dilmrt    -      node-003

#======Procedure==================
1. Disable shard allocation (only primaries may still be allocated)
curl -X PUT "10.21.42.90:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
'
Check cluster health

curl -X GET "10.21.42.91:9200/_cat/health?v=true&pretty"
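
When this check is scripted, it helps to pull just the status value out of the tabular output. A minimal sketch follows; health_status is a hypothetical helper name, and it assumes the v=true form of the request so that a header row is present.

```shell
# health_status: read the output of GET /_cat/health?v=true on stdin and print
# the value of the "status" column (green, yellow or red).
health_status() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
    col     { print $col; exit }'
}

# Example:
#   curl -s "10.21.42.91:9200/_cat/health?v=true" | health_status
```

Looking the column up by its header name, rather than hard-coding a position, keeps the helper working if the set of columns differs between versions.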

2. Stop non-essential indexing and perform a synced flush
curl -X POST "10.21.42.90:9200/_flush/synced?pretty"



3. Temporarily halt the tasks associated with active machine learning jobs and datafeeds
curl -X POST "10.21.42.90:9200/_ml/set_upgrade_mode?enabled=true&pretty"


4. Stop the Elasticsearch node
systemctl stop elasticsearch.service


5. Upgrade the node
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.0-x86_64.rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.0-x86_64.rpm.sha512
shasum -a 512 -c elasticsearch-7.10.0-x86_64.rpm.sha512
yum  install elasticsearch-7.10.0-x86_64.rpm
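
To make sure the package is installed only when the checksum matches, the verification can be wrapped in a function and chained with the install. A sketch under assumptions: verify_sha512 is a hypothetical helper, the .sha512 file is expected next to the package, and sha512sum is used when available with shasum as the fallback.

```shell
# verify_sha512 FILE: check FILE against FILE.sha512 in the same directory.
# Returns 0 when the checksum matches, non-zero otherwise.
# The body runs in a subshell so the cd does not affect the caller.
verify_sha512() (
  file="$1"
  cd "$(dirname "$file")" || exit 1
  sum_file="$(basename "$file").sha512"
  if command -v sha512sum >/dev/null 2>&1; then
    sha512sum -c "$sum_file"
  else
    shasum -a 512 -c "$sum_file"
  fi
)

# Example: install only if the checksum is OK
#   verify_sha512 elasticsearch-7.10.0-x86_64.rpm && yum install elasticsearch-7.10.0-x86_64.rpm
```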


6. Upgrade the plugins
6.1 Remove the old plugins
/usr/share/elasticsearch/bin/elasticsearch-plugin  list
opendistro-anomaly-detection
opendistro-job-scheduler
opendistro_alerting
opendistro_index_management
opendistro_sql


/usr/share/elasticsearch/bin/elasticsearch-plugin  remove  opendistro-anomaly-detection
/usr/share/elasticsearch/bin/elasticsearch-plugin  remove  opendistro-job-scheduler
/usr/share/elasticsearch/bin/elasticsearch-plugin  remove  opendistro_alerting
/usr/share/elasticsearch/bin/elasticsearch-plugin  remove  opendistro_index_management
/usr/share/elasticsearch/bin/elasticsearch-plugin  remove  opendistro_sql



6.2 Install the new plugins
On the Elasticsearch nodes (run the installs in the order shown):
/usr/share/elasticsearch/bin/elasticsearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/elasticsearch-plugins/opendistro-job-scheduler/opendistro-job-scheduler-1.12.0.0.zip
/usr/share/elasticsearch/bin/elasticsearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/elasticsearch-plugins/opendistro-alerting/opendistro_alerting-1.12.0.2.zip
/usr/share/elasticsearch/bin/elasticsearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/elasticsearch-plugins/opendistro-sql/opendistro_sql-1.12.0.0.zip
/usr/share/elasticsearch/bin/elasticsearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/elasticsearch-plugins/opendistro-index-management/opendistro_index_management-1.12.0.1.zip
/usr/share/elasticsearch/bin/elasticsearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/elasticsearch-plugins/opendistro-anomaly-detection/opendistro-anomaly-detection-1.12.0.0.zip
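
Since the same five installs have to be repeated on every node, the commands can be generated from a small table instead of being pasted by hand. The sketch below only prints the commands; the names BASE_URL, PLUGIN_BIN, opendistro_plugins and plugin_install_commands are made up for this article, while the URLs and versions are the ones listed above.

```shell
BASE_URL="https://d3g5vo6xdbdb9a.cloudfront.net/downloads/elasticsearch-plugins"
PLUGIN_BIN="/usr/share/elasticsearch/bin/elasticsearch-plugin"

# opendistro_plugins: print "directory artifact version" for each plugin, in
# the install order used in this article (job scheduler first).
opendistro_plugins() {
  cat <<'EOF'
opendistro-job-scheduler opendistro-job-scheduler 1.12.0.0
opendistro-alerting opendistro_alerting 1.12.0.2
opendistro-sql opendistro_sql 1.12.0.0
opendistro-index-management opendistro_index_management 1.12.0.1
opendistro-anomaly-detection opendistro-anomaly-detection 1.12.0.0
EOF
}

# plugin_install_commands: turn each table row into one install command.
plugin_install_commands() {
  while read -r dir artifact version; do
    [ -n "$dir" ] || continue
    printf '%s install %s/%s/%s-%s.zip\n' \
      "$PLUGIN_BIN" "$BASE_URL" "$dir" "$artifact" "$version"
  done
}

# Example: review the generated commands first, then pipe them to sh
#   opendistro_plugins | plugin_install_commands
#   opendistro_plugins | plugin_install_commands | sh
```

The directory and the artifact name are kept as separate columns because they differ for some plugins (for example opendistro-alerting versus opendistro_alerting).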


On the Kibana node:
/usr/share/kibana/bin/kibana-plugin  install  https://d3g5vo6xdbdb9a.cloudfront.net/downloads/kibana-plugins/opendistro-index-management/opendistro_index_management_kibana-1.12.0.0.zip
/usr/share/kibana/bin/kibana-plugin  install  https://d3g5vo6xdbdb9a.cloudfront.net/downloads/kibana-plugins/opendistro-alerting/opendistro-alerting-1.12.0.0.zip



7. X-Pack security settings: X-Pack is not enabled on this cluster, so this step is not needed

8. Start the upgraded node
systemctl start elasticsearch.service

9. Re-enable shard allocation

curl -X PUT "10.21.42.90:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
'
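
The disable and re-enable requests differ only in the value of cluster.routing.allocation.enable, so the request body can be built by one helper. A minimal sketch: allocation_body is a hypothetical name, and the accepted values are the ones documented for this setting plus null to reset it to the default.

```shell
# allocation_body VALUE: print the JSON body for PUT /_cluster/settings that
# sets cluster.routing.allocation.enable to VALUE. Pass "null" to reset it.
allocation_body() {
  case "$1" in
    null) value='null' ;;
    all|primaries|new_primaries|none) value="\"$1\"" ;;
    *) echo "unsupported value: $1" >&2; return 1 ;;
  esac
  printf '{"persistent":{"cluster.routing.allocation.enable":%s}}\n' "$value"
}

# Example: before stopping a node, then after it has rejoined
#   curl -X PUT "10.21.42.90:9200/_cluster/settings?pretty" \
#     -H 'Content-Type: application/json' -d "$(allocation_body primaries)"
#   curl -X PUT "10.21.42.90:9200/_cluster/settings?pretty" \
#     -H 'Content-Type: application/json' -d "$(allocation_body null)"
```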

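Problem encountered: the cluster stayed in a warning (non-green) state, and the Watcher indices turned out to be missing. The .triggered_watches-* and .watcher-history-* indices had to be deleted and Watcher restarted, after which the cluster status returned to green.
Reference for the fix: https://discuss.elastic.co/t/how-to-create--watcher-index/72523/5
Check the index status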
curl -X GET "10.21.42.91:9200/_cat/indices/.triggered-watches"| jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   437  100   437    0     0  69200      0 --:--:-- --:--:-- --:--:-- 87400
{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index [.triggered-watches]",
        "index_uuid": "_na_",
        "resource.type": "index_or_alias",
        "resource.id": ".triggered-watches",
        "index": ".triggered-watches"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index [.triggered-watches]",
    "index_uuid": "_na_",
    "resource.type": "index_or_alias",
    "resource.id": ".triggered-watches",
    "index": ".triggered-watches"
  },
  "status": 404
}

Delete the .watcher-history index
curl -XDELETE   -m 100  "http://10.21.42.90:9200/.watcher-history-12-2021.02.03?master_timeout=2m"



Commands to restart Watcher through the Elasticsearch Watcher API
curl -X POST "10.21.42.91:9200/_watcher/_stop"
curl -X POST "10.21.42.91:9200/_watcher/_start"
curl -X GET "10.21.42.91:9200/_watcher/stats"


curl -X GET  "10.21.42.91:9200/_watcher/stats"| jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   516  100   516    0     0   108k      0 --:--:-- --:--:-- --:--:--  125k
{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "YooZoo-Ops-ELK",
  "manually_stopped": false,
  "stats": [
    {
      "node_id": "74rkM83YSPiHPx5-SvN47g",
      "watcher_state": "started",
      "watch_count": 3,
      "execution_thread_pool": {
        "queue_size": 0,
        "max_size": 50
      }
    },
    {
      "node_id": "eM2WbM8zSFOAUi6tapgk7Q",
      "watcher_state": "started",
      "watch_count": 3,
      "execution_thread_pool": {
        "queue_size": 0,
        "max_size": 50
      }
    },
    {
      "node_id": "MLd6ANlTT8uDpr1uhukRbg",
      "watcher_state": "started",
      "watch_count": 0,
      "execution_thread_pool": {
        "queue_size": 0,
        "max_size": 0
      }
    }
  ]
}




10. Wait for the node to recover
curl -X GET "10.21.42.91:9200/_cat/health?v=true&pretty"
curl -X GET "10.21.42.91:9200/_cat/recovery?pretty"
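
Instead of re-running the health check by hand, the wait can be scripted as a polling loop. This is a sketch, not a tested production script: wait_for_green and fetch_health are hypothetical names, and it assumes that GET /_cat/health?h=status returns only the status word.

```shell
# wait_for_green FETCH_CMD [MAX_TRIES] [SLEEP_SECONDS]
# Runs FETCH_CMD until it prints "green". Returns 0 on success and 1 if the
# status is still not green after MAX_TRIES attempts.
wait_for_green() {
  fetch_cmd="$1"
  max_tries="${2:-60}"
  pause="${3:-10}"
  try=1
  while [ "$try" -le "$max_tries" ]; do
    state="$($fetch_cmd 2>/dev/null | tr -d '[:space:]')"
    if [ "$state" = "green" ]; then
      return 0
    fi
    echo "attempt $try/$max_tries: status=${state:-unknown}" >&2
    sleep "$pause"
    try=$((try + 1))
  done
  return 1
}

# fetch_health: ask one node for the cluster status (host from this article).
fetch_health() {
  curl -s "http://10.21.42.91:9200/_cat/health?h=status"
}

# Example: poll every 10 seconds, give up after 60 attempts
#   wait_for_green fetch_health 60 10 && echo "safe to upgrade the next node"
```

The Elasticsearch rolling-upgrade guide notes that the cluster may stay yellow in the middle of an upgrade, because replicas of primaries held by upgraded nodes cannot be assigned to nodes still on the old version. In that case it suggests proceeding once no shards are initializing or relocating, so a timeout from this loop is a prompt to check _cat/recovery rather than a hard failure.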


11. Once the cluster status is back to green, repeat the steps above on the next node
curl -X GET "10.21.42.91:9200/_cat/indices/.triggered-watches"
curl -X GET "10.21.42.91:9200/_cat/health?v=true&pretty"


List the current cluster nodes and their versions
curl -X GET "10.21.42.90:9200/_cat/nodes?h=ip,name,version&v=true&pretty"
ip          name     version
10.21.42.92 node-003 7.7.0
10.21.42.91 node-002 7.10.0
10.21.42.90 node-001 7.7.0


12. Restart machine learning jobs. This feature is not enabled on this cluster, so the step can be skipped
curl -X POST "localhost:9200/_ml/set_upgrade_mode?enabled=false&pretty"


13. List the current cluster nodes and their versions
curl -X GET "10.21.42.90:9200/_cat/nodes?h=ip,name,version&v=true&pretty"
ip          name     version
10.21.42.90 node-001 7.10.0
10.21.42.92 node-003 7.10.0
10.21.42.91 node-002 7.10.0
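
The final check can also be automated by listing any node that is not yet on the target version. A small sketch, assuming the same ip,name,version columns with a header row as in the output above; nodes_not_on_version is a hypothetical helper name.

```shell
# nodes_not_on_version TARGET: read the output of
# GET /_cat/nodes?h=ip,name,version&v=true on stdin and print the names of
# nodes whose version differs from TARGET. Prints nothing when all match.
nodes_not_on_version() {
  awk -v target="$1" 'NR > 1 && $3 != target { print $2 }'
}

# Example:
#   curl -s "10.21.42.90:9200/_cat/nodes?h=ip,name,version&v=true" \
#     | nodes_not_on_version 7.10.0
```

Fed with the mid-upgrade listing from step 11, this prints node-003 and node-001; with the listing above it prints nothing, which means the rolling upgrade is complete.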

14. Upgrade Kibana
systemctl  stop kibana
yum  install kibana-7.10.0-x86_64.rpm
systemctl  start kibana

