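While testing a single-host offline deployment of Tencent BlueKing (蓝鲸智云), I ran into several installation problems. This post records how I resolved one of them, problem 3.2: the app_mgr component failing to install. It had me stuck for a long time (possibly because I was not familiar enough with Python or with the BlueKing product). It was solved in the end, but the process itself is the part most worth recording.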
1. Problem description
Offline installation of the app_mgr component failed.
Install command: ./bk_install app_mgr
The error output was as follows:
create virtualenv for paas_agent
Requirement already satisfied: pbr in /usr/local/lib/python2.7/site-packages
Requirement already satisfied: virtualenvwrapper in /usr/local/lib/python2.7/site-packages
Requirement already satisfied: virtualenv-clone in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: stevedore in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: virtualenv in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: pbr>=1.6 in /usr/local/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
[192.168.1.6]20200303-174651 224 mkvirtualenv -a /data/bkce/paas_agent/paas_agent --extra-search-dir=/data/install/pip --no-download -p /usr/local/bin/python paas_agent
Already using interpreter /usr/local/bin/python
New python executable in /data/bkce/.envs/paas_agent/bin/python
Installing setuptools, pip, wheel...done.
Setting project for paas_agent to /data/bkce/paas_agent/paas_agent
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): pbr in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): virtualenvwrapper in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): virtualenv-clone in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): stevedore in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): virtualenv in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): pbr>=1.6 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): six>=1.9.0 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): supervisor in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): six in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): meld3>=0.6.5 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from supervisor)
[192.168.1.6]20200303-174801 233 generate env variable settings.
[192.168.1.6]20200303-174801 151 exec: pip install --no-cache-dir -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Collecting Django==1.8.11 (from -r requirements.txt (line 1))
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91150>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91d50>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91f10>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e5c110>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e5c2d0>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
Could not find a version that satisfies the requirement Django==1.8.11 (from -r requirements.txt (line 1)) (from versions: )
No matching distribution found for Django==1.8.11 (from -r requirements.txt (line 1))
[192.168.1.6]20200303-174900 177 pip install (--no-cache-dir ) for paas_agent. FAILED
[192.168.1.6]20200303-174900 47 Abort
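Note: an offline installation means the target environment cannot reach the internet. If your deployment environment is allowed to reach the external network, this component installs without trouble (I tested that).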
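2. Initial analysis
What is odd is that only app_mgr reports an unreachable network during offline installation. Looking back at the error log above, this is what runs when the component is installed: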
[192.168.1.6]20200303-174801 233 generate env variable settings.
[192.168.1.6]20200303-174801 151 exec: pip install --no-cache-dir -r requirements.txt (/data/bkce/paas_agent/paas_agent)
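This pip command does not appear to use the --find-links option to point at a local path (nor --no-index), so pip tries to reach the external pip index.
When the other components are installed, the option is always given and points at each component's own local path:
-- for example, installing fta: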
[192.168.1.6]20200302-001610 233 generate env variable settings.
[192.168.1.6]20200302-001610 151 exec: pip install --no-cache-dir --no-index --find-links=/data/src/fta/support-files/pkgs -r requirements.txt (/data/bkce/fta/fta)
-- for example, installing bkdata:
[192.168.1.6]20200302-003237 233 generate env variable settings.
[192.168.1.6]20200302-003237 151 exec: pip install --no-cache-dir --no-index --find-links=/data/src/bkdata/support-files/pkgs -r requirements.txt (/data/bkce/bkdata/dataapi)
At the equivalent step, these components all pass --no-index and --find-links, each pointing at the directory where its local packages are stored.
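The comparison above can be repeated mechanically. The following is a minimal sketch, assuming the installer output has been saved to a file; the function name and the log path are hypothetical. It lists the logged pip commands that do not pass --no-index and would therefore try to reach an index.

```shell
# Hypothetical helper: print logged "exec: pip install" lines that lack --no-index.
#   $1 = file holding captured installer output (an assumption; in the session
#        shown above bk_install prints to the terminal)
online_pip_lines () {
    grep -F 'exec: pip install' "$1" | grep -v -e '--no-index'
}

# Usage (the log path is illustrative):
#   ./bk_install app_mgr 2>&1 | tee /tmp/app_mgr_install.log
#   online_pip_lines /tmp/app_mgr_install.log
```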
My first attempts were as follows.
2.1 Run the offline pip install by hand, then install app_mgr again
pip install --no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs -r /data/bkce/paas_agent/paas_agent/requirements.txt
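The manual offline pip install succeeded, but running ./bk_install app_mgr afterwards still failed with the same error, so installing in advance by hand did not help.
This is presumably because the installer runs pip inside the component's virtualenv, and a virtual environment is isolated from the system site-packages.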
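A minimal sketch of what that isolation implies: to install into the virtualenv, call its own pip rather than the system one. The virtualenv path /data/bkce/.envs/paas_agent comes from the log above and the function name is hypothetical. Whether bk_install keeps or re-creates an existing virtualenv was not verified, so this only illustrates the isolation and is not a confirmed fix.

```shell
# Hypothetical helper: offline install using the pip that belongs to a virtualenv.
#   $1 = virtualenv directory, $2 = local package directory, $3 = requirements file
venv_offline_install () {
    "$1/bin/pip" install --no-cache-dir --no-index --find-links="$2" -r "$3"
}

# Usage with the paths from this deployment:
#   venv_offline_install /data/bkce/.envs/paas_agent \
#       /data/src/paas_agent/support-files/pkgs \
#       /data/bkce/paas_agent/paas_agent/requirements.txt
```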
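2.2 Locate the pip.conf files, back up the originals, and point them at the local path
The configuration files I edited were /data/src/.pip/pip.conf and /data/install/pip/pip.conf, with the content changed to: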
[global]
find-links = /data/src/paas_agent/support-files/pkgs
[install]
find-links = /data/src/paas_agent/support-files/pkgs
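Running ./bk_install app_mgr still produced the same error, so this had no effect.
Later attempts turned up more pip.conf files; changing all of them did not help either.
2.3 Set an environment variable
The official documentation mentions an environment variable, PIP_FIND_LINKS: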
export PIP_FIND_LINKS=/data/src/paas_agent/support-files/pkgs
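I ran ./bk_install app_mgr again to install the component; the error was unchanged.
Presumably the pip options are hard-coded inside the installer. As with a crontab job, variables set in the outer shell do not reach it, so the setting has to be found inside the scripts.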
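For reference, pip maps its long options to environment variables of the form PIP_<OPTION>, so the offline flags used by the other components have environment equivalents. The sketch below (function name hypothetical) sets both. PIP_FIND_LINKS on its own leaves the index enabled. Whether variables exported in the calling shell ever reach the pip process started by bk_install was not verified. If the installer launches pip from a fresh shell they would not, which would be consistent with the result above.

```shell
# Hypothetical helper: export the environment equivalents of
# "--no-index --find-links=<dir>" for pip processes started from this shell.
#   $1 = local package directory
pip_offline_env () {
    export PIP_NO_INDEX=1
    export PIP_FIND_LINKS="$1"
}

# Usage:
#   pip_offline_env /data/src/paas_agent/support-files/pkgs
#   pip install -r requirements.txt
```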
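2.4 Other attempts
For example, I added the environment variable above by hand to the app_mgr section of bk_install. That did not work either; the error was unchanged.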
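3. Pooling ideas
At this point I was stuck, and something was clearly wrong. I shared the analysis above with the customer, we agreed that it was very likely a bug, and I reported it to BlueKing customer support.
Support replied that for offline installation they recommend configuring a complete local pip source. A full pip mirror would need close to 2 TB of disk space, so I built a local pip source containing only the required packages instead.
This felt more like a workaround that sidesteps the real problem, because the other components need no such source: they use the --find-links option to point at a local package directory.
I had never set one up before, so configuring the local pip source also cost a fair amount of time searching and verifying: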
[root@rbtnode1 bin]# find /data -name pip.conf
/data/install/pip/pip.conf
/data/install/pip.conf
/data/src/service/.pip/pip.conf
/data/src/.pip/pip.conf
/data/src/pip.conf
cat /data/install/pip/pip.conf
cat /data/install/pip.conf
cat /data/src/service/.pip/pip.conf
cat /data/src/.pip/pip.conf
cat /data/src/pip.conf
cat ~/.pip/pip.conf
I could not tell which pip.conf would actually be used, so I backed up all of them and then changed every one to point at the local pip source:
[global]
trusted-host = 192.168.1.6
index-url = http://192.168.1.6:8080/simple
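The backup-and-rewrite step can be scripted. A minimal sketch, assuming the same host and index URL as above; the function name and the .bak suffix are hypothetical choices.

```shell
# Hypothetical helper: back up every pip.conf under a directory, then point
# each one at a local index.
#   $1 = directory to search, $2 = index host, $3 = index URL
point_pip_conf_at_local_index () {
    find "$1" -name pip.conf | while IFS= read -r conf; do
        cp -p "$conf" "$conf.bak"    # keep the original next to the new file
        printf '[global]\ntrusted-host = %s\nindex-url = %s\n' "$2" "$3" > "$conf"
        echo "rewrote $conf"
    done
}

# Usage:
#   point_pip_conf_at_local_index /data 192.168.1.6 http://192.168.1.6:8080/simple
```

Running it a second time would overwrite the first backup, so the .bak files should be restored or moved away before repeating it.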
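For the details of setting up the local pip source I followed two articles found online.
The installation still failed, so I also edited the globals.env configuration file:
# HTTP proxy address used to reach network resources such as the yum repository, e.g. BK_PROXY=http://192.168.0.1:8833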
export BK_PROXY=http://192.168.1.6:8080/simple
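I also discussed this with a colleague. Logically, the real fix still had to be making app_mgr pass --find-links like the other components.
The only option left was to trace the scripts from the top myself and see whether a corresponding setting existed, starting from the main script, bk_install.
4. Final solution
I had not been reading the scripts for long before I got lost: I rarely write scripts and it is a weak spot of mine. bk_install leads to bkcec, which in turn calls many other files, and I could not find a thread to follow. So I went back to the original error log and noticed that just before the error there are lines that look like script output: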
[192.168.1.6]20200303-174801 233 generate env variable settings.
[192.168.1.6]20200303-174801 151 exec: pip install --no-cache-dir -r requirements.txt (/data/bkce/paas_agent/paas_agent)
I searched the files directly under /data/install for "generate env variable settings" (the grep was not recursive) and found that only utils.fc contains it:
[root@rbtnode1 install]# grep "generate env variable settings" *
grep: agent_setup: Is a directory
grep: appmgr: Is a directory
grep: bcs: Is a directory
grep: bin: Is a directory
grep: build: Is a directory
grep: deck: Is a directory
grep: extra: Is a directory
grep: health_check: Is a directory
grep: migrate: Is a directory
grep: pip: Is a directory
grep: scripts: Is a directory
grep: setuptools-36.0.1: Is a directory
grep: support-files: Is a directory
grep: templates: Is a directory
grep: uninstall: Is a directory
utils.fc: log "generate env variable settings."
grep: verify: Is a directory
[root@rbtnode1 install]# ls -l utils.fc
-rw-r--r-- 1 root root 38897 Jan 9 16:11 utils.fc
[root@rbtnode1 install]# scp utils.fc 192.168.1.61:/tmp/
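The "Is a directory" messages above come from running grep without recursion, which also means that files inside those subdirectories were not searched. A minimal sketch of a recursive search (function name hypothetical):

```shell
# Hypothetical helper: list files under a directory that contain a fixed string.
#   $1 = string to look for, $2 = directory to search recursively
find_message_source () {
    grep -rlF -e "$1" "$2"
}

# Usage:
#   find_message_source "generate env variable settings" /data/install
```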
I copied the file to another machine to read it and found this function, which looked like the relevant code:
_install_pypkgs () {
    local module=$1
    local project=$2
    local local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    local pip_options="--no-cache-dir "
    local _ordered_requirement_files=( $( shopt -s nullglob; echo 0[0-9]_requirements*.txt) )
    if [ "${#_ordered_requirement_files[@]}" -eq 0 ]; then
        _ordered_requirement_files=( requirements.txt )
    fi
    for reqr_file in ${_ordered_requirement_files[@]}; do
        if [ "${reqr_file//_local/}" != "$reqr_file" -o -f SELF_CONTAINED_PIP_PKG ]; then
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        fi
        log "exec: pip install $pip_options -r $reqr_file ($PWD)"
        # <-- MARKED LINE: the $pip_options passed to pip install on the next line may well lack --find-links
        http_proxy=$BK_PROXY https_proxy=$BK_PROXY pip install $pip_options -r $reqr_file
        nassert "pip install ($pip_options) for $venv_name"
    done
    #shopt -s nullglob
}
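The line marked above is where pip install is called with $pip_options, which may well not contain --find-links, because the offline value is only assigned inside the if condition (the requirements file name must contain _local, or a file named SELF_CONTAINED_PIP_PKG must exist in the current directory). I did not have time to work through the whole flow, so I tried editing utils.fc directly to set pip_options: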
_install_pypkgs () {
    local module=$1
    local project=$2
    local local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    local pip_options="--no-cache-dir "
    local _ordered_requirement_files=( $( shopt -s nullglob; echo 0[0-9]_requirements*.txt) )
    if [ "${#_ordered_requirement_files[@]}" -eq 0 ]; then
        _ordered_requirement_files=( requirements.txt )
    fi
    for reqr_file in ${_ordered_requirement_files[@]}; do
        if [ "${reqr_file//_local/}" != "$reqr_file" -o -f SELF_CONTAINED_PIP_PKG ]; then
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        fi
        log "exec: pip install $pip_options -r $reqr_file ($PWD)"
        # The original pip call on the next line is commented out. The two lines
        # after it are new: they set pip_options explicitly, then call pip install.
        http_proxy=$BK_PROXY https_proxy=$BK_PROXY #pip install $pip_options -r $reqr_file
        pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        pip install $pip_options -r $reqr_file
        nassert "pip install ($pip_options) for $venv_name"
    done
    #shopt -s nullglob
}
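A tidier variant, offered only as a sketch and not what was run here: decide the options in one place, before the log line, so that the logged command matches what pip actually receives. The helper name and its third parameter (a switch that forces offline mode) are hypothetical. The two existing conditions are carried over from the function above.

```shell
# Hypothetical helper: print the pip options for one requirements file.
#   $1 = requirements file name
#   $2 = local package directory
#   $3 = 1 to force offline mode (hypothetical switch, defaults to 0)
build_pip_options () {
    local offline=${3:-0}
    case "$1" in
        *_local*) offline=1 ;;                   # file name contains "_local"
    esac
    [ -f SELF_CONTAINED_PIP_PKG ] && offline=1   # marker file in the current directory
    if [ "$offline" = 1 ]; then
        echo "--no-cache-dir --no-index --find-links=$2"
    else
        echo "--no-cache-dir"
    fi
}

# Usage inside the loop of _install_pypkgs:
#   pip_options=$(build_pip_options "$reqr_file" "$local_pip_src" 1)
#   log "exec: pip install $pip_options -r $reqr_file ($PWD)"
#   pip install $pip_options -r $reqr_file
```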
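After editing utils.fc I tested again, and the installation no longer failed at the same point. (The logged command still shows no --find-links, because the log line runs before the new assignment, but the option is actually in effect, as the "Ignoring indexes" line and the final status line show.)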
[192.168.1.6]20200303-214725 235 generate env variable settings.
[192.168.1.6]20200303-214726 151 exec: pip install --no-cache-dir -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Ignoring indexes: http://192.168.1.6:8080/simple
Collecting Django==1.8.11 (from -r requirements.txt (line 1))
Collecting PyMySQL==0.6.7 (from -r requirements.txt (line 2))
(part of the output omitted)
Collecting idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
Could not find a version that satisfies the requirement idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3)) (from versions: )
No matching distribution found for idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
[192.168.1.6]20200303-214856 177 pip install (--no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs) for paas_agent. FAILED
[192.168.1.6]20200303-214856 47 Abort
[root@rbtnode1 install]#
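In the end, though, the installation aborted again because a package was missing.
idna<2.9,>=2.5 is not listed in the requirements.txt of paas_agent, but it is needed as a dependency of requests==2.21.0. One fix is to collect the packages from the other locations into a single directory (/data/localpip) and then copy the ones that are missing into this component's package directory: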
[root@rbtnode1 pkgs]# pwd
/data/src/paas_agent/support-files/pkgs
[root@rbtnode1 pkgs]# ls -l |wc -l
62
[root@rbtnode1 pkgs]# cp -n /data/localpip/* ./
[root@rbtnode1 pkgs]# pwd
/data/src/paas_agent/support-files/pkgs
[root@rbtnode1 pkgs]# ls -l |wc -l
281
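Before re-running the installer, it can help to confirm that the missing distribution is now in the directory. A minimal sketch (function name hypothetical) that matches files whose names start with the project name followed by a hyphen, as the file names of source archives and wheels usually do:

```shell
# Hypothetical helper: succeed if a distribution file for a project exists.
#   $1 = project name (e.g. idna), $2 = package directory
# The match is case-insensitive. The name is used as a regular expression, and
# wheel files may spell hyphens in project names as underscores.
has_pkg () {
    ls "$2" | grep -qi -e "^$1-"
}

# Usage:
#   has_pkg idna /data/src/paas_agent/support-files/pkgs && echo "idna present"
```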
Then try installing app_mgr again:
[root@rbtnode1 pkgs]# cd /data/install/
[root@rbtnode1 install]# ./bk_install app_mgr
This time it finally succeeded. The log follows; after appt is installed successfully the installer goes on to install appo, and both succeed:
Collecting chardet<3.1.0,>=3.0.2 (from requests==2.21.0->-r requirements.txt (line 3))
Collecting idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
Collecting certifi>=2017.4.17 (from requests==2.21.0->-r requirements.txt (line 3))
Installing collected packages: Django, PyMySQL, urllib3, chardet, idna, certifi, requests, pytz, amqp, anyjson, kombu, billiard, celery, django-celery, redis, httplib2, xlrd, xlwt, MarkupSafe, Mako, Jinja2, pycrypto, gunicorn, six, SQLAlchemy, suds, supervisor, uWSGI, pytest-runner, setuptools-scm
Running setup.py install for anyjson: started
Running setup.py install for anyjson: finished with status 'done'
Running setup.py install for billiard: started
Running setup.py install for billiard: finished with status 'done'
(part of the output omitted)
Successfully installed Django-1.8.11 Jinja2-2.8 Mako-1.0.4 MarkupSafe-0.23 PyMySQL-0.6.7 SQLAlchemy-1.0.12 amqp-1.4.9 anyjson-0.3.3 billiard-3.3.0.23 celery-3.1.18 certifi-2019.3.9 chardet-3.0.4 django-celery-3.2.1 gunicorn-19.6.0 httplib2-0.9.1 idna-2.8 kombu-3.0.35 pycrypto-2.6.1 pytest-runner-2.8 pytz-2016.6.1 redis-2.10.5 requests-2.21.0 setuptools-scm-1.11.1 six-1.10.0 suds-0.4 supervisor-3.3.1 uWSGI-2.0.13.1 urllib3-1.24.1 xlrd-1.0.0 xlwt-1.1.2
[192.168.1.6]20200303-222848 175 pip install (--no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs) for paas_agent. OK
[192.168.1.6]20200303-222858 453 apps isolate mode: virutalenv
Ignoring indexes: http://192.168.1.6:8080/simple
Requirement already satisfied (use --upgrade to upgrade): Django==1.8.11 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): PyMySQL==0.6.7 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from -r requirements.txt (line 2))
(part of the output omitted)
[192.168.1.6]20200303-222926 151 install python package for virtualenv paas_agent done.
[192.168.1.6]20200303-222927 468 local nginx is required for paas_agent. going to install it.
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Package 1:nginx-1.12.2-2.el7.x86_64 already installed and latest version
Nothing to do
[192.168.1.6]20200303-222934 175 render: #etc#nginx.conf -> /data/bkce//etc/nginx.conf. OK
[192.168.1.6]20200303-222935 175 render: #etc#nginx#paasagent.conf -> /data/bkce//etc/nginx/paasagent.conf. OK
[192.168.1.6]20200303-222936 322 PLACE HOLDER __SID__ is replaced into empty
[192.168.1.6]20200303-222937 322 PLACE HOLDER __TOKEN__ is replaced into empty
[192.168.1.6]20200303-222937 175 render: #etc#paas_agent_config.yaml.tpl -> /data/bkce//etc/paas_agent_config.yaml. OK
[192.168.1.6]20200303-222938 175 render: #etc#supervisor-paas_agent.conf -> /data/bkce//etc/supervisor-paas_agent.conf. OK
[192.168.1.6]20200303-222939 56 install appt(allproject) done
initdata for appt()
[192.168.1.6]20200303-222946 182 exec initdata_appt on 192.168.1.6
[192.168.1.6]20200303-222958 262 update config file: paas_agent_config.yaml
[192.168.1.6]20200303-222958 268 register appt succeded.
[192.168.1.6]20200303-222958 502 create database bksuite_common
[192.168.1.6]20200303-222958 504 add version info to db
[192.168.1.6]20200303-223001 98 starting appt(ALL) on host: 192.168.1.6
[192.168.1.6]20200303-223052 77 activate appt(192.168.1.6) succeded
# appt has been installed successfully at this point; appo is installed next
(part of the output omitted)
install appo(all)
[192.168.1.6]20200303-223102 112 check dependences for paas_agent
(part of the output omitted)
initdata for appo()
[192.168.1.6]20200303-223509 182 exec initdata_appo on 192.168.1.6
[192.168.1.6]20200303-223533 262 update config file: paas_agent_config.yaml
[192.168.1.6]20200303-223534 268 register appo succeded.
[192.168.1.6]20200303-223535 502 create database bksuite_common
[192.168.1.6]20200303-223535 504 add version info to db
[192.168.1.6]20200303-223541 98 starting appo(ALL) on host: 192.168.1.6
[192.168.1.6]20200303-223613 77 activate appo(192.168.1.6) succeded
[192.168.1.6] paas_agent() paas_agent RUNNING pid 23792, uptime 0:06:10
[192.168.1.6] nginx: RUNNING
[192.168.1.6] paas_agent() paas_agent RUNNING pid 23792, uptime 0:06:42
[192.168.1.6] nginx: RUNNING
[192.168.1.6] rabbitmq: RUNNING
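If none of the steps above reported an error, you can now complete the deployment of the production and test environments. You can:
1. Deploy the node management app with ./bk_install saas-o bk_nodeman, or
2. Deploy apps through the Developer Center.
To install BlueKing monitoring or log search, first install bkdata with ./bk_install bkdata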
[root@rbtnode1 install]#
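After much stumbling, the problem that had puzzled me for so long was finally solved. Going forward I need to strengthen my Python and shell scripting skills.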