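The title is a bit of an exaggeration. What I actually wanted was to use Python to download every src.rpm source package for centos7.6.1810 from a given web page. (Every CentOS mirror I checked lacked a source package directory, without exception, which feels rather unfriendly. On the web the source files are not gathered under a single directory either, and downloading that many packages by hand is not realistic. openEuler, by contrast, at least provides a source package mirror link at https://repo.openeuler.org/openEuler-20.03-LTS-SP1/ISO/source/ .)
That is how this all started. Haha, the topic is probably not of general interest (I should tag this one "for home use" too ^_^), but it happens to be one of the problems I ran into recently, so I figured I might as well write it up.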
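The starting URL I had was this: http://mirror.nsc.liu.se/centos-store/7.6.1810/ . Opened in a browser, the page shows the word lighttpd in the lower-left corner, so I assume it works much like an nginx directory index listing that lets users download files.
The command lftp http://mirror.nsc.liu.se/centos-store/7.6.1810/ seems to get in as well (embarrassingly, I have not used this command much; I will look into it later, since it may turn out to work wonders).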
[root@localhost ttt]# lftp http://mirror.nsc.liu.se/centos-store/7.6.1810/
cd ok, cwd=/centos-store/7.6.1810
lftp mirror.nsc.liu.se:/centos-store/7.6.1810> ls
drwxr-xr-x -- ..
drwxr-xr-x 2018-11-29 00:58 atomic
drwxr-xr-x 2018-11-29 16:54 centosplus
drwxr-xr-x 2018-11-28 23:59 cloud
drwxr-xr-x 2018-11-29 00:59 configmanagement
drwxr-xr-x 2018-12-02 15:34 cr
drwxr-xr-x 2017-09-29 14:33 dotnet
drwxr-xr-x 2018-11-29 16:55 extras
drwxr-xr-x 2017-09-01 13:08 fasttrack
drwxr-xr-x 2018-11-27 09:05 isos
drwxr-xr-x 2018-11-29 00:59 nfv
drwxr-xr-x 2018-11-29 00:59 opstools
drwxr-xr-x 2018-12-10 22:51 os
drwxr-xr-x 2018-11-29 00:58 paas
drwxr-xr-x 2017-02-10 22:18 rt
drwxr-xr-x 2018-11-29 00:56 sclo
drwxr-xr-x 2018-11-29 00:58 storage
drwxr-xr-x 2018-11-29 16:57 updates
drwxr-xr-x 2018-11-29 00:58 virt
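So, starting from this directory, I recursively walk downward to find and download the *.src.rpm source package files I need (it is not very efficient, I am ashamed to say).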
import os
import re

import requests


def load_url_data(url):
    """
    Extract and download the src.rpm source packages linked from the page at url.
    """
    r = requests.get(url)
    raw_list = re.compile(r'<a.*?>(.*?)</a>').finditer(r.text.strip())
    for i in raw_list:
        x = i.group(1)
        if x.endswith('.src.rpm'):
            # src_rpm = os.path.join(url, x)
            # os.path.join is not used because it joins the path incorrectly on Windows
            src_rpm = '/'.join([url, x])
            print(src_rpm)
            if not os.path.exists(x):
                os.system('wget %s' % src_rpm)
            else:
                print('already downloaded %s' % x)
        elif '.' in x or 'x86_64' in x:
            # I do not care about any file other than .src.rpm, so skip it
            # The x86_64 directory mainly holds binary packages, which I do not need, so skip it too
            pass
        else:
            sub_url = '/'.join([url, x])
            print(f'scanning {sub_url} ...')
            load_url_data(sub_url)


if __name__ == '__main__':
    # centos_url = 'https://vault.centos.org/7.6.1810/'
    centos_url = 'http://mirror.nsc.liu.se/centos-store/7.6.1810/'
    load_url_data(centos_url)
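The script above matches the visible text of each link with a regular expression and glues URLs together with '/'.join. As a possible alternative, here is a minimal sketch that reads the href attributes with the standard library's html.parser and resolves them with urllib.parse.urljoin. The names LinkCollector and split_listing are hypothetical, and the sketch assumes the index page uses relative hrefs with a trailing slash for directories, which I have not verified for every mirror.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_listing(base_url, html):
    """Return (src_rpm_urls, subdir_urls) found in one index page."""
    collector = LinkCollector()
    collector.feed(html)
    # urljoin only appends to the last path segment if the base ends in "/"
    if not base_url.endswith("/"):
        base_url += "/"
    rpms, subdirs = [], []
    for href in collector.hrefs:
        # skip parent links, column-sort links, and absolute links
        if href.startswith(("..", "?", "/")) or "://" in href:
            continue
        full = urljoin(base_url, href)
        if href.endswith(".src.rpm"):
            rpms.append(full)
        elif href.endswith("/") and href.rstrip("/") != "x86_64":
            # assumption: directories are linked with a trailing slash
            subdirs.append(full)
    return rpms, subdirs
```

A recursive walk would then call split_listing on the text of each fetched page, download the returned package URLs, and repeat for each subdirectory URL. Because every URL goes through urljoin, a trailing slash on the start URL does not produce a double slash.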
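The output looks like the following (individual downloads also tend to stall, which may have something to do with network speed). The double slash in the URLs appears because the start URL already ends in / and the script joins on another /: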
[root@localhost centos7.1810_src_packages]# python3 test.py
scanning http://mirror.nsc.liu.se/centos-store/7.6.1810//atomic ...
scanning http://mirror.nsc.liu.se/centos-store/7.6.1810//atomic/Source ...
scanning http://mirror.nsc.liu.se/centos-store/7.6.1810//centosplus ...
scanning http://mirror.nsc.liu.se/centos-store/7.6.1810//centosplus/Source ...
scanning http://mirror.nsc.liu.se/centos-store/7.6.1810//centosplus/Source/SPackages ...
http://mirror.nsc.liu.se/centos-store/7.6.1810//centosplus/Source/SPackages/kernel-plus-3.10.0-957.1.3.el7.centos.plus.src.rpm
--2021-03-30 16:52:16-- http://mirror.nsc.liu.se/centos-store/7.6.1810//centosplus/Source/SPackages/kernel-plus-3.10.0-957.1.3.el7.centos.plus.src.rpm
Resolving mirror.nsc.liu.se (mirror.nsc.liu.se)... 130.236.101.92, 2001:6b0:17:2::1:92
Connecting to mirror.nsc.liu.se (mirror.nsc.liu.se)|130.236.101.92|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100898069 (96M) [application/x-rpm]
Saving to: ‘kernel-plus-3.10.0-957.1.3.el7.centos.plus.src.rpm’
kernel-plus-3.10.0-957.1.3.el7.centos.pl 100%[==================================================================================>] 96.22M 5.23MB/s in 21s
2021-03-30 16:52:37 (4.66 MB/s) - ‘kernel-plus-3.10.0-957.1.3.el7.centos.plus.src.rpm’ saved [100898069/100898069]
http://mirror.nsc.liu.se/centos-store/7.6.1810//centosplus/Source/SPackages/kernel-plus-3.10.0-957.10.1.el7.centos.plus.src.rpm
--2021-03-30 16:52:37-- http://mirror.nsc.liu.se/centos-store/7.6.1810//centosplus/Source/SPackages/kernel-plus-3.10.0-957.10.1.el7.centos.plus.src.rpm
Resolving mirror.nsc.liu.se (mirror.nsc.liu.se)... 130.236.101.92, 2001:6b0:17:2::1:92
Connecting to mirror.nsc.liu.se (mirror.nsc.liu.se)|130.236.101.92|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100922887 (96M) [application/x-rpm]
Saving to: ‘kernel-plus-3.10.0-957.10.1.el7.centos.plus.src.rpm’
kernel-plus-3.10.0-957.10.1.el7.centos.p 100%[==================================================================================>] 96.25M 8.52MB/s in 34s
2021-03-30 16:53:12 (2.82 MB/s) - ‘kernel-plus-3.10.0-957.10.1.el7.centos.plus.src.rpm’ saved [100922887/100922887]
...
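Since single downloads can stall, one option is to give each transfer a timeout and a few retries. The following is only a sketch under my own assumptions: it swaps wget for the standard library's urllib.request, and the function name fetch and its parameters (timeout, retries, delay, opener) are hypothetical. It writes to a temporary .part file first, so an interrupted transfer is not mistaken for a finished package by a later os.path.exists check.

```python
import http.client
import os
import shutil
import time
import urllib.request


def fetch(url, dest, timeout=30, retries=3, delay=2.0,
          opener=urllib.request.urlopen):
    """Download url to dest; return False if dest already exists."""
    if os.path.exists(dest):
        return False  # already downloaded
    part = dest + ".part"
    for attempt in range(1, retries + 1):
        try:
            # timeout bounds each blocking socket operation, not the whole transfer
            with opener(url, timeout=timeout) as resp, open(part, "wb") as out:
                shutil.copyfileobj(resp, out)
            # only a completed download gets the final file name
            os.replace(part, dest)
            return True
        except (OSError, http.client.HTTPException):
            if attempt == retries:
                raise
            time.sleep(delay * attempt)  # back off a little before retrying
```

In the script, a call such as fetch(src_rpm, x) could stand in for the os.system('wget ...') line. The opener parameter exists only so the function can be exercised without network access.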
Partial download results (not everything has finished downloading yet):
[root@localhost centos7.1810_src_packages]# ll
total 998M
-rw-r--r--. 1 root root 4.1M Sep 1 2017 ansible-2.3.0.0-3.el7.src.rpm
-rw-r--r--. 1 root root 274K Feb 23 2017 apiextractor-0.10.10-11.el7.src.rpm
-rw-r--r--. 1 root root 6.6M Feb 23 2017 babel-2.3.4-1.el7.src.rpm
-rw-r--r--. 1 root root 764K Feb 23 2017 bakefile-0.2.9-2.el7.src.rpm
-rw-r--r--. 1 root root 72K Feb 23 2017 bandit-0.13.2-1.el7.src.rpm
-rw-r--r--. 1 root root 615K Sep 1 2017 blosc-1.11.1-3.el7.src.rpm
-rw-r--r--. 1 root root 68M Feb 23 2017 boost159-1.59.0-2.el7.src.rpm
-rw-r--r--. 1 root root 1.4M Feb 23 2017 coin-or-Cbc-2.9.8-1.el7.src.rpm
-rw-r--r--. 1 root root 953K Feb 23 2017 coin-or-Cgl-0.59.9-1.el7.src.rpm
-rw-r--r--. 1 root root 1.9M Feb 23 2017 coin-or-Clp-1.16.10-1.el7.src.rpm
-rw-r--r--. 1 root root 965K Feb 23 2017 coin-or-CoinUtils-2.10.13-1.el7.src.rpm
-rw-r--r--. 1 root root 736K Feb 23 2017 coin-or-Osi-0.107.8-1.el7.src.rpm
-rw-r--r--. 1 root root 350K Feb 23 2017 coin-or-Sample-1.2.10-5.el7.src.rpm
-rw-r--r--. 1 root root 476K Feb 23 2017 conntrack-tools-1.4.2-3.el7.src.rpm
...