使用Python从HTML表中提取数据

我想使用Python脚本从HTML表中提取数据,并将其保存为变量(以后我可以在将它们存在后将它们加载到同一脚本中)保存到单独的文件中.此外,我希望脚本忽略表的第一行(组件,状态,时间/错误).我宁愿不使用外部库.

输出到新文件应该是这样的:

SAVE_DOCUMENT_STATUS = "OK"
SAVE_DOCUMENT_TIME = "0.408"
GET_DOCUMENT_STATUS = "OK"
GET_DOCUMENT_TIME = "0.361"
...

并且继承了脚本的输入:

<table border=1>
<tr>
<td><b>Component</b></td>
<td><b>Status</b></td>
<td><b>Time / Error</b></td>
</tr>
<tr><td>SAVE_DOCUMENT</td><td>OK</td><td>0.408 s</td></tr>
<tr><td>GET_DOCUMENT</td><td>OK</td><td>0.361 s</td></tr>
<tr><td>DVK_SEND</td><td>OK</td><td>0.002 s</td></tr>
<tr><td>DVK_RECEIVE</td><td>OK</td><td>0.002 s</td></tr>
<tr><td>GET_USER_INFO</td><td>OK</td><td>0.135 s</td></tr>
<tr><td>NOTIFICATIONS</td><td>OK</td><td>0.002 s</td></tr>
<tr><td>ERROR_LOG</td><td>OK</td><td>0.001 s</td></tr>
<tr><td>SUMMARY_STATUS</td><td>OK</td><td>0.913 s</td></tr>
</table>

我尝试在bash中做到这一点,但由于我需要将* _TIME变量与最大时间进行比较,然后失败,因为它们是浮点数.

解决方法:

使用lxml

import lxml.html as lh

content='''\
<table border=1>
<tr>
<td><b>Component</b></td>
<td><b>Status</b></td>
<td><b>Time / Error</b></td>
</tr>
<tr><td>SAVE_DOCUMENT</td><td>OK</td><td>0.408 s</td></tr>
<tr><td>GET_DOCUMENT</td><td>OK</td><td>0.361 s</td></tr>
<tr><td>DVK_SEND</td><td>OK</td><td>0.002 s</td></tr>
<tr><td>DVK_RECEIVE</td><td>OK</td><td>0.002 s</td></tr>
<tr><td>GET_USER_INFO</td><td>OK</td><td>0.135 s</td></tr>
<tr><td>NOTIFICATIONS</td><td>OK</td><td>0.002 s</td></tr>
<tr><td>ERROR_LOG</td><td>OK</td><td>0.001 s</td></tr>
<tr><td>SUMMARY_STATUS</td><td>OK</td><td>0.913 s</td></tr>
</table>
'''
tree=lh.fromstring(content)
for key, status, t in zip(*[iter(tree.xpath('//td/text()'))]*3):
    print('''{k}_STATUS = "{s}"
{k}_TIME = "{t}"'''.format(k=key,s=status,t=t.rstrip(' s')))

产量

SAVE_DOCUMENT_STATUS = "OK"
SAVE_DOCUMENT_TIME = "0.408"
GET_DOCUMENT_STATUS = "OK"
GET_DOCUMENT_TIME = "0.361"
DVK_SEND_STATUS = "OK"
DVK_SEND_TIME = "0.002"
DVK_RECEIVE_STATUS = "OK"
DVK_RECEIVE_TIME = "0.002"
GET_USER_INFO_STATUS = "OK"
GET_USER_INFO_TIME = "0.135"
NOTIFICATIONS_STATUS = "OK"
NOTIFICATIONS_TIME = "0.002"
ERROR_LOG_STATUS = "OK"
ERROR_LOG_TIME = "0.001"
SUMMARY_STATUS_STATUS = "OK"
SUMMARY_STATUS_TIME = "0.913"
上一篇:vue打包后,轮播图的动画内容位置显示不正确,且动画失效


下一篇:OGG 丢失归档恢复