python - Extracting URLs whose link text matches a regular expression using XPath 1.0

2023-08-22 23:12:28

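I want to use XPath in Scrapy to extract URLs of the following kind, where the link text is a number with any number of digits and the href is arbitrary text: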

<a href="http://www.example.com/link_to_some_page.html">3</a>
<a href="http://www.example.com/another_link-abcd.html">45</a>
I can think of something like:

HtmlXPathSelector(response).select('//a[matches(text(),"\d+")]/@href')

However, XPath 2.0 does not seem to be supported, so I cannot use regular expressions (matches() is an XPath 2.0 function).

The best one-line solution I could find comes from this question: "xpath expression for regex-like matching?". Is there a better way to achieve this?

Answer:

.select('//a[. != "" and translate(., "0123456789", "") = ""]/@href')
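To see why this predicate works, it may help to emulate it outside XPath. The sketch below is a plain-Python approximation of XPath 1.0's translate() and of the predicate itself; the helper names xpath_translate and is_all_digits are hypothetical and are not part of Scrapy or any XPath library. translate(., "0123456789", "") deletes every digit from the link text, so the result is empty only when the text contained nothing but digits, and the `. != ""` check rules out links with no text at all.

```python
def xpath_translate(value: str, source: str, target: str) -> str:
    """Approximate XPath 1.0 translate(): replace source[i] with target[i];
    characters of source that have no counterpart in target are removed."""
    mapping = {}
    for i, ch in enumerate(source):
        if ch in mapping:
            continue  # the first occurrence of a character in source wins
        mapping[ch] = target[i] if i < len(target) else None

    out = []
    for ch in value:
        if ch not in mapping:
            out.append(ch)           # untouched character
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # replaced character
        # else: character is dropped
    return "".join(out)


def is_all_digits(link_text: str) -> bool:
    # Mirrors: [. != "" and translate(., "0123456789", "") = ""]
    return link_text != "" and xpath_translate(link_text, "0123456789", "") == ""


for text in ["3", "45", "", "4a5", " 45"]:
    print(repr(text), is_all_digits(text))
```

Note that, like the XPath expression, this check does not trim whitespace, so link text such as " 45" is rejected. If surrounding whitespace should be tolerated, wrapping the text in normalize-space() in the XPath expression is one option.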
