pyQuery
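pyQuery is a Python implementation of jQuery: it lets you parse and manipulate HTML documents with jQuery-style syntax, which is very convenient. It has to be installed before use; easy_install pyquery is enough (pip install pyquery also works), or on Ubuntu:
sudo apt-get install python-pyquery
Here is an example: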
from pyquery import PyQuery as pyq

doc = pyq(url=r'http://list.taobao.com/browse/cat-0.htm')
cts = doc('.market-cat')
for i in cts:
    # Heading of each category block
    print('==== %s ====' % pyq(i).find('h4').text())
    # Sub-categories, space-separated on a single line
    print(' '.join(pyq(j).text() for j in pyq(i).find('.sub')))
    print('\n')
--------------- my code --------------------
# Continues from the example above (reuses pyq and cts).
for i in cts:
    print('-' * 10 + ' ' + pyq(i).find('h4').text())
    for j in pyq(i).find('.subtitle'):
        print(pyq(j).text())
    print('\n')
    for j in pyq(i).find('.sublist'):
        print('\t' + pyq(j).text())
    print('\n')
------------------------------------------------
You can use the PyQuery class to load an XML document from a string, an lxml document, a file, or a URL:
>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> from urllib.request import urlopen  # on Python 2: from urllib import urlopen
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url=your_url)
>>> d = pq(url=your_url,
...        opener=lambda url, **kw: urlopen(url).read())
>>> d = pq(filename=path_to_html_file)
Traversing
Most of jQuery's traversing methods are supported. Here are some examples.
- Filter a selection with a string selector:
>>> d('p').filter('.hello')
[<p#hello.hello>]
- Use eq to pick a single element by index:
>>> d('p').eq(0)
[<p#hello.hello>]
- You can also find nested elements:
>>> d('p').find('a')
[<a>, <a>]
>>> d('p').eq(1).find('a')
[<a>]
- end steps back to the selection that preceded the last traversal:
>>> d('p').find('a').end()
[<p#hello.hello>, <p#test>]
>>> d('p').eq(0).end()
[<p#hello.hello>, <p#test>]
>>> d('p').filter(lambda i: i == 1).end()
[<p#hello.hello>, <p#test>]