[Python Analysis]

2022-10-04 21:52:24

pyQuery

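pyQuery is a Python implementation of jQuery: it lets you parse and manipulate HTML documents using jQuery syntax, which is very convenient. It must be installed before use, either with pip install pyquery (or the older easy_install pyquery) or, on Ubuntu, with: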

sudo apt-get install python-pyquery

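Here is an example:

from pyquery import PyQuery as pyq

# Fetch and parse the category listing page
doc = pyq(url=r'http://list.taobao.com/browse/cat-0.htm')
cts = doc('.market-cat')

for i in cts:
    # Iterating yields raw lxml elements, so re-wrap each one with pyq()
    print('====', pyq(i).find('h4').text(), '====')
    for j in pyq(i).find('.sub'):
        print(pyq(j).text(), end=' ')
    print('\n')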

--------------- my code --------------------

for i in cts:
    print('-' * 10, pyq(i).find('h4').text())
    # Subtitles first, then the indented sub-lists
    for j in pyq(i).find('.subtitle'):
        print(pyq(j).text())
    print('\n')
    for j in pyq(i).find('.sublist'):
        print('\t', pyq(j).text())
    print('\n')

------------------------------------------------

You can use the PyQuery class to load an XML document from a string, an lxml document, a file, or a URL (your_url and path_to_html_file are placeholders):

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> from urllib.request import urlopen
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url=your_url)
>>> d = pq(url=your_url,
...        opener=lambda url, **kw: urlopen(url).read())
>>> d = pq(filename=path_to_html_file)

Traversing

Most jQuery traversal methods are supported. Here are some examples.

You can filter a selection with a string selector:

>>> d('p').filter('.hello')
[<p#hello.hello>]

You can also pick out a single element with the eq method:

>>> d('p').eq(0)
[<p#hello.hello>]

You can also find nested elements:

>>> d('p').find('a')
[<a>, <a>]
>>> d('p').eq(1).find('a')
[<a>]

Use end to step back out of one level of traversal:

>>> d('p').find('a').end()
[<p#hello.hello>, <p#test>]
>>> d('p').eq(0).end()
[<p#hello.hello>, <p#test>]
>>> d('p').filter(lambda i: i == 1).end()
[<p#hello.hello>, <p#test>]
