I'm new to web scraping and am using BeautifulSoup (Python) for it. I'd like to pull some property data from a sample web page as a test. My code so far is:
import requests
from bs4 import BeautifulSoup as Soup
page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text, "html.parser")
# Now I would like to get the sale price of the apartment.
# The element in the HTML DOM is as follows:
# <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span>
# The XPath of the element is //*[@id="yui_3_18_1_1_1464168312477_3548"]
# I wrote the following code:
value = soup.select('span#yui_3_18_1_1_1464168312477_3548')
print(value)
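I get no results. What am I doing wrong?
Solution:
The source you are inspecting in the browser console is not the same as the source that requests receives. The span id (for example "yui_3_18_1_1_1464170172533_3087") is generated dynamically, so to see that element you would need something like selenium.
Unfortunately the id is also different on every visit, so we can't rely on it. The parent div is consistent, though, so we can use a CSS selector to get the first span inside the parent via its main-row home-summary-row classes: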
In [4]: from selenium import webdriver
In [5]: dr = webdriver.PhantomJS()
In [6]: dr.get("http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/")
In [7]: span = dr.find_element_by_css_selector('div.main-row.home-summary-row span')
In [8]: print(span.text)
$12,895,000
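I used PhantomJS for headless browsing; you can use Firefox or Chrome if you prefer, and all the details are in the link. Note that Selenium 4 dropped PhantomJS support and the find_element_by_* helpers, so on current releases the equivalent would be headless Firefox or Chrome with find_element(By.CSS_SELECTOR, ...).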
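Actually, looking at the source again, we can do the same thing with bs4. The id is the only thing that is generated dynamically, so if we ignore the id we can still get the price: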
In [26]: soup.select_one("div.main-row.home-summary-row span").text
Out[26]: u'$12,895,000'
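To make the difference concrete, here is a minimal offline sketch. The HTML string is a hypothetical, trimmed-down stand-in for the listing page (only the span comes from the question; the wrapping div is assumed from the selector above), and "html.parser" is used just to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source: the span carries a *different*
# generated id than the one copied from the browser in the question.
html = """
<div class="main-row home-summary-row">
  <span class="" id="yui_3_18_1_1_1464170172533_3087">$12,895,000<span class="value-suffix"></span></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Selecting by the id seen in an earlier page load matches nothing.
by_id = soup.select("span#yui_3_18_1_1_1464168312477_3548")

# Selecting through the parent's stable class names still finds the span.
by_class = soup.select_one("div.main-row.home-summary-row span")
price = by_class.get_text(strip=True)

print(by_id)
print(price)
```

select returns an empty list rather than raising when nothing matches, which is why the original code appeared to produce no result.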
A better approach is to use the meta tags, which carry a lot of information:
import requests
from bs4 import BeautifulSoup as Soup
page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text,"lxml")
metas = soup.select("meta")
Now, if we look at what metas contains:
from pprint import pprint as pp
pp(metas)
[<meta content="on" http-equiv="x-dns-prefetch-control"/>,
<meta charset="unicode-escape"/>,
<meta content="View 31 photos of this $12,895,000, 7 bed, 10.0 bath, 10500 sqft single family home located at 1630 Amalfi Dr, Pacific Palisades, CA 90272 built in 2015. MLS # 16-103696." name="description"/>,
<meta content="Zillow, Inc." name="author"/>,
<meta content="Copyright (c) 2006-2014 Zillow, Inc." name="Copyright"/>,
<meta content="none" name="msapplication-config"/>,
<meta content="ALL" name="ROBOTS"/>,
<meta content="NOYDIR" name="ROBOTS"/>,
<meta content="NOODP" name="ROBOTS"/>,
<meta content="yes" name="apple-mobile-web-app-capable"/>,
<meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>,
<meta content="telephone=no" name="format-detection"/>,
<meta content="#3366b8" name="msapplication-TileColor"/>,
<meta content="http://www.zillowstatic.com/static/images/logos/zillow-logo-win8-tile.png" name="msapplication-TileImage"/>,
<meta content="/8Me6HBNZX/rt2n5/y1Lo3ZIrkcvkTBimqviTDiurR4=" name="verify-v1"/>,
<meta content="7cb4abe457d82ae8" name="y_key"/>,
<meta content="width=device-width, height=device-height, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, user-scalable=no" name="viewport"/>,
<meta content="Zillow Real Estate, Rentals, and Mortgage" itemprop="name"/>,
<meta content="The most trafficked website about home sales and rentals, with real estate values for almost every U.S. home. 1,000,000 listings that you won't find on MLS." itemprop="description"/>,
<meta content="http://www.zillowstatic.com/static/images/social/share_thumbnail.png" itemprop="image"/>,
<meta content="691f1bfccade71b5-c065751219a379dd-g64cedb67f5ea020a-a" name="google-translate-customization"/>,
<meta content="202692,878610170,662000799,100001769907023,10716009,769244502,10716649,503322863" property="fb:admins"/>,
<meta content="172285552816089" property="fb:app_id"/>,
<meta content="zillow_fb:home" property="og:type"/>,
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>,
<meta content="7" property="zillow_fb:beds"/>,
<meta content="10" property="zillow_fb:baths"/>,
<meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, & floating staircase create a grand entrance w/ glass wine cellar, formal living & dining rooms. Floor plan flows openly between gourmet kitchen, family room, & patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, & master suite add warmth to the contemporary feel, & detailed wood paneling & coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views & private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, & elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, & saltwater pool/spa complete this elegant estate.' property="zillow_fb:description"/>,
<meta content="http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/" property="og:url"/>,
<meta content="Pacific Palisades Home For Sale" property="og:title"/>,
<meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" property="og:image"/>,
<meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, & floating staircase create a grand entrance w/ glass wine cellar, formal living & dining rooms. Floor plan flows openly between gourmet kitchen, family room, & patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, & master suite add warmth to the contemporary feel, & detailed wood paneling & coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views & private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, & elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, & saltwater pool/spa complete this elegant estate.' property="og:description"/>,
<meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video"/>,
<meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video:secure_url"/>,
<meta content="640" property="og:video:width"/>,
<meta content="video/mp4" property="og:video:type"/>,
<meta content="360" property="og:video:height"/>,
<meta content="238648973530.apps.googleusercontent.com" name="google-signin-clientid"/>,
<meta content="https://www.googleapis.com/auth/plus.login https://www.googleapis.com/auth/plus.profile.emails.read" name="google-signin-scope"/>,
<meta content="http://zillow.com" name="google-signin-cookiepolicy"/>,
<meta content="summary_large_image" name="twitter:card"/>,
<meta content="@Zillow" name="twitter:site"/>,
<meta content="@Zillow" name="twitter:creator"/>,
<meta content="1630 Amalfi Dr" name="twitter:title"/>,
<meta content="Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130&quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate." name="twitter:description"/>,
<meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" name="twitter:image"/>,
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" itemprop="name"/>,
<meta content="USD" itemprop="priceCurrency"/>,
<meta content="$12,895,000" itemprop="price"/>,
<meta content="34.060605" itemprop="latitude"/>,
<meta content="-118.501625" itemprop="longitude"/>]
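Since each tag is keyed by one of name, property, or itemprop, the list can be folded into a dictionary for easier lookup. The following is only a sketch: meta_to_dict is a hypothetical helper name, and the HTML is a small hand-copied subset of the output above rather than the live page.

```python
from bs4 import BeautifulSoup

# A few tags copied from the output above, used here as offline sample input.
html = """
<head>
<meta charset="utf-8"/>
<meta content="USD" itemprop="priceCurrency"/>
<meta content="$12,895,000" itemprop="price"/>
<meta content="7" property="zillow_fb:beds"/>
<meta content="1630 Amalfi Dr" name="twitter:title"/>
</head>
"""


def meta_to_dict(soup):
    """Map each meta tag's name/property/itemprop value to its content."""
    result = {}
    for tag in soup.find_all("meta"):
        key = tag.get("name") or tag.get("property") or tag.get("itemprop")
        # Skip tags such as <meta charset=...> that have no key or no content.
        if key and tag.has_attr("content"):
            # Keys can repeat (e.g. ROBOTS); this sketch keeps the first value.
            result.setdefault(key, tag["content"])
    return result


info = meta_to_dict(BeautifulSoup(html, "html.parser"))
print(info["price"])
print(info["zillow_fb:beds"])
```

Keeping only the first value per key is a simplification; if the repeated keys matter, collecting the values into a list per key would be the safer choice.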
We can use those attributes to get the price and other information:
In [22]: soup = Soup(response.text,"lxml")
In [23]: soup.select_one("meta[itemprop=price]")["content"]
Out[23]: '$12,895,000'
In [24]: soup.select_one('meta[name="twitter:description"]')["content"]
Out[24]: 'Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, & floating staircase create a grand entrance w/ glass wine cellar, formal living & dining rooms. Floor plan flows openly between gourmet kitchen, family room, & patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, & master suite add warmth to the contemporary feel, & detailed wood paneling & coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views & private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, & elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, & saltwater pool/spa complete this elegant estate.'
In [27]: soup.select_one("meta[itemprop=latitude]")["content"]
Out[27]: '34.060605'
In [28]: soup.select_one("meta[itemprop=longitude]")["content"]
Out[28]: '-118.501625'
In [29]: soup.select_one('meta[property="og:zillow_fb:address"]')["content"]
Out[29]: '1630 Amalfi Dr, Pacific Palisades, CA 90272'
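The values come back as strings, so they usually need converting before any arithmetic. A possible sketch follows; parse_price is a hypothetical helper, and it assumes whole-dollar amounts formatted like the price above (it would misread a value with cents).

```python
import re


def parse_price(text):
    """Convert a string such as '$12,895,000' to an int number of dollars.

    Assumes a whole-dollar amount: every non-digit character is dropped.
    """
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError("no digits found in price: %r" % (text,))
    return int(digits)


# Values copied from the session above.
price = parse_price("$12,895,000")
latitude = float("34.060605")
longitude = float("-118.501625")

print(price)
print(latitude, longitude)
```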