使用Python进行*数据搜索

我试图从以下wikipedia page中检索3列(NFL团队,玩家名称,大学团队).我是python的新手并且一直在尝试使用beautifulsoup来完成这项工作.我只需要属于QB的列,但我甚至无法获得所有列的位置.这是我到目前为止所没有输出的东西,我不完全确定原因.我相信这是由于标签,但我不知道要改变什么.任何帮助将不胜感激.’

wiki = "http://en.wikipedia.org/wiki/2008_NFL_draft"
header = {'User-Agent': 'Mozilla/5.0'} #Needed to prevent 403 error on Wikipedia
req = urllib2.Request(wiki,headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

rnd = ""
pick = ""
NFL = ""
player = ""
pos = ""
college = ""
conf = ""
notes = ""

table = soup.find("table", { "class" : "wikitable sortable" })

#print table

#output = open('output.csv','w')

for row in table.findAll("tr"):
    cells = row.findAll("href")
    print "---"
    print cells.text
    print "---"
    #For each "tr", assign each "td" to a variable.
    #if len(cells) > 1:
        #NFL = cells[1].find(text=True)
        #player = cells[2].find(text = True)
        #pos = cells[3].find(text=True)
        #college = cells[4].find(text=True)
        #write_to_file = player + " " + NFL + " " + college + " " + pos
        #print write_to_file

    #output.write(write_to_file)

#output.close()

我知道其中有很多是评论它,因为我试图找出崩溃的位置.

解决方法:

这是我要做的:

>找到播放器选择段落
>使用find_next_sibling()获得下一个wikitable
>找到里面的所有tr标签
>对于每一行,找到td th标签并通过索引获得所需的单元格

这是代码:

filter_position = 'QB'
player_selections = soup.find('span', id='Player_selections').parent
for row in player_selections.find_next_sibling('table', class_='wikitable').find_all('tr')[1:]:
    cells = row.find_all(['td', 'th'])

    try:
        nfl_team, name, position, college = cells[3].text, cells[4].text, cells[5].text, cells[6].text
    except IndexError:
        continue

    if position != filter_position:
        continue

    print nfl_team, name, position, college

这是输出(只过滤了四分卫):

Atlanta Falcons Ryan, MattMatt Ryan† QB Boston College
Baltimore Ravens Flacco, JoeJoe Flacco QB Delaware
Green Bay Packers Brohm, BrianBrian Brohm QB Louisville
Miami Dolphins Henne, ChadChad Henne QB Michigan
New England Patriots O'Connell, KevinKevin O'Connell QB San Diego State
Minnesota Vikings Booty, John DavidJohn David Booty QB USC
Pittsburgh Steelers Dixon, DennisDennis Dixon QB Oregon
Tampa Bay Buccaneers Johnson, JoshJosh Johnson QB San Diego
New York Jets Ainge, ErikErik Ainge QB Tennessee
Washington Redskins Brennan, ColtColt Brennan QB Hawaiʻi
New York Giants Woodson, Andre'Andre' Woodson QB Kentucky
Green Bay Packers Flynn, MattMatt Flynn QB LSU
Houston Texans Brink, AlexAlex Brink QB Washington State
上一篇:Ruby代码编辑RubyMine 2021 mac


下一篇:算法 - 阶乘的除法求模 - 费马小定理