python - Cannot chain find and find_all in BeautifulSoup

April 15, 2014

I have a book and the docs on BeautifulSoup. Both say I should be able to chain the find/find_all methods and use subscripts to get exactly what I want from a single page scrape. This does not appear to be the case. Consider the following table.

<tr>
  <td><span style="display:none;" class="sortkey">dresser !</span><span class="sorttext">**<a href="/wiki/louise_dresser" title="louise dresser">louise dresser</a>**</span></td>
  <td><span style="display:none;" class="sortkey">ship !</span><span class="sorttext"><i><a href="/wiki/a_ship_comes_in" title="a ship comes in">a ship comes in</a></i></span></td>
  <td><span style="display:none;" class="sortkey">pleznik !</span><span class="sorttext">mrs. pleznik</span></td>
</tr>
<tr>
  <td><span style="display:none;" class="sortkey">swanson !</span><span class="sorttext"><a href="/wiki/gloria_swanson" title="gloria swanson">gloria swanson</a></span></td>
  <td><i><a href="/wiki/sadie_thompson" title="sadie thompson">sadie thompson</a></i></td>
  <td><span style="display:none;" class="sortkey">thompson !</span><span class="sorttext">sadie thompson</span></td>
</tr>
<tr>
  <th scope="row" rowspan="6" style="text-align:center"><a href="/wiki/1928_in_film" title="1928 in film">1928</a>/<a href="/wiki/1929_in_film" title="1929 in film">29</a><br /> <small><a href="/wiki/2nd_academy_awards" title="2nd academy awards">(2nd)</a></small></th>
  <td style="background:#faeb86"><b><span style="display:none;" class="sortkey">pickford !</span><span class="sorttext">**<a href="/wiki/mary_pickford" title="mary pickford">mary pickford</a>**</span> <img alt="award winner" src="//upload.wikimedia.org/wikipedia/commons/f/f9/double-dagger-14-plain.png" width="9" height="14" data-file-width="9" data-file-height="14" /></b></td>

For every table row, I need to grab the first element, then the text inside the first nested a tag. Louise Dresser would be the first data point, followed by Gloria Swanson, and then Mary Pickford.

I thought the following would get me there, but I was wrong, and 6 hours later I'm spent.

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

def getactresses(url):
    try:
        html = urlopen(url)
    except HTTPError:
        print("page not found.")
        return None
    try:
        bsobj = BeautifulSoup(html, "lxml")
        soup = bsobj.find("table", {"class": "wikitable sortable"})
    except AttributeError:
        print("error creating/navigating soup object")
    # chained lookup that fails with AttributeError
    data = soup.find_all("tr").find_all("td").find("a").get_text()
    print(data)

getactresses("https://en.wikipedia.org/wiki/academy_award_for_best_actress")

This isn't the only code I've tried. I've tried looping through the rows, then the table data cells, then accessing the a tags. I've tried asking for the a tags and then regexing them out, only to be told I couldn't have the text I wanted. The most frequent error I've gotten when trying to chain operations (as above) is AttributeError: 'ResultSet' object has no attribute 'find'. Subscripting absolutely doesn't work, even when replicating book examples (go fig?!). Also, I've had processes abort themselves, which I didn't know was possible.

Any thoughts on what's going on, and why something that should be so simple seems to be such an event, would be enormously appreciated.

import requests
from bs4 import BeautifulSoup

def getactresses(url):
    res = requests.get(url)

    try:
        soup = BeautifulSoup(res.content, "lxml")
        table = soup.find("table", {"class": "wikitable sortable"})
    except AttributeError:
        print("error creating/navigating soup object")

    # find_all() returns a list of tags, so each level gets its own loop
    tr = table.find_all("tr")

    for _tr in tr:
        td = _tr.find_all("td")
        for _td in td:
            a = _td.find_all("a")
            for _a in a:
                print(_a.text.encode("utf-8"))

getactresses("https://en.wikipedia.org/wiki/academy_award_for_best_actress")

You can use .text instead of get_text() (.text is a property that calls get_text() with its default arguments). Sorry, I used the requests module to demonstrate.
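A small check of the .text / get_text() point, as a sketch only. The one-cell snippet is made up to resemble the table in the question, and I am assuming the built-in "html.parser" here rather than "lxml" so nothing extra needs installing.

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell snippet modelled on the table in the question.
html = (
    '<td><span class="sorttext">'
    '<a href="/wiki/gloria_swanson">gloria swanson</a>'
    '</span></td>'
)

# Assumption: "html.parser" (standard library) instead of "lxml".
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

via_property = link.text        # property form
via_method = link.get_text()    # method form with default arguments
same = via_property == via_method
print(same)

# get_text() also takes arguments, e.g. strip=True to trim whitespace.
cell_text = soup.find("td").get_text(strip=True)
print(cell_text)
```

So the switch to .text in my code is a matter of taste; get_text() is only needed when you want to pass arguments such as strip=True.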

The find_all method returns a list (a ResultSet), so you have to loop through it.
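To make the difference concrete, here is a minimal sketch on a cut-down, made-up version of the table (not the real Wikipedia page), again assuming "html.parser". It shows that find returns a single Tag, which can be chained, while find_all returns a ResultSet, which has to be indexed or looped over first.

```python
from bs4 import BeautifulSoup

# Hypothetical, cut-down version of the table from the question.
html = """
<table class="wikitable sortable">
<tr><td><a href="/wiki/louise_dresser">louise dresser</a></td>
    <td><a href="/wiki/a_ship_comes_in">a ship comes in</a></td></tr>
<tr><td><a href="/wiki/gloria_swanson">gloria swanson</a></td>
    <td><a href="/wiki/sadie_thompson">sadie thompson</a></td></tr>
</table>
"""

# Assumption: "html.parser" (standard library) instead of "lxml".
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})

rows = table.find_all("tr")
rows_type = type(rows).__name__      # the list-like container
row_type = type(rows[0]).__name__    # a single element

# find() returns one Tag, so find().find().find() chains fine.
first_name = table.find("tr").find("td").find("a").get_text()

# Calling a search method on the ResultSet itself raises AttributeError.
try:
    rows.find_all("td")
    chain_error = None
except AttributeError as exc:
    chain_error = str(exc)

# Subscripting works once you index into the ResultSet to get a Tag.
second_name = rows[1].find_all("td")[0].find("a").get_text()

print(rows_type, row_type)
print(first_name, "/", second_name)
print(chain_error)
```

The rule of thumb: anything that comes back from find_all needs an index or a for loop before the next find or find_all.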

I am sorry, I am new to Stack Overflow, so I don't know how to write answers. Anyway, I believe the code will clear your doubts.
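If you only want the first link of the first td in each row, which is what the question asks for, something along these lines might do. It is a sketch under assumptions: first_link_texts is a hypothetical helper name, the sample HTML is a simplified stand-in for the real page, and "html.parser" is used instead of "lxml". Rows with no td, or whose first td has no link, are skipped, because find returns None in those cases.

```python
from bs4 import BeautifulSoup

def first_link_texts(table):
    """Return the text of the first <a> in the first <td> of each row.

    Hypothetical helper; rows without a <td>, or whose first <td>
    contains no link, are skipped.
    """
    names = []
    for row in table.find_all("tr"):
        cell = row.find("td")        # first <td> in the row, or None
        if cell is None:
            continue                 # e.g. a header row made of <th> only
        link = cell.find("a")        # first <a> inside that cell, or None
        if link is None:
            continue
        names.append(link.get_text(strip=True))
    return names

# Simplified, made-up stand-in for the table in the question.
html = """
<table class="wikitable sortable">
<tr><th>year</th><th>actress</th><th>film</th></tr>
<tr><td><span class="sorttext"><a href="/wiki/louise_dresser">louise dresser</a></span></td>
    <td><i><a href="/wiki/a_ship_comes_in">a ship comes in</a></i></td></tr>
<tr><td><span class="sorttext"><a href="/wiki/gloria_swanson">gloria swanson</a></span></td>
    <td><i><a href="/wiki/sadie_thompson">sadie thompson</a></i></td></tr>
<tr><th scope="row"><a href="/wiki/1928_in_film">1928</a></th>
    <td><b><span class="sorttext"><a href="/wiki/mary_pickford">mary pickford</a></span></b></td></tr>
</table>
"""

# Assumption: "html.parser" (standard library) instead of "lxml".
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})
names = first_link_texts(table)
print(names)
```

The year links sit inside th cells, so row.find("td") steps past them. On the real page the layout may differ from this stand-in, so check the output against the table.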
