Python - Cannot chain find and find_all in BeautifulSoup
I have a book and the docs on BeautifulSoup. Both say I should be able to chain the find/find_all methods and use subscripts to get what I want from a single-page scrape. That does not appear to be the case. Consider the following table:
    <tr>
      <td><span style="display:none;" class="sortkey">dresser !</span><span class="sorttext"><a href="/wiki/louise_dresser" title="louise dresser">louise dresser</a></span></td>
      <td><span style="display:none;" class="sortkey">ship !</span><span class="sorttext"><i><a href="/wiki/a_ship_comes_in" title="a ship comes in">a ship comes in</a></i></span></td>
      <td><span style="display:none;" class="sortkey">pleznik !</span><span class="sorttext">mrs. pleznik</span></td>
    </tr>
    <tr>
      <td><span style="display:none;" class="sortkey">swanson !</span><span class="sorttext"><a href="/wiki/gloria_swanson" title="gloria swanson">gloria swanson</a></span></td>
      <td><i><a href="/wiki/sadie_thompson" title="sadie thompson">sadie thompson</a></i></td>
      <td><span style="display:none;" class="sortkey">thompson !</span><span class="sorttext">sadie thompson</span></td>
    </tr>
    <tr>
      <th scope="row" rowspan="6" style="text-align:center"><a href="/wiki/1928_in_film" title="1928 in film">1928</a>/<a href="/wiki/1929_in_film" title="1929 in film">29</a><br /> <small><a href="/wiki/2nd_academy_awards" title="2nd academy awards">(2nd)</a></small></th>
      <td style="background:#faeb86"><b><span style="display:none;" class="sortkey">pickford !</span><span class="sorttext"><a href="/wiki/mary_pickford" title="mary pickford">mary pickford</a></span> <img alt="award winner" src="//upload.wikimedia.org/wikipedia/commons/f/f9/double-dagger-14-plain.png" width="9" height="14" data-file-width="9" data-file-height="14" /></b></td>
For every table row, I need to grab the first element, and then the text inside its first nested tag. Louise Dresser is the first data point, followed by Gloria Swanson, and then Mary Pickford.
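A minimal sketch of that extraction, assuming the rows keep the layout shown in the excerpt above. `SAMPLE_HTML` is a trimmed, hand-made copy of that excerpt and `first_cell_links` is a hypothetical helper name. It uses Python's bundled `"html.parser"` so it runs without lxml. The idea is to call `find()` (which returns one Tag) on each row inside a loop, instead of chaining on the list that `find_all()` returns.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the rows quoted above, wrapped in the table the question
# searches for. Assumption: the live page keeps this row/cell layout.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr>
    <td><span class="sortkey">dresser !</span><span class="sorttext"><a href="/wiki/louise_dresser">louise dresser</a></span></td>
    <td><span class="sorttext"><i><a href="/wiki/a_ship_comes_in">a ship comes in</a></i></span></td>
  </tr>
  <tr>
    <td><span class="sortkey">swanson !</span><span class="sorttext"><a href="/wiki/gloria_swanson">gloria swanson</a></span></td>
    <td><i><a href="/wiki/sadie_thompson">sadie thompson</a></i></td>
  </tr>
  <tr>
    <th scope="row"><a href="/wiki/1928_in_film">1928</a></th>
    <td><b><span class="sortkey">pickford !</span><span class="sorttext"><a href="/wiki/mary_pickford">mary pickford</a></span></b></td>
  </tr>
</table>
"""


def first_cell_links(html):
    """Hypothetical helper: text of the first <a> in the first <td> of each row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable sortable"})  # a single Tag or None
    if table is None:
        return []
    names = []
    for row in table.find_all("tr"):  # ResultSet: loop over it, don't chain on it
        cell = row.find("td")  # first <td> in this row (skips the <th>), or None
        if cell is None:
            continue  # e.g. a header row that only has <th> cells
        link = cell.find("a")  # first <a> anywhere inside that cell, or None
        if link is not None:
            names.append(link.get_text())
    return names


print(first_cell_links(SAMPLE_HTML))  # ['louise dresser', 'gloria swanson', 'mary pickford']
```

On the live page you would pass the downloaded HTML (for example `urlopen(url).read()` or `requests.get(url).content`) instead of `SAMPLE_HTML`; the sketch has not been checked against the full Wikipedia table, which has more columns and rowspans.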
I thought the following would get me there, but I was wrong, and six hours later I'm spent.
    from urllib.request import urlopen
    from urllib.error import HTTPError
    from bs4 import BeautifulSoup

    def getactresses(url):
        try:
            html = urlopen(url)
        except HTTPError:
            print("Page not found.")
            return None
        try:
            bsobj = BeautifulSoup(html, "lxml")
            soup = bsobj.find("table", {"class": "wikitable sortable"})
        except AttributeError:
            print("Error creating/navigating soup object")
        # Fails: find_all() returns a ResultSet (a list of Tags), which has no find_all()
        data = soup.find_all("tr").find_all("td").find("a").get_text()
        print(data)

    getactresses("https://en.wikipedia.org/wiki/academy_award_for_best_actress")
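One detail in the error handling above, shown on a tiny made-up document (the HTML string and the class name `"no-such-class"` are illustrative assumptions): `find()` does not raise when nothing matches, it returns `None`. So a `try`/`except AttributeError` around the `find()` call never fires; the `AttributeError` only appears on the next method call made on that `None`.

```python
from bs4 import BeautifulSoup

html = '<table class="wikitable sortable"><tr><td><a>louise dresser</a></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# A lookup that matches nothing returns None instead of raising.
missing = soup.find("table", {"class": "no-such-class"})
print(missing)  # None

# The failure only surfaces when a method is called on that None.
try:
    missing.find_all("tr")
except AttributeError as err:
    print("AttributeError:", err)

# Checking for None right after find() keeps the error next to its cause.
table = soup.find("table", {"class": "wikitable sortable"})
if table is None:
    raise SystemExit("table not found")
print(len(table.find_all("tr")))  # 1
```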
This isn't the only code I've tried. I've tried looping through the rows, then the table data cells, then accessing the tags. I've tried asking for the tags and regexing them out, only to be told I couldn't have the text I wanted. The most frequent error I've gotten when trying to chain operations (as above) is AttributeError: 'ResultSet' object has no attribute 'find'.
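That error comes from the return types, sketched below on a small made-up table: `find()` returns a single `Tag`, so `find()` calls chain; `find_all()` returns a `ResultSet`, which is a subclass of Python's `list`, and a list has no `find()` method. Indexing or looping gets you back to individual Tags.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet

html = """
<table>
  <tr><td><a>louise dresser</a></td><td><a>a ship comes in</a></td></tr>
  <tr><td><a>gloria swanson</a></td><td><a>sadie thompson</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns one Tag, so find() calls chain left to right.
print(soup.find("tr").find("td").find("a").get_text())  # louise dresser

# find_all() returns a ResultSet, which is a list subclass holding Tags.
rows = soup.find_all("tr")
print(isinstance(rows, ResultSet), isinstance(rows, list))  # True True

# A list has no .find(), hence the AttributeError when chaining on it.
try:
    rows.find("td")
except AttributeError:
    print("AttributeError: call find() on each Tag, not on the ResultSet")

# Index (or loop) to reach a single Tag, then keep chaining on that Tag.
print(rows[1].find("td").find("a").get_text())  # gloria swanson
```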
Subscripting absolutely doesn't work, even when replicating the book's examples (go figure?!). Also, I've had processes abort themselves, which I didn't know was possible.
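The failing subscript code isn't shown, so this is only a guess at the confusion: in BeautifulSoup, subscripts mean different things on different objects. On a `ResultSet` an integer index picks a Tag (it is a list); on a `Tag`, `tag["name"]` looks up an HTML attribute, not a child element. The snippet below uses one link from the excerpt above.

```python
from bs4 import BeautifulSoup

html = '<tr><td><a href="/wiki/louise_dresser" title="louise dresser">louise dresser</a></td></tr>'
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr")
first_row = rows[0]             # ResultSet is a list: integer index works
link = first_row.find("a")      # a single Tag

print(link["href"])             # Tag subscript = attribute lookup: /wiki/louise_dresser
print(link.get("title"))        # same lookup, but returns None if missing
print(link.get("class"))        # None: this <a> has no class attribute

try:
    link[0]                     # NOT a child index: looks for an attribute named 0
except KeyError:
    print("KeyError: Tag[...] looks up attributes, not children")
```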
Any thoughts on what's going on, and why something that should be simple seems to be such an ordeal, would be enormously appreciated.
    import requests
    from bs4 import BeautifulSoup

    def getactresses(url):
        res = requests.get(url)
        try:
            soup = BeautifulSoup(res.content, "lxml")
            table = soup.find("table", {"class": "wikitable sortable"})
        except AttributeError:
            print("Error creating/navigating soup object")
        tr = table.find_all("tr")        # ResultSet of <tr> Tags
        for _tr in tr:
            td = _tr.find_all("td")      # ResultSet of <td> Tags in this row
            for _td in td:
                a = _td.find_all("a")    # ResultSet of <a> Tags in this cell
                for _a in a:
                    print(_a.text)       # Python 3 str; no .encode() needed to print

    getactresses("https://en.wikipedia.org/wiki/academy_award_for_best_actress")
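I used .text instead of get_text() (in BeautifulSoup the two return the same string), and sorry, I used the requests module to demonstrate. The key point is that the find_all method returns a list, so you have to loop through it.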
I'm sorry, I'm new to Stack Overflow and don't know how to write answers well. Anyway, I believe this code clears up your doubts.
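On the answer's note about `.text`: as a quick check on a cell shaped like the ones in the question (the HTML below is a made-up variant with extra spaces), `.text` returns the same string as `get_text()`. Calling `get_text()` directly is still handy when you want its `separator` and `strip` arguments, for example to keep the hidden sort key and the visible name apart.

```python
from bs4 import BeautifulSoup

html = '<td><span class="sortkey">dresser !</span><span class="sorttext"> <a>louise dresser</a> </span></td>'
cell = BeautifulSoup(html, "html.parser").td  # dot access = find("td")

# .text is a property that returns the same string as get_text().
print(cell.text == cell.get_text())  # True

# get_text() also accepts a separator and a strip flag; .text does not.
print(repr(cell.get_text()))
print(cell.get_text("|", strip=True))  # dresser !|louise dresser
```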