Python 3.x - How to scrape a web page that is not written directly in HTML, but is auto-generated using JavaScript?
I am trying to scrape the http://washingtonmonthly.com/college_guide?ranking=2016-rankings-national-universities website.
This website is auto-generated using JavaScript that updates the DOM tree. I have tried the Selenium code below to get the elements inside the table, but it returns an empty list.
    from selenium import webdriver
    import time

    driver = webdriver.Chrome(executable_path="c:\\chrme\\chromedriver")
    driver.get('http://washingtonmonthly.com/college_guide?ranking=best-colleges-for-adult-learners-4-year-colleges')
    time.sleep(5)
    test = driver.execute_script("return document.getElementsByClassName('tablesaw tablesaw-swipe')")
    print(test)

Is there a way to run the scripts and get the HTML code? I am using Python 3.6.
Run this script. It should give you everything the table contains, including CSV output.

    import csv
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)

    driver.get("http://washingtonmonthly.com/college_guide?ranking=2016-rankings-national-universities")

    # The table is rendered inside an iframe, so switch into it before searching.
    # The frame id is assumed to be "iFrameResizer0"; check the page source if this wait times out.
    wait.until(EC.frame_to_be_available_and_switch_to_it("iFrameResizer0"))
    # Wait until the JavaScript has actually rendered the table.
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'table.tablesaw')))

    # find_element(By.CSS_SELECTOR, ...) works in both Selenium 3 and Selenium 4.
    tab_data = driver.find_element(By.CSS_SELECTOR, 'table.tablesaw')
    list_rows = [[cell.text for cell in row.find_elements(By.CSS_SELECTOR, 'td')]
                 for row in tab_data.find_elements(By.CSS_SELECTOR, 'tr')]

    # Write every row to the CSV file and echo it to the console.
    with open('table_data.csv', 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for data in list_rows:
            writer.writerow(data)
            print(data)

    driver.quit()

By the way, I'm assuming you have the lxml library installed.
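One detail of the script above may be worth handling: the row comprehension collects only td cells, so any tr that holds only th cells (a header row, for example) would come back as an empty list and end up as a blank line in the CSV. The following is a minimal sketch of a filtering step, not part of the original answer. The helper name write_table_rows and the sample rows are hypothetical, and an in-memory buffer stands in for the real file so the example runs without a browser.

```python
import csv
import io


def write_table_rows(rows, fileobj):
    """Write non-empty rows to fileobj as CSV and return how many were written."""
    writer = csv.writer(fileobj)
    written = 0
    for row in rows:
        # Strip stray whitespace from each cell's text.
        cleaned = [cell.strip() for cell in row]
        # Skip rows with no cells or only blank cells, e.g. a header row
        # that has <th> cells but no <td> cells.
        if not any(cleaned):
            continue
        writer.writerow(cleaned)
        written += 1
    return written


# Hypothetical sample data standing in for list_rows from the script above.
sample_rows = [[], ["1", " Example University "], ["2", "Another College"]]

buffer = io.StringIO()
count = write_table_rows(sample_rows, buffer)
print(count)
print(buffer.getvalue())
```

With the real scrape, the same helper could be called as write_table_rows(list_rows, outfile) inside a with open('table_data.csv', 'w', newline='') as outfile: block, which also makes sure the file is closed.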