R - How to scrape data from this website?

January 15, 2015

I need to log in to the site http://bit.do for scraping purposes. The data is protected by a password, and I can't figure out how to log in and access it in R.

I tried:

    library(rvest)
    url         <- "http://bit.d o/#login/admin"
    pgsession   <- html_session(url)
    pgform      <- html_form(pgsession)[[1]]
    filled_form <- set_values(pgform, 'username' = "test0001", 'password' = "qwerty1234")
    submit_form(pgsession, filled_form)
    url <- 'http://bit.d o/admin/url/http%3a||2f||2fedition.cnn.com||2f2017||2f07||2f21||2fopinions||2ftrump-russia-putin-lain-opinion||2findex.html'
    data_page  <- read_html(url)
    data_link  <- html_nodes(data_page, 'td > a')
    data_click <- html_nodes(data_page, 'td span:nth-child(1)')

But I get this error:

    Submitting with 'NULL'
    Error in xml2::url_absolute(form$url, session$url) :
      not compatible with STRSXP: [type=NULL].

How do I do this? These are the testing credentials: username test0001, password qwerty1234. Here's an example of the protected data I want to scrape: http://bit.d o/admin/url/http%3a||2f||2fedition.cnn.com||2f2017||2f07||2f21||2fopinions||2ftrump-russia-putin-lain-opinion||2findex.html

Important: note that, due to a Stack Overflow restriction, I put a space between the d and the o in the domain name.

Since the form has no url field, when you call submit_form(pgsession, filled_form), the call to xml2::url_absolute(form$url, session$url) that takes place inside it fails because form$url is NULL. To get past this, you need to give form$url a value, even an empty one, before url_absolute is called. Try adding the following line after you populate filled_form with set_values:

filled_form$url <- ''
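
To show where that line sits in the whole flow, here is a minimal sketch. It assumes the older rvest API used in the question (html_session, set_values, submit_form; rvest 1.0 and later renamed these to session, html_form_set and session_submit). The helper name login_and_scrape and its arguments are hypothetical, made up for illustration. It also assumes that the protected page has to be requested through the logged-in session with jump_to so the login cookies are reused, which the original code does not do.

```r
library(rvest)  # assumes rvest < 1.0, the API used in the question

# Hypothetical helper (not part of rvest): log in, then scrape a protected
# page through the same session.
login_and_scrape <- function(login_url, data_url, username, password) {
  pgsession <- html_session(login_url)

  # Assumption: the login form is the first form on the page
  pgform <- html_form(pgsession)[[1]]
  filled_form <- set_values(pgform, username = username, password = password)

  # The fix from the answer: form$url is NULL, so give it an empty value
  # before submit_form() passes it to xml2::url_absolute()
  filled_form$url <- ""

  logged_in <- submit_form(pgsession, filled_form)

  # Assumption: fetch the data page via the session so cookies are carried over
  data_session <- jump_to(logged_in, data_url)

  list(
    links  = html_nodes(data_session, "td > a"),
    clicks = html_nodes(data_session, "td span:nth-child(1)")
  )
}
```

It would be called with the login URL, the URL of the protected page, and the test credentials from the question. Whether the login actually succeeds still depends on how the site processes the submitted form.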
