r - How to scrape data from this website?
I need to log in to the site http://bit.do for scraping purposes. The data is password protected, and I can't figure out how to log in and access it in R.
I tried:
library(rvest)
url <- "http://bit.d o/#login/admin"
pgsession <- html_session(url)
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, 'username' = "test0001", 'password' = "qwerty1234")
submit_form(pgsession, filled_form)
url <- 'http://bit.d o/admin/url/http%3a||2f||2fedition.cnn.com||2f2017||2f07||2f21||2fopinions||2ftrump-russia-putin-lain-opinion||2findex.html'
data_page <- read_html(url)
data_link <- html_nodes(data_page, 'td > a')
data_click <- html_nodes(data_page, 'td span:nth-child(1)')
But I get this kind of error:
Submitting with 'NULL'
Error in xml2::url_absolute(form$url, session$url) : not compatible with STRSXP: [type=NULL].
How do I do this? These are the testing credentials: username: test0001, password: qwerty1234. Here's an example of the protected data I want to scrape: http://bit.d o/admin/url/http%3a||2f||2fedition.cnn.com||2f2017||2f07||2f21||2fopinions||2ftrump-russia-putin-lain-opinion||2findex.html
Important: note that, due to a Stack Overflow restriction, I put a space between the d and the o in the domain name.
Since the form has no url field, when you call submit_form(pgsession, filled_form), the call to xml2::url_absolute(form$url, session$url) that takes place doesn't work, because form$url is NULL. To get past this, you need to give form$url a value, even an empty one, before url_absolute is called. Try adding the following line after you populate filled_form with set_values:
filled_form$url <- ''
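Put together, the whole flow might look like the sketch below. It is only a sketch under a few assumptions: it uses the pre-1.0 rvest functions from the question (newer rvest releases renamed them to session, html_form_set, session_submit and session_jump_to), it assumes the login form is the first form on the page with fields named username and password, and login_and_fetch is a hypothetical helper name, not something rvest provides. It also fetches the protected page through the logged-in session with jump_to, because a plain read_html(url) starts a fresh request that does not carry the session's cookies. Whether the site accepts a login submitted this way has not been verified.

```r
library(rvest)

# Hypothetical helper (not part of rvest). Assumes the pre-1.0 rvest API
# used in the question: html_session(), set_values(), submit_form(), jump_to().
login_and_fetch <- function(login_url, protected_url, username, password) {
  pgsession <- html_session(login_url)

  # Assumption: the login form is the first form on the page
  pgform <- html_form(pgsession)[[1]]

  # Assumption: the form fields are named "username" and "password"
  filled_form <- set_values(pgform, username = username, password = password)

  # Workaround from the answer: give form$url a value so that
  # xml2::url_absolute() does not receive NULL
  filled_form$url <- ""

  # submit_form() returns a session that holds the login cookies
  logged_in <- submit_form(pgsession, filled_form)

  # Request the protected page through that session so the cookies are reused
  protected <- jump_to(logged_in, protected_url)

  # Parse the response into an HTML document for html_nodes()
  read_html(protected)
}

# Example usage (not run here; remove the space from the domain first):
# data_page <- login_and_fetch(login_url, protected_url, "test0001", "qwerty1234")
# data_link <- html_nodes(data_page, "td > a")
# data_click <- html_nodes(data_page, "td span:nth-child(1)")
```

Wrapping the steps in a function keeps the credentials and URLs out of the logic, so the same steps can be reused for other protected pages on the site.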