r - Multiple tidying operations in one pipeline


This is more of a code-cleaning exercise I'm doing right now. The initial data looks like this:

    year    county    town    ...    funding received    ...    (90+ variables total)
    2016    a         x       ...    yes
    2015    a         y       ...    no
    2014    a         x       ...    yes
    2016    b         z       ...    yes
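Since the real data can't be shared, here is a minimal, hypothetical stand-in built from the four rows shown above, so the code in this post has something to run against. Only a few of the 90+ columns are mocked, and the `funding received` column is assumed to be called `funded`, to match the code.

```r
# Hypothetical stand-in for original_data: only four of the 90+ columns,
# with `funding received` assumed to be named `funded` as in the code.
original_data <- data.frame(
  year   = c(2016, 2015, 2014, 2016),
  county = c("a", "a", "a", "b"),
  town   = c("x", "y", "x", "z"),
  funded = c("yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

str(original_data)
```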

I couldn't see how to get a count of submitted and approved applications directly, so I transformed them into indicator variables and counted those with the following code:

    library(dplyr)

    counties <- original_data %>%
      select(county, funded, year) %>%
      mutate(
        a = ifelse(county == "a", 1, 0),
        b = ifelse(county == "b", 1, 0),
        c = ifelse(county == "c", 1, 0)
        # ... etc ...
      )

The output looks like this:

    county    funding received    year    binary.a    binary.b
    a         yes                 2016    1           0
    a         no                  2015    1           0
    b         no                  2016    0           1
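Writing one `ifelse()` per county by hand gets tedious once there are many counties. As an alternative sketch (not the approach used above), base R's `model.matrix()` can build one 0/1 column per distinct county in a single call. The toy data is hypothetical, and the generated columns are named `countya`, `countyb`, and so on rather than `a`, `b`. Note that `model.matrix()` drops rows with a missing county by default, so those would need to be filtered out first for the `cbind()` to line up.

```r
# Hypothetical toy data in the shape of the question's table
original_data <- data.frame(
  year   = c(2016, 2015, 2014, 2016),
  county = c("a", "a", "a", "b"),
  funded = c("yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# One 0/1 column per distinct county. "- 1" removes the intercept so every
# county gets its own column instead of one level becoming the baseline.
dummies <- model.matrix(~ county - 1, data = original_data)

# Attach the indicators (named countya, countyb, ...) to the kept columns
counties <- cbind(original_data[, c("county", "funded", "year")], dummies)
counties
```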

This data was then transformed into 2 data frames (submitted and funded) holding the count of each county's submitted and funded applications per year, using the following code:

    countysum <- counties %>%
      select(-funded) %>%
      group_by(county, year) %>%
      summarise_all(sum, na.rm = TRUE)

The output looks like this:

    county    year    sum.a    sum.b
    a         2016    32       0
    a         2015    24       0
    b         2016    0        16
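`summarise_all()` still works, but since dplyr 1.0.0 it has been superseded by `across()`. A sketch of the same grouped sum in the newer style, run on a small hypothetical `counties` table with hand-made `a`/`b` indicator columns. `across(everything())` skips the grouping columns automatically, and `.groups = "drop"` returns an ungrouped result.

```r
library(dplyr)

# Hypothetical output of the question's mutate() step: hand-made a/b indicators
counties <- data.frame(
  county = c("a", "a", "a", "b"),
  funded = c("yes", "no", "yes", "yes"),
  year   = c(2016, 2016, 2015, 2016),
  a      = c(1, 1, 1, 0),
  b      = c(0, 0, 0, 1),
  stringsAsFactors = FALSE
)

countysum <- counties %>%
  select(-funded) %>%
  group_by(county, year) %>%
  # across() leaves the grouping columns (county, year) alone
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

countysum
```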

But to get the data into a tidier format I used a few more commands:

    countysum$submitted <- rowSums(countysum[, 3:15], na.rm = TRUE)  # 3:15 are the county indicator vars
    countysum <- countysum[, -c(3:19)]
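A step worth making explicit: every row has exactly one county indicator set to 1, so summing the per-group indicator sums across a row just gives the number of rows in that county-year group. The sketch below uses hypothetical toy data, with `c("a", "b")` standing in for the `3:15` column range, and checks that `dplyr::count()` returns the same `submitted` column directly.

```r
library(dplyr)

# Hypothetical toy data with hand-made indicators, as in the question
counties <- data.frame(
  county = c("a", "a", "a", "b"),
  funded = c("yes", "no", "yes", "yes"),
  year   = c(2016, 2016, 2015, 2016),
  a      = c(1, 1, 1, 0),
  b      = c(0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# The question's route: per-group sums of each indicator, then a row sum
countysum <- counties %>%
  select(-funded) %>%
  group_by(county, year) %>%
  summarise_all(sum, na.rm = TRUE) %>%
  ungroup()
countysum$submitted <- rowSums(countysum[, c("a", "b")], na.rm = TRUE)

# Shortcut: exactly one indicator is 1 per row, so the row sum is the group size
shortcut <- counties %>% count(county, year, name = "submitted")

all.equal(countysum$submitted, as.numeric(shortcut$submitted))
```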

Now my question is: is there a way to reduce these actions to a single pipeline? Right now I have code that works, but I'd prefer to have code that works and is a little easier to follow. Apologies for the lack of data; I cannot share it.
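One possible single pipeline, as a sketch under two assumptions: each row of `original_data` is one submitted application, and `funded` holds `"yes"`/`"no"` strings. It skips both the dummy columns and the `rowSums()` step. The toy data and the `n_submitted`/`n_funded` names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; assumes one row per submitted application
original_data <- data.frame(
  year   = c(2016, 2016, 2015, 2016),
  county = c("a", "a", "a", "b"),
  funded = c("yes", "no", "yes", NA),
  stringsAsFactors = FALSE
)

county_counts <- original_data %>%
  group_by(county, year) %>%
  summarise(
    n_submitted = n(),                                # rows in the group
    n_funded    = sum(funded == "yes", na.rm = TRUE), # TRUE counts as 1
    .groups     = "drop"
  )

county_counts
```

The result is already in the long county/year/count shape that the `rowSums()` step was building, with the funded count alongside it.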

I'm not sure I quite understand what the final desired output looks like, but I think you can take advantage of the fact that logical values are coerced to integers, and skip the creation of dummy columns.
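To make the coercion concrete: comparing a column to `"yes"` gives a logical vector, and arithmetic on it treats `TRUE` as 1 and `FALSE` as 0, so `sum()` counts matches and `mean()` gives the proportion. A small illustration with a made-up vector:

```r
# Made-up vector standing in for the funded column
funded <- c("yes", "no", "yes", NA)

is_funded <- funded == "yes"    # logical: TRUE FALSE TRUE NA
sum(is_funded, na.rm = TRUE)    # 2: each TRUE counts as 1, each FALSE as 0
mean(is_funded, na.rm = TRUE)   # proportion of non-missing values that are TRUE
as.integer(is_funded)           # the 0/1 "dummy" column, without ifelse()
any(is_funded, na.rm = TRUE)    # TRUE if at least one value is "yes"
```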

    library(dplyr)

    byyear <- original_data %>%
      group_by(county, year) %>%
      summarize(
        wasfunded = any(funded == "yes", na.rm = TRUE)
        , submittedapplication = any(submittedapp == "yes", na.rm = TRUE) # I'm assuming did/didn't submit is one of your other variables
      )

    # if you don't need the byyear data for anything else (I seem to),
    # you can pipe straight into the next line
    yrs_funded_by_county <- byyear %>%
      summarize(
        n_yrs_funded = sum(wasfunded)
        , n_yrs_submitted = sum(submittedapplication)
        , pct_awarded = n_yrs_funded / n_yrs_submitted  # maybe you don't need the award rate, but I threw it in b/c it's the kind of stuff a grant person cares about
      )
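A sketch of the two steps above piped together, run on hypothetical toy data (`submittedapp` is the answer's own guessed column name, not something from the question). `.groups = "drop_last"` only makes explicit what `summarize()` already does by default here: drop `year` and keep the result grouped by `county`, so the second `summarize()` collapses to one row per county.

```r
library(dplyr)

# Hypothetical toy data; `submittedapp` is an assumed column name
original_data <- data.frame(
  county       = c("a", "a", "a", "b", "b"),
  year         = c(2015, 2016, 2016, 2015, 2016),
  submittedapp = c("yes", "yes", "yes", "yes", "no"),
  funded       = c("no", "yes", "no", "no", NA),
  stringsAsFactors = FALSE
)

yrs_funded_by_county <- original_data %>%
  group_by(county, year) %>%
  summarize(
    wasfunded            = any(funded == "yes", na.rm = TRUE),
    submittedapplication = any(submittedapp == "yes", na.rm = TRUE),
    .groups = "drop_last"  # result stays grouped by county
  ) %>%
  summarize(
    n_yrs_funded    = sum(wasfunded),            # TRUE years count as 1
    n_yrs_submitted = sum(submittedapplication),
    pct_awarded     = n_yrs_funded / n_yrs_submitted
  )

yrs_funded_by_county
```

Note that `pct_awarded` comes out as `NaN` (0/0) for a county that never submitted in any year.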
