dplyr - Finding observation percentile relative to a distribution with purrr - R
I am trying to create a dplyr pipe that compares a value to a distribution and returns the percentile of that value relative to the distribution. I have a tibble with list-columns:

    library(tidyverse)

    raw_val <- c(75, 66, 80, 92, 91)

    aq_nest <- airquality %>%
      select(Temp, Month) %>%
      group_by(Month) %>%
      nest(Temp) %>%
      mutate(raw_val = raw_val)

    > aq_nest
    # A tibble: 5 x 3
      Month data              raw_val
      <int> <list>              <dbl>
    1     5 <tibble [31 x 1]>      75
    2     6 <tibble [30 x 1]>      66
    3     7 <tibble [31 x 1]>      80
    4     8 <tibble [31 x 1]>      92
    5     9 <tibble [30 x 1]>      91

Now I can find what I want for a single month and value:

    > ecdf(aq_nest$data[[1]]$Temp)(raw_val[1])
    [1] 0.9032258

So 75 sits at the 90th percentile.
But with purrr I feel there must be a way to do this for each month and add the result to the aq_nest tibble above. Here is what I've tried:

    aq_nest <- airquality %>%
      select(Temp, Month) %>%
      group_by(Month) %>%
      nest(Temp) %>%
      mutate(raw_val = raw_val) %>%
      mutate(percentile = map2(data, raw_val, ~ecdf(.x)(.y)))

which results in this error:

    Error in mutate_impl(.data, dots) :
      Evaluation error: Can't use matrix or array for column indexing.

So this betrays my lack of understanding of purrr. aq_nest$data[[1]]$Temp is the first element of the list-column and returns a vector of integers. When I try to map, I can't seem to figure out how to coerce the raw integers into a distribution for ecdf to work with.

To summarise, how can I use purrr and ecdf to return a vector of percentiles (i.e. comparing raw_val to airquality$Temp for each airquality$Month)?
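You'll want to pass the Temp column to ecdf instead of the whole dataset: each element of data is a one-column tibble, not a numeric vector, so ecdf(.x) fails. If you use map2_dbl instead of map2, you get a plain numeric column as output rather than a list-column.

In mutate use:

    map2_dbl(data, raw_val, ~ecdf(.x$Temp)(.y))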
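For completeness, here is a minimal end-to-end sketch of the above. It assumes a recent tidyr (1.0 or later) and uses the nest(data = Temp) spelling in place of the group_by(Month) %>% nest(Temp) from the question; that substitution is my assumption about the installed version, not part of the original code. On an ungrouped data frame it still gives one row per Month.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One comparison value per month (May to September), as in the question
raw_val <- c(75, 66, 80, 92, 91)

aq_nest <- airquality %>%
  select(Temp, Month) %>%
  # Nest Temp into a list-column; Month is kept as the key column.
  # Assumption: tidyr >= 1.0 syntax.
  nest(data = Temp) %>%
  mutate(
    raw_val = raw_val,
    # .x is a one-column tibble, so pull out the Temp vector before
    # building the ECDF; .y is the value to look up for that month.
    percentile = map2_dbl(data, raw_val, ~ ecdf(.x$Temp)(.y))
  )

print(aq_nest)
```

The first entry of percentile should match the single-month result shown in the question (about 0.903 for May), and the column is a plain double vector rather than a list.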