dplyr - Finding observation percentile relative to a distribution with purrr - R


I am trying to create a dplyr pipe that compares a value to a distribution and returns the percentile of that value relative to the distribution. I have a tibble with list-columns:

    library(tidyverse)

    raw_val <- c(75, 66, 80, 92, 91)

    aq_nest <- airquality %>%
      # airquality's columns are capitalised (Temp, Month); rename while selecting
      select(temp = Temp, month = Month) %>%
      group_by(month) %>%
      nest(temp) %>%
      mutate(raw_val = raw_val)

    > aq_nest
    # A tibble: 5 x 3
      month              data raw_val
      <int>            <list>   <dbl>
    1     5 <tibble [31 x 1]>      75
    2     6 <tibble [30 x 1]>      66
    3     7 <tibble [31 x 1]>      80
    4     8 <tibble [31 x 1]>      92
    5     9 <tibble [30 x 1]>      91

Now I can find what I want for a single month and value:

    > ecdf(aq_nest$data[[1]]$temp)(raw_val[1])
    [1] 0.9032258

So 75 sits at the 90th percentile.
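For reference, `ecdf()` returns a function that gives the proportion of observations less than or equal to its argument, so the figure above can be checked by hand. A minimal base-R sketch, reading straight from the built-in `airquality` data (where the columns are capitalised as `Temp` and `Month`); the variable names here are only illustrative:

```r
# May temperatures from the built-in airquality data set
may_temp <- airquality$Temp[airquality$Month == 5]

# ecdf() builds a step function from the sample
may_ecdf <- ecdf(may_temp)

# Evaluating it at 75 should match the share of observations <= 75
by_ecdf <- may_ecdf(75)
by_hand <- mean(may_temp <= 75)

all.equal(by_ecdf, by_hand)
```

Both routes count how many of the 31 May readings are at or below 75 and divide by 31.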

But with purrr I feel there must be a way to do this for each month and add the result to the aq_nest tibble above. Here is what I've tried:

    aq_nest <- airquality %>%
      select(temp = Temp, month = Month) %>%
      group_by(month) %>%
      nest(temp) %>%
      mutate(raw_val = raw_val) %>%
      mutate(percentile = map2(data, raw_val, ~ecdf(.x)(.y)))

Which results in an error:

    Error in mutate_impl(.data, dots) :
      Evaluation error: Can't use matrix or array for column indexing.

So this betrays my lack of understanding of purrr. aq_nest$data[[1]]$temp takes the first element of the list-column and returns a vector of integers. When I try to map over the column, I can't seem to figure out how to coerce the raw integers into a distribution that ecdf can work with.

To summarise, how can I use purrr and ecdf so that it returns a vector of percentiles (i.e. comparing each raw_val to the airquality$Temp values within the matching airquality$Month)?

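You'll want to pass the temp column to ecdf instead of the whole dataset: inside map2, .x is each one-column tibble from the list-column, not a numeric vector. If you use map2_dbl instead of map2, you get a non-list (double) column as output.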

In your mutate, use:

map2_dbl(data, raw_val, ~ecdf(.x$temp)(.y))
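Put together, the whole pipe might look something like the sketch below. It is not the exact code from the question: it assumes tidyr 1.0.0 or later, where `nest(data = temp)` nests the named column and keeps one row per remaining column value, so no `group_by()` is needed, and it renames `airquality`'s capitalised `Temp` and `Month` columns to the lower-case names used in this post.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One comparison value per month (May to September), as in the question
raw_val <- c(75, 66, 80, 92, 91)

aq_nest <- airquality %>%
  # airquality's own column names are capitalised; rename to match the post
  select(temp = Temp, month = Month) %>%
  # assumes tidyr >= 1.0.0: nest `temp`, leaving one row per month
  nest(data = temp) %>%
  mutate(
    raw_val = raw_val,
    # .x is a one-column tibble, so hand ecdf() the numeric vector .x$temp
    percentile = map2_dbl(data, raw_val, ~ ecdf(.x$temp)(.y))
  )

aq_nest
```

The percentile column comes back as a plain double vector, so it can be filtered or plotted directly without unnesting.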

