R - Keeping only the x largest groups with data.table


I have started using the data.table package in R, and I stumbled on an issue that I do not know how to tackle with data.table.

Sample data:

set.seed(1)
library(data.table)
dt = data.table(group = c("a","a","a","b","b","b","c","c"), value = runif(8))

I can add a group count with the statement

dt[, groupcount := .N, group]

But now I want to keep only the x groups with the largest value of groupcount. Let's assume x = 1 for the example.

I tried chaining as follows:

dt[, groupcount := .N, group][groupcount %in% head(sort(unique(groupcount), decreasing = TRUE), 1)]

But since groups a and b both have 3 elements, both remain in the data.table. I want only the x largest groups, and with x = 1 I want only one of the groups (a or b) to remain. I assume this can be done in a single line with data.table. Is this true, and if yes, how?


To clarify: x is an arbitrarily chosen number here. The function should also work with x = 3, where it would return the 3 largest groups.

How about making use of the order of groupcount:

# sort dt in place by descending group size (groupcount was added above)
setorder(dt, -groupcount)

x <- 1
dt[group %in% dt[, unique(group)][1:x]]
#    group     value groupcount
# 1:     a 0.2655087          3
# 2:     a 0.3721239          3
# 3:     a 0.5728534          3

x <- 3
dt[group %in% dt[, unique(group)][1:x]]
#    group     value groupcount
# 1:     a 0.2655087          3
# 2:     a 0.3721239          3
# 3:     a 0.5728534          3
# 4:     b 0.9082078          3
# 5:     b 0.2016819          3
# 6:     b 0.8983897          3
# 7:     c 0.9446753          2
# 8:     c 0.6607978          2

## alternative syntax
# dt[group %in% unique(dt$group)[1:x]]
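The same idea can probably be written without reordering dt or adding a groupcount column first, for example as dt[group %in% dt[, .N, by = group][order(-N), head(group, x)]]. Below is a minimal sketch of that approach wrapped in a function. It assumes the grouping column is called group, that ties between equally large groups are broken by order of first appearance, and that keep_largest_groups and n_groups are hypothetical names chosen for illustration; none of this is part of data.table itself.

```r
library(data.table)

set.seed(1)
dt <- data.table(group = c("a","a","a","b","b","b","c","c"), value = runif(8))

# keep_largest_groups is a hypothetical helper, not a data.table function.
# It returns the rows of `dt` that belong to the `n_groups` largest groups.
keep_largest_groups <- function(dt, n_groups) {
  # count rows per group (column N), in order of first appearance
  counts <- dt[, .N, by = group]
  # sort by descending count and take the first n_groups group names;
  # head() simply returns all groups if n_groups exceeds their number
  top <- counts[order(-N), head(group, n_groups)]
  # keep only rows whose group is among the selected ones
  dt[group %in% top]
}

res1 <- keep_largest_groups(dt, 1)
res3 <- keep_largest_groups(dt, 3)
```

With the sample data, res1 should contain only the three rows of group a (a and b are tied at 3 rows, and a appears first), and res3 should contain all eight rows. Using head() instead of [1:x] avoids NA entries when more groups are requested than exist.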
