R - Keeping only the x largest groups with data.table
I have started using the data.table package in R and stumbled upon an issue that I do not know how to tackle with data.table.
Sample data:
set.seed(1)
library(data.table)
dt = data.table(group = c("a","a","a","b","b","b","c","c"), value = runif(8))
I can add a group count with this statement:
dt[, groupcount := .N, by = group]
But now I want to keep only the x groups with the largest value of groupcount. Let's assume x=1 for example.
I tried chaining as follows:
dt[, groupcount := .N, by = group][groupcount %in% head(sort(unique(groupcount), decreasing = TRUE), 1)]
But since groups a and b both have 3 elements, both remain in the data.table. I want only the x largest groups, and with x=1 I want only one of the groups (a or b) to remain. I assume this can be done in a single line with data.table. Is this true, and if yes, how?
To clarify: x is an arbitrarily chosen number here. The function should also work with x=3, where it would return the 3 largest groups.
How about making use of the order of groupcount?
setorder(dt, -groupcount)
x <- 1
dt[group %in% dt[, unique(group)][1:x]]
#    group     value groupcount
# 1:     a 0.2655087          3
# 2:     a 0.3721239          3
# 3:     a 0.5728534          3
x <- 3
dt[group %in% dt[, unique(group)][1:x]]
#    group     value groupcount
# 1:     a 0.2655087          3
# 2:     a 0.3721239          3
# 3:     a 0.5728534          3
# 4:     b 0.9082078          3
# 5:     b 0.2016819          3
# 6:     b 0.8983897          3
# 7:     c 0.9446753          2
# 8:     c 0.6607978          2
## alternative syntax
# dt[group %in% unique(dt$group)[1:x]]
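If you would rather not reorder dt or add a groupcount column, a possible variant is to count the rows per group on the fly and keep the names of the first x groups. This is only a sketch under a few assumptions: keep_top_groups is a made-up helper name (not part of data.table), the grouping column is called group as in the question, and ties are broken by order of first appearance, which is what I would expect from by = and a stable order().

```r
library(data.table)

set.seed(1)
dt <- data.table(group = c("a", "a", "a", "b", "b", "b", "c", "c"),
                 value = runif(8))

# keep_top_groups() is a hypothetical helper name, not a data.table function
keep_top_groups <- function(dt, x) {
  # one row per group with its row count N, largest counts first
  counts <- dt[, .N, by = group][order(-N)]
  # head() returns at most x names, so x = 0 or x larger than the
  # number of groups does not produce NA or out-of-range indices
  top <- head(counts$group, x)
  # keep only the rows belonging to the selected groups
  dt[group %in% top]
}

res1 <- keep_top_groups(dt, 1)  # rows of a single group
res3 <- keep_top_groups(dt, 3)  # rows of up to three groups
```

Using head(..., x) instead of [1:x] is a deliberate choice here: 1:x misbehaves for x = 0 and yields NA when x exceeds the number of groups.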