r - Take a random sample based on the groups of another dataset, with cases that have no match -


I have these 2 datasets:

    df <- data.frame(id = 1:20,
                     sex = rep(x = c(0, 1), each = 10),
                     age = c(25, 56, 29, 42, 33, 33, 33, 25, 25, 25,
                             26, 57, 30, 43, 34, 34, 34, 26, 26, 26),
                     ov = letters[1:20])

    df1 <- data.frame(sex = c(0, 0, 0, 1, 1),
                      age = c(25, 33, 39, 41, 43))

I want to take 1 random row from df for every sex, age group listed in df1. Not all ages in df1 have a match in df. For every group in df1 with no match, I want to impute the value of the variable ov from a row of df with the same sex and the closest age, like this:

    df3 <- rbind(df[c(8, 7), 2:4],
                 c(0, 39, "d"),
                 c(1, 41, "n"),
                 df[14, 2:4])

Note that the donor case for sex = 0, age = 39 is df[4,], and the donor case for sex = 1, age = 41 is df[14,].

How can I do this?

Using data.table, you can try this:

1) Convert the data to data.table and add keys:

    library(data.table)

    dt1 <- as.data.table(df1)  # convert to data.table
    dt1[, newsex := sex]       # will serve as a grouping column
    dt1[, newage := age]       # same, for age
    setkey(dt1, sex, age)      # set the data.table keys
    dt1

       sex age newsex newage
    1:   0  25      0     25
    2:   0  33      0     33
    3:   0  39      0     39
    4:   1  41      1     41
    5:   1  43      1     43

    # similar for df:
    dt <- as.data.table(df)
    setkey(dt, sex, age)
    dt

        id sex age ov
     1:  1   0  25  a
     2:  8   0  25  h
     3:  9   0  25  i
     4: 10   0  25  j
     5:  3   0  29  c
     6:  5   0  33  e
     7:  6   0  33  f
     8:  7   0  33  g
     9:  4   0  42  d
    10:  2   0  56  b
    11: 11   1  26  k
    12: 18   1  26  r
    13: 19   1  26  s
    14: 20   1  26  t
    15: 13   1  30  m
    16: 15   1  34  o
    17: 16   1  34  p
    18: 17   1  34  q
    19: 14   1  43  n
    20: 12   1  57  l

2) Using a rolling merge, build dtnew, which assigns every row of dt to its new group:

    dtnew <- dt1[dt, roll = "nearest"]
    dtnew

        sex age newsex newage id ov
     1:   0  25      0     25  1  a
     2:   0  25      0     25  8  h
     3:   0  25      0     25  9  i
     4:   0  25      0     25 10  j
     5:   0  29      0     25  3  c
     6:   0  33      0     33  5  e
     7:   0  33      0     33  6  f
     8:   0  33      0     33  7  g
     9:   0  42      0     39  4  d
    10:   0  56      0     39  2  b
    11:   1  26      1     41 11  k
    12:   1  26      1     41 18  r
    13:   1  26      1     41 19  s
    14:   1  26      1     41 20  t
    15:   1  30      1     41 13  m
    16:   1  34      1     41 15  o
    17:   1  34      1     41 16  p
    18:   1  34      1     41 17  q
    19:   1  43      1     43 14  n
    20:   1  57      1     43 12  l
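To make the group assignment in step 2 easier to follow, here is a minimal base R sketch of the same idea: each row of df is given the nearest df1 age within its own sex. The helper name nearest_group_age is hypothetical, and the tie rule (first candidate wins) is an assumption of this sketch, not a statement about how data.table resolves ties.

```r
df <- data.frame(id = 1:20,
                 sex = rep(c(0, 1), each = 10),
                 age = c(25, 56, 29, 42, 33, 33, 33, 25, 25, 25,
                         26, 57, 30, 43, 34, 34, 34, 26, 26, 26),
                 ov = letters[1:20])
df1 <- data.frame(sex = c(0, 0, 0, 1, 1),
                  age = c(25, 33, 39, 41, 43))

# Hypothetical helper: the df1 age closest to `a` among groups with sex `s`.
nearest_group_age <- function(s, a, groups) {
  cand <- groups$age[groups$sex == s]
  cand[which.min(abs(cand - a))]  # which.min() keeps the first candidate on a tie
}

# One group age per row of df, matched within sex.
df$newage <- mapply(nearest_group_age, df$sex, df$age,
                    MoreArgs = list(groups = df1))

# Number of df rows that land in each sex / newage group.
table(df$sex, df$newage)
```

Age 29 for sex = 0 is equally far from 25 and 33. This sketch assigns it to 25, which agrees with the dtnew output shown above, but it is worth checking how ties fall on your own data.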

3) Now you can sample. In this case you can put the rows in random order and then take the first row of each group:

    dtnew <- dtnew[sample(.N)]  # put the rows in random order
    sampledt <- unique(dtnew, by = c("newsex", "newage"))  # keep the first row of each newsex, newage group
    sampledt

       sex age newsex newage id ov
    1:   0  56      0     39  2  b
    2:   0  29      0     25  3  c
    3:   1  43      1     43 14  n
    4:   1  34      1     41 16  p
    5:   0  33      0     33  7  g
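Note that in the sample above the group sex = 0, age = 39 drew the row with age 56 (ov = b), whereas the expected df3 in the question uses the closest-age donor (ov = d). If the draw should be limited to exact matches, or to the closest-age rows when there is no exact match, a base R sketch along the following lines might do it. The helper name pick_donor and the extra donor_id column are hypothetical, and the sketch assumes every sex in df1 also occurs in df.

```r
df <- data.frame(id = 1:20,
                 sex = rep(c(0, 1), each = 10),
                 age = c(25, 56, 29, 42, 33, 33, 33, 25, 25, 25,
                         26, 57, 30, 43, 34, 34, 34, 26, 26, 26),
                 ov = letters[1:20])
df1 <- data.frame(sex = c(0, 0, 0, 1, 1),
                  age = c(25, 33, 39, 41, 43))

# Hypothetical helper: draw one donor row with the same sex and the smallest age gap.
# Exact matches have a gap of 0, so they are preferred automatically.
pick_donor <- function(s, a, donors) {
  same_sex <- donors[donors$sex == s, ]
  gap <- abs(same_sex$age - a)
  pool <- same_sex[gap == min(gap), ]  # all rows tied for the closest age
  pool[sample(nrow(pool), 1), ]        # one random row from that pool
}

set.seed(1)  # only to make the draw reproducible
picked <- do.call(rbind, lapply(seq_len(nrow(df1)), function(i) {
  pick_donor(df1$sex[i], df1$age[i], df)
}))

# Keep the df1 group values and attach the sampled or imputed ov.
result <- data.frame(sex = df1$sex, age = df1$age,
                     ov = picked$ov, donor_id = picked$id)
result
```

With this rule the groups sex = 0, age = 39 and sex = 1, age = 41 always receive ov from df[4,] and df[14,], matching the donors named in the question, while the groups with exact matches still get a random row.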
