r - Take a random sample based on the groups of another dataset, including groups with no match -

June 15, 2012

I have these two datasets:

df <- data.frame(id = 1:20,
                 sex = rep(x = c(0,1), each=10),
                 age = c(25,56,29,42,33,33,33,25,25,25,26,57,30,43,34,34,34,26,26,26),
                 ov = letters[1:20])

df1 <- data.frame(sex = c(0,0,0,1,1),
                  age = c(25,33,39,41,43))

I want to take one random row from each sex, age group of df, according to the groups in df1. Not every age in df1 has a match in df. For each group in df1 with no match in df, I want to impute the value of ov from the row of df with the same sex and the closest age, like this:

df3 <- rbind(df[c(8,7),2:4],c(0,39,"d"),c(1,41,"n"),df[14,2:4])

Note that the donor case for sex = 0, age = 39 is df[4,], and the donor case for sex = 1, age = 41 is df[14,].

How can I do this?

Using data.table, you can try this:

1) Convert the data to data.table and add keys:

library(data.table)

dt1 <- as.data.table(df1)  # convert to data.table
dt1[, newsex := sex]       # will serve as a grouping column
dt1[, newage := age]       # will serve as a grouping column
setkey(dt1, sex, age)      # set the data.table keys
dt1

   sex age newsex newage
1:   0  25      0     25
2:   0  33      0     33
3:   0  39      0     39
4:   1  41      1     41
5:   1  43      1     43

# similarly for df:
dt <- as.data.table(df)
setkey(dt, sex, age)
dt

    id sex age ov
 1:  1   0  25  a
 2:  8   0  25  h
 3:  9   0  25  i
 4: 10   0  25  j
 5:  3   0  29  c
 6:  5   0  33  e
 7:  6   0  33  f
 8:  7   0  33  g
 9:  4   0  42  d
10:  2   0  56  b
11: 11   1  26  k
12: 18   1  26  r
13: 19   1  26  s
14: 20   1  26  t
15: 13   1  30  m
16: 15   1  34  o
17: 16   1  34  p
18: 17   1  34  q
19: 14   1  43  n
20: 12   1  57  l
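To see what the keys do, here is a minimal sketch on a made-up three-row table (the name `demo` and its values are illustrative, not part of the question). `setkey` sorts the rows by the key columns, which is why the keyed table prints in a different order from the original data frame.

```r
library(data.table)

# Hypothetical toy table, deliberately out of order
demo <- data.table(sex = c(1, 0, 0), age = c(43, 33, 25))

setkey(demo, sex, age)  # sorts by sex, then age, and marks both as key columns

key(demo)   # "sex" "age"
demo$age    # 25 33 43
```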

2) Use a rolling merge to build dtnew, which assigns every row of dt to one of the new groups:

dtnew <- dt1[dt, roll = "nearest"]
dtnew

    sex age newsex newage id ov
 1:   0  25      0     25  1  a
 2:   0  25      0     25  8  h
 3:   0  25      0     25  9  i
 4:   0  25      0     25 10  j
 5:   0  29      0     25  3  c
 6:   0  33      0     33  5  e
 7:   0  33      0     33  6  f
 8:   0  33      0     33  7  g
 9:   0  42      0     39  4  d
10:   0  56      0     39  2  b
11:   1  26      1     41 11  k
12:   1  26      1     41 18  r
13:   1  26      1     41 19  s
14:   1  26      1     41 20  t
15:   1  30      1     41 13  m
16:   1  34      1     41 15  o
17:   1  34      1     41 16  p
18:   1  34      1     41 17  q
19:   1  43      1     43 14  n
20:   1  57      1     43 12  l
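The rolling merge is easier to follow on a tiny case. The sketch below uses made-up tables (`targets`, `obs` and the column `target_age` are hypothetical names) and a single join column. For each row of `obs`, `roll = "nearest"` picks the row of `targets` whose age is closest, including ages beyond the last target.

```r
library(data.table)

# Hypothetical lookup table: one row per target age
targets <- data.table(age = c(25, 33, 39), target_age = c(25, 33, 39))

# Hypothetical observed ages to assign to a target
obs <- data.table(age = c(26, 35, 42, 56))

# For each row of obs, take the row of targets with the nearest age.
# The age column of the result holds the obs ages.
matched <- targets[obs, roll = "nearest", on = "age"]
print(matched)

matched$target_age  # 25 33 39 39
```

In the merge above the same thing happens within each sex, because sex is the first key column and only the last key column (age) is rolled.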

3) Now you can sample. In this case, you can put the rows in random order and then take the first row of each group:

dtnew <- dtnew[sample(.N)]  # put the rows in random order
sampledt <- unique(dtnew, by = c("newsex", "newage"))  # keep the first row for each unique newsex, newage
sampledt

   sex age newsex newage id ov
1:   0  56      0     39  2  b
2:   0  29      0     25  3  c
3:   1  43      1     43 14  n
4:   1  34      1     41 16  p
5:   0  33      0     33  7  g
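A possible variation, offered as a sketch and not as the method of the steps above: run the join in the other direction, so that each group of df1 looks up its donor rows in df, and then draw one row per group with `sample(.N, 1)`. The names `donors` and `picked` and the fixed seed are assumptions made for illustration. With the data from the question, the groups age 39 and age 41 should receive the donors named there (ov "d" and "n").

```r
library(data.table)

# Data from the question
df <- data.frame(id = 1:20,
                 sex = rep(c(0, 1), each = 10),
                 age = c(25,56,29,42,33,33,33,25,25,25,26,57,30,43,34,34,34,26,26,26),
                 ov = letters[1:20])
df1 <- data.frame(sex = c(0, 0, 0, 1, 1),
                  age = c(25, 33, 39, 41, 43))

dt  <- as.data.table(df)
dt1 <- as.data.table(df1)

# Assumption: join in the opposite direction to step 2, so each df1 group
# looks up rows of df with the same sex and the nearest age.
# The age column of the result carries the df1 ages.
donors <- dt[dt1, roll = "nearest", on = c("sex", "age")]

set.seed(1)  # assumption: fixed seed only so the draw is repeatable

# Draw one random donor row within each df1 group
picked <- donors[, .SD[sample(.N, 1)], by = .(sex, age)]
print(picked)
```

Grouping with `by` and sampling inside each group avoids reshuffling the whole table, at the cost of being slower than `unique` on very large inputs.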
