Counting the observations through each path of a data.tree in R
Using data.tree to build a custom hierarchy, I'm looking to count the number of observations that run through each node.
```r
library(MASS)       # provides the Cars93 data set
library(data.tree)

data(Cars93)
Cars93 <- subset(Cars93, Manufacturer %in% c("Acura", "Toyota"))[, c("Manufacturer", "DriveTrain", "Passengers")]
```

```
> Cars93
   Manufacturer DriveTrain Passengers
1         Acura      Front          5
2         Acura      Front          5
84       Toyota      Front          5
85       Toyota      Front          4
86       Toyota      Front          5
87       Toyota        4WD          7
```

The current output adds children correctly for the first sub-node, but it skips the "DriveTrain" column at the "Acura" level, and at the "Toyota" level it stops adding "Passengers" children after the first iteration.
```
       levelName obs.ct
1 Cars                6
2  ¦--Acura           2
3  ¦   °--5           2
4  °--Toyota          4
5      ¦--4WD         1
6      ¦   °--7       1
7      °--Front       3
```

All of the built-in counting functions appear to apply at the node or leaf level, not at the observation level, unless I'm missing something there. Building the tree from the data frame one node at a time and counting rows is the only solution I've come across.
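As a cross-check on what each node's count should be, the number of observations through every path prefix can be tabulated in base R without data.tree: for each depth k, paste the first k columns together and count how many rows share each prefix. This is a minimal sketch under that assumption; `path_counts()`, the `cars_sub` variable, and the "/"-joined path strings are hypothetical illustrations, not part of data.tree.

```r
library(MASS)  # Cars93 data set

data(Cars93)
cars_sub <- subset(Cars93, Manufacturer %in% c("Acura", "Toyota"))[, c("Manufacturer", "DriveTrain", "Passengers")]

# Hypothetical helper: for each depth k, join the first k columns into a
# "/"-separated path and count the rows (observations) sharing that prefix.
path_counts <- function(data, root = "Cars") {
  by_depth <- lapply(seq_along(data), function(k) {
    # paste() converts factor columns to their labels
    prefix <- do.call(paste, c(unname(as.list(data[seq_len(k)])), sep = "/"))
    tab <- table(prefix)
    data.frame(path = paste(root, names(tab), sep = "/"),
               obs.ct = as.vector(tab),
               stringsAsFactors = FALSE)
  })
  # The root row counts every observation
  root_row <- data.frame(path = root, obs.ct = nrow(data), stringsAsFactors = FALSE)
  do.call(rbind, c(list(root_row), by_depth))
}

counts <- path_counts(cars_sub)
print(counts)
```

Each row of `counts` corresponds to one node of the desired tree, so it can be compared directly against the `obs.ct` values printed by data.tree.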
I've come close by updating the training code from https://cran.r-project.org/web/packages/data.tree/vignettes/applications.html#id3-introduction, but it breaks somewhere between splitting on each feature and recursively calling the function on each child. I've tried sapply'ing a split of all the features at once, but that results in adding children at the wrong levels of the hierarchy. This is the closest I've been able to get the output to align:
```r
# TRUE when the last column has a single distinct value
ispure <- function(data) {
  length(unique(data[, ncol(data)])) == 1
}

# Records the row count on `node`, then either adds a leaf for the last
# column's single value or splits on the first column and recurses.
path_func <- function(node, data) {
  node$obs.ct <- nrow(data)
  if (ispure(data)) {
    child <- node$AddChild(unique(data[, ncol(data)]))
    child$obs.ct <- nrow(data)
  } else {
    childobs <- split(data[, 2:ncol(data), drop = FALSE], data[, 1], drop = TRUE)
    for (i in 1:length(childobs)) {
      child <- node$AddChild(names(childobs)[i])
      path_func(child, childobs[[i]])
    }
  }
}

tree <- Node$new("Cars")
path_func(tree, Cars93)
print(tree, "obs.ct")
```
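A likely cause, judging from the posted code: `ispure()` looks only at the last column, so under "Acura" (both rows have Passengers = 5) the recursion stops early and adds a "5" leaf without a DriveTrain level. Under "Toyota"/"Front" only the Passengers column is left, so `2:ncol(data)` evaluates to `2:1`, i.e. `c(2, 1)`, which should make the column subset fail with an "undefined columns selected" error; because data.tree nodes are reference (R6) objects, the partially built tree is still there to print. Below is a minimal sketch of an alternative, assuming the goal is exactly one tree level per column: drop the purity test and recurse on a column index until every column has been used. `add_paths()` and `cars_sub` are hypothetical names; `Node$new()`, `AddChild()`, and `print(tree, "obs.ct")` are the same data.tree calls used above.

```r
library(MASS)       # Cars93 data set
library(data.tree)  # Node

data(Cars93)
cars_sub <- subset(Cars93, Manufacturer %in% c("Acura", "Toyota"))[, c("Manufacturer", "DriveTrain", "Passengers")]

# Hypothetical helper: one tree level per column. Every node stores the
# number of rows (observations) whose values lead to it.
add_paths <- function(node, data, col = 1L) {
  node$obs.ct <- nrow(data)
  # All columns consumed: this node is a leaf
  if (col > ncol(data)) return(invisible(node))
  # Group row indices by the current column's value; drop = TRUE skips
  # factor levels with no rows here (e.g. DriveTrain "Rear")
  groups <- split(seq_len(nrow(data)), data[[col]], drop = TRUE)
  for (value in names(groups)) {
    child <- node$AddChild(value)
    add_paths(child, data[groups[[value]], , drop = FALSE], col + 1L)
  }
  invisible(node)
}

tree <- Node$new("Cars")
add_paths(tree, cars_sub)
print(tree, "obs.ct")
```

Splitting row indices and advancing a column counter, rather than splitting the data frame and dropping its first column, means the recursion never has to build a data frame with zero columns. By construction, each node's `obs.ct` equals the sum of its children's counts.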