R - Create a co-occurrence matrix from a .csv
I'm trying to create a co-occurrence matrix to see which keywords are associated in a database.
The data looks like this; it's a .csv file.
    id, keywords
    1, apple;pear
    2, apple;cherry
    3, pear;cherry
    4, apple;cherry
And I'd like to obtain this:
            apple  pear  cherry
    apple       0     1       2
    pear        1     0       1
    cherry      2     1       0
The goal is to use d3.js to visualize the matrix.
I've posted this under the r tag because I've used R a bit before in classes, so I'm not a complete newbie. While looking for solutions I saw that it's possible to use Python for this, but I've never touched it in my life.
You can use the tidyr (and magrittr) package(s) and the table function.
    library(tidyr)
    library(magrittr)

    df <- data.frame(id = 1:4,
                     keywords = c("apple;pear", "apple;cherry",
                                  "pear;cherry", "apple;cherry"))

    # Split each "a;b" string into two columns, f1 and f2
    df2 <- df %>% separate(keywords, into = c("f1", "f2"), sep = ";")
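The data frame above is typed in by hand, while the question's data lives in a .csv file. A minimal sketch of reading it instead, assuming the file looks exactly like the sample in the question (the file name `keywords.csv` mentioned in the comment is hypothetical; the sample is inlined here so the snippet runs on its own):

```r
# Sample data from the question, inlined so the sketch runs without a file.
csv_text <- "id, keywords
1, apple;pear
2, apple;cherry
3, pear;cherry
4, apple;cherry"

# For a real file, pass its path instead of `text = csv_text`,
# e.g. read.csv("keywords.csv", ...) (hypothetical file name).
# strip.white = TRUE drops the space that follows each comma.
df <- read.csv(text = csv_text, strip.white = TRUE, stringsAsFactors = FALSE)

# Set the column names explicitly so spacing in the header line
# cannot affect them.
names(df) <- c("id", "keywords")

df
```

The resulting `df` should be usable in place of the hand-built one in the `separate()` call.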
The next step gives both columns the same factor levels, so that the row and column names of the table are correct.
    # Convert both columns to factors in place
    df2$f1 %<>% factor()
    df2$f2 %<>% factor()

    # Give both factors the union of their levels
    df2$f1 <- factor(df2$f1, levels = unique(c(levels(df2$f1), levels(df2$f2))))
    df2$f2 <- factor(df2$f2, levels = unique(c(levels(df2$f1), levels(df2$f2))))
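To see why the shared levels matter, here is a small standalone sketch using the same four keyword pairs as plain character vectors (the names `first`, `second`, `loose`, `fixed` and `lv` are only for illustration). Without shared levels, `table` only uses the values that actually appear in each vector, so the result is not even square:

```r
# The same keyword pairs as in the example, as plain character vectors.
first  <- c("apple", "apple", "pear", "apple")
second <- c("pear", "cherry", "cherry", "cherry")

# Without shared levels: rows are apple/pear, columns are cherry/pear.
loose <- table(first, second)
dim(loose)   # 2 2

# With the union of all keywords as levels on both sides: a square table.
lv <- unique(c(first, second))
fixed <- table(factor(first, levels = lv), factor(second, levels = lv))
dim(fixed)   # 3 3
```

Two tables of different shapes could not be added together, which is why the levels are aligned before the tables are built.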
Then you can use table. A single table is not symmetric, so add the table with its arguments swapped using +:
    > table(df2$f1, df2$f2) + table(df2$f2, df2$f1)
             apple pear cherry
      apple      0    1      2
      pear       1    0      1
      cherry     2    1      0
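The approach above relies on every row having exactly two keywords. If some rows have more, one possible alternative is sketched below in base R. This is a different technique from the one in the answer (an incidence matrix multiplied by its own transpose via `crossprod`), and the names `kw`, `lv`, `inc` and `cooc` are just illustrative:

```r
df <- data.frame(id = 1:4,
                 keywords = c("apple;pear", "apple;cherry",
                              "pear;cherry", "apple;cherry"),
                 stringsAsFactors = FALSE)

# Split every row into its keywords (any number per row).
kw <- strsplit(df$keywords, ";", fixed = TRUE)

# All distinct keywords, in order of first appearance.
lv <- unique(unlist(kw))

# Incidence matrix: one row per id, one column per keyword,
# 1 if the keyword occurs in that row and 0 otherwise.
inc <- t(sapply(kw, function(k) as.integer(lv %in% k)))
colnames(inc) <- lv

# crossprod(inc) is t(inc) %*% inc: for each pair of keywords,
# the number of rows that contain both.
cooc <- crossprod(inc)
diag(cooc) <- 0   # a keyword's count with itself is not a co-occurrence

cooc
```

For the sample data this gives the same counts as the table above. For the d3.js side, something like `write.csv(cooc, "cooccurrence.csv")` (file name hypothetical) would write the matrix to a file that can then be loaded in the browser.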