R - Create a co-occurrence matrix from a .csv


I'm trying to create a co-occurrence matrix to see which keywords are associated with each other in a database.

The data looks like this; it's a .csv file.

    id,    keywords
    1,     apple;pear
    2,     apple;cherry
    3,     pear;cherry
    4,     apple;cherry

I'd like to obtain this:

              apple  pear  cherry
    apple       0      1      2
    pear        1      0      1
    cherry      2      1      0

The goal is to use d3.js to visualize the matrix.

I've posted this under the R tag because I've used it a bit before in classes, so I'm not a complete newbie. While looking for solutions I saw that it's possible to use Python for this, but I've never touched it in my life.

You can use the tidyr (and magrittr) packages and the table function.

    library(tidyr)
    library(magrittr)

    df <- data.frame(id = 1:4,
                     keywords = c("apple;pear", "apple;cherry", "pear;cherry", "apple;cherry"))

    # Split each keyword pair into two columns, f1 and f2
    df2 <- df %>% separate(keywords, sep = ";", into = c("f1", "f2"))
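The data frame above is typed in by hand. To start from the actual .csv file instead, a minimal sketch along these lines might work. It assumes the file has the layout shown in the question, including the padding after the commas; the temporary file `path` is only a stand-in for the real file location, which the question does not give.

```r
# Write the sample from the question to a temporary file so the sketch runs as-is.
# `path` is a placeholder; point it at the real .csv file in practice.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,    keywords",
  "1,     apple;pear",
  "2,     apple;cherry",
  "3,     pear;cherry",
  "4,     apple;cherry"
), path)

# strip.white = TRUE drops the spaces that follow each comma;
# stringsAsFactors = FALSE keeps the keywords as plain character strings.
df <- read.csv(path, strip.white = TRUE, stringsAsFactors = FALSE)
str(df)
```

The resulting `df` has the same two columns, `id` and `keywords`, as the hand-built one, so the remaining steps should apply unchanged.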

Next, convert both columns to factors that share the same set of levels, so that the table has the correct row and column names.

    df2$f1 %<>% factor()
    df2$f2 %<>% factor()

    df2$f1 <- factor(df2$f1, levels = unique(c(levels(df2$f1), levels(df2$f2))))
    df2$f2 <- factor(df2$f2, levels = unique(c(levels(df2$f1), levels(df2$f2))))

Then you can use table. A single table is not symmetric, so add the table with its arguments swapped using +:

    > table(df2$f1, df2$f2) + table(df2$f2, df2$f1)

             apple pear cherry
      apple      0    1      2
      pear       1    0      1
      cherry     2    1      0
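The approach above relies on every row holding exactly two keywords. If some rows hold more, one possible alternative in base R is to build an id-by-keyword incidence matrix and multiply it by its own transpose. This is only a sketch under that assumption, not part of the original answer; the names `kw`, `long`, `incidence` and `cooc` are made up for illustration.

```r
# Sample data from the question.
df <- data.frame(
  id = 1:4,
  keywords = c("apple;pear", "apple;cherry", "pear;cherry", "apple;cherry"),
  stringsAsFactors = FALSE
)

# Split each keyword string into a character vector, one vector per row.
kw <- strsplit(df$keywords, ";", fixed = TRUE)

# Long format: one (id, keyword) pair per line.
long <- data.frame(
  id = rep(df$id, lengths(kw)),
  keyword = unlist(kw)
)

# Incidence matrix: rows are ids, columns are keywords (in alphabetical order).
incidence <- unclass(table(long$id, long$keyword))

# t(incidence) %*% incidence counts, for each keyword pair,
# the number of rows in which both keywords appear.
cooc <- crossprod(incidence)
diag(cooc) <- 0  # zero the diagonal to match the layout asked for in the question

print(cooc)
```

Two caveats: the keywords come out in alphabetical order rather than in order of first appearance, and a keyword repeated within a single row would inflate the counts, so such duplicates would need removing first.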
