corpus - How to Extract keywords from a Data Frame in R -

January 15, 2015

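I am new to text mining in R. I want to remove stopwords (i.e. extract keywords) from a data frame's column and put those keywords into a new column.

I tried to make a corpus, but it didn't work for me.

df$c3 is what I currently have. I would like to add the column df$c4, but I can't get it to work.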

    df <- structure(list(c3 = structure(c(3L, 4L, 1L, 7L, 6L, 9L, 5L, 8L,
        10L, 2L), .Label = c("are doing good", "for help", "hello everyone",
        "hope all", "i hope", "i need help", "in life", "it work",
        "on text-mining", "thanks"), class = "factor"), c4 = structure(c(2L,
        4L, 1L, 6L, 3L, 7L, 5L, 9L, 8L, 3L), .Label = c("doing good",
        "everyone", "help", "hope", "hope", "life", "text-mining", "thanks",
        "work"), class = "factor")), .Names = c("c3", "c4"), row.names = c(NA,
        -10L), class = "data.frame")

    head(df)
    #               c3          c4
    # 1 hello everyone    everyone
    # 2       hope all        hope
    # 3 are doing good  doing good
    # 4        in life        life
    # 5    i need help        help
    # 6 on text-mining text-mining

This solution uses the packages dplyr and tidytext.

    library(dplyr)
    library(tidytext)

    # small subset of the dataset
    dt = data.frame(c1 = c(108, 20, 999, 52, 400),
                    c2 = c(1, 3, 7, 6, 9),
                    c3 = c("hello everyone", "hope all", "are doing good", "in life", "i need help"),
                    stringsAsFactors = FALSE)

    # function to combine words (by pasting one next to the other)
    f = function(x) { paste(x, collapse = " ") }

    dt %>%
      unnest_tokens(word, c3) %>%               # split phrases into words
      filter(!word %in% stop_words$word) %>%    # keep appropriate words
      group_by(c1, c2) %>%                      # for each combination of c1 and c2
      summarise(word = f(word)) %>%             # combine multiple words (if there are multiple)
      ungroup()                                 # forget the grouping

    # # A tibble: 2 x 3
    #      c1    c2 word
    #   <dbl> <dbl> <chr>
    # 1    20     3 hope
    # 2    52     6 life
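To see why only two rows survive, it can help to list the tokens that match the stop-word list by inverting the filter. This is a minimal sketch, assuming the same example phrases as above and the `stop_words` data set that ships with tidytext; the name `dropped` is only an illustrative choice.

```r
library(dplyr)
library(tidytext)

# Same example phrases as above (c2 is left out because it is not needed here)
dt <- data.frame(c1 = c(108, 20, 999, 52, 400),
                 c3 = c("hello everyone", "hope all", "are doing good",
                        "in life", "i need help"),
                 stringsAsFactors = FALSE)

# Keep only the tokens that ARE in the stop-word list,
# i.e. the words the original filter throws away
dropped <- dt %>%
  unnest_tokens(word, c3) %>%
  filter(word %in% stop_words$word) %>%
  distinct(c1, word)

dropped
```

Any word in `dropped` that should count as a keyword is a candidate for a manual exception.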

The problem here is that the "stop words" built into the package filter out some of the words you want to keep. Therefore, you have to add a manual step where you specify the words you need to include. You can do something like this:

    dt %>%
      unnest_tokens(word, c3) %>%               # split phrases into words
      filter(!word %in% stop_words$word |
               word %in% c("everyone", "doing", "good")) %>%   # keep appropriate words
      group_by(c1, c2) %>%                      # for each combination of c1 and c2
      summarise(word = f(word)) %>%             # combine multiple words (if there are multiple)
      ungroup()                                 # forget the grouping

    # # A tibble: 4 x 3
    #      c1    c2 word
    #   <dbl> <dbl> <chr>
    # 1    20     3 hope
    # 2    52     6 life
    # 3   108     1 everyone
    # 4   999     7 doing good
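The question asks for the keywords as a new column c4 next to c3, while the pipeline above drops c3 and any row with no keywords left. One way to get c4 is to join the result back onto the original rows. This is a rough sketch and not part of the original answer: the names `keep_words`, `keywords` and `result` are hypothetical, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidytext)

dt <- data.frame(c1 = c(108, 20, 999, 52, 400),
                 c2 = c(1, 3, 7, 6, 9),
                 c3 = c("hello everyone", "hope all", "are doing good",
                        "in life", "i need help"),
                 stringsAsFactors = FALSE)

# Words to keep even though they appear in the stop-word list
keep_words <- c("everyone", "doing", "good")

# One row of keywords per (c1, c2) combination
keywords <- dt %>%
  unnest_tokens(word, c3) %>%
  filter(!word %in% stop_words$word | word %in% keep_words) %>%
  group_by(c1, c2) %>%
  summarise(c4 = paste(word, collapse = " "), .groups = "drop")

# Attach the keywords to the original rows; rows whose words were
# all filtered out get NA in c4
result <- dt %>%
  left_join(keywords, by = c("c1", "c2"))

result
```

Using `left_join` keeps every original row in its original order, so c3 and c4 can be compared side by side.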
