r - How to produce a document-term matrix in text2vec only from a stored list of words
What syntax in text2vec vectorizes texts and produces a DTM restricted to an indicated list of words?
How can I vectorize texts and produce a document-term matrix only on the indicated features? If a feature does not appear in the text, its column should stay empty.
I need to produce term-document matrices with the same columns as the DTM I ran the modelling on; otherwise I cannot use the random forest model on new documents.
You can create a document-term matrix for a specific set of features:
library(text2vec)
# `it` is an itoken() iterator over the documents you want to vectorize
v = create_vocabulary(c("word1", "word2"))
vectorizer = vocab_vectorizer(v)
dtm_test = create_dtm(it, vectorizer)
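A fuller sketch of the same idea, assuming made-up documents and a hypothetical three-word feature list (in practice this would be the column names of the DTM the model was trained on). It builds the vectorizer once from the fixed list and reuses it for every set of documents, then reorders the columns with `dtm[, features]` so the training and new matrices line up exactly. Lowercasing with `tolower` and tokenizing with `word_tokenizer` are assumptions about your preprocessing, and `make_dtm` is a hypothetical helper, not part of text2vec.

```r
library(text2vec)
library(Matrix)

# Hypothetical feature list; in practice e.g. colnames(dtm_train_original)
features <- c("apple", "banana", "cherry")

# Made-up training and new documents, for illustration only
train_docs <- c("apple banana apple", "cherry pie and banana")
new_docs   <- c("I ate an apple and an apple pie", "Banana bread is great")

# create_vocabulary() accepts a character vector of terms used as-is,
# so the vectorizer only ever produces these columns
v <- create_vocabulary(features)
vectorizer <- vocab_vectorizer(v)

# Hypothetical helper: tokenize documents and apply the shared vectorizer
make_dtm <- function(docs) {
  it <- itoken(docs, preprocessor = tolower, tokenizer = word_tokenizer,
               progressbar = FALSE)
  dtm <- create_dtm(it, vectorizer)
  # Reorder columns so they match the feature list exactly
  dtm[, features, drop = FALSE]
}

dtm_train <- make_dtm(train_docs)
dtm_test  <- make_dtm(new_docs)

print(identical(colnames(dtm_train), colnames(dtm_test)))
print(as.matrix(dtm_test))
```

Terms from the list that never occur in a document (here "cherry" in the new documents) simply produce an all-zero column, which is the "stay empty" behaviour the question asks for.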
However, I don't recommend 1) using random forest on such sparse data, since it won't work well, or 2) performing feature selection the way you described, since it will overfit.