R: Calculate minimal string distance and find the row which minimizes the distance
I have a data frame column of class 'character'. I am trying to (a) create a new variable that summarizes how similar each row's value in that column is to the most similar other value in the column, and (b) identify the row that holds the most similar available value in the column for a given value.
My existing approach is to calculate an edit distance measure using the stringdist package (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf). However, this seems incredibly computationally demanding and, after hours of waiting, it still has not finished computing. It is also not clear how to search for the smallest distance for each observation, based on the distances between a given value and the other values in the same vector. Furthermore, it doesn't appear to return the index of the most similar value.
Is there a computationally tractable way to develop a minimal distance measure for each observation, along with the row against which that distance is minimized?
# create data
data.frame(x = c("a","abbb","aa", "abbbkdjsfjldkfjldfkjl"))
# want
data.frame(smallest_distance = c(1,20,1,90), closest_match = c(3,3,1,2))
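One possible way to get both quantities is sketched below. It is a minimal sketch, not a definitive answer, and rests on several assumptions: it swaps in base R's `utils::adist()` (generalized Levenshtein distance) in place of the stringdist package so that it runs without extra dependencies, `nearest_string()` and its `chunk_size` argument are hypothetical names introduced here, and the input is assumed to have at least two elements and no `NA` values. Rather than building the full n-by-n distance matrix at once, it compares one chunk of rows against the whole vector at a time, which keeps memory bounded, although the total number of comparisons is still quadratic.

```r
# Hypothetical helper (not from the original post): for every string, find the
# smallest Levenshtein distance to any OTHER element and that element's index.
nearest_string <- function(x, chunk_size = 1000L) {
  stopifnot(is.character(x), length(x) >= 2L, !anyNA(x))
  n <- length(x)
  smallest_distance <- numeric(n)
  closest_match <- integer(n)

  for (start in seq(1L, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, n)

    # Rows: strings in the current chunk; columns: all strings in x.
    d <- utils::adist(x[idx], x)

    # Mask each string's comparison with itself so it cannot be the minimum.
    d[cbind(seq_along(idx), idx)] <- NA

    # which.min() skips NA and returns the first index when there are ties.
    closest_match[idx] <- apply(d, 1L, which.min)
    smallest_distance[idx] <- apply(d, 1L, min, na.rm = TRUE)
  }

  data.frame(smallest_distance = smallest_distance,
             closest_match = closest_match)
}

df <- data.frame(x = c("a", "abbb", "aa", "abbbkdjsfjldkfjldfkjl"),
                 stringsAsFactors = FALSE)
result <- nearest_string(df$x)
print(cbind(df, result))
```

For the example vector, plain Levenshtein distance gives smallest distances of 1, 3, 1, 17 and closest matches of 3, 1, 1, 2 (row 2 is equally close to rows 1 and 3, and the first is reported). If the stringdist package is preferred, `stringdist::stringdistmatrix(x[idx], x, method = "lv")` should be usable in place of the `adist()` call. When the column contains many repeated values, running the helper on `unique(x)` first and mapping the results back may reduce the work considerably, though that changes how exact duplicates are treated.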