algorithm - Most common subpath in a collection of paths -
there numerous literature on web longest common subsequence problem have different problem , wondering if knows of fast algorithm.
say, have collection of paths:
[1,2,3,4,5,6,7], [2,3,4,9,10], [3,4,6,7], ...
we see subpath [3,4] common.
know of neat algorithm find this? case there tens of thousands of paths!
assuming "path" has encompass @ least 2 elements, common path have 2 elements (although there path more 2 elements that's equally common -- more on later). can iterate lists , count how each pair of consecutive numbers appears in different lists , remember pairs appear often. requires iterating each list once, minimum amount you'd have in case.
if interested in longest common path, can start same way, finding common 2-segment-paths, additionally counts, record position of each of segments (e.g. {(3,4): [2, 1, 0], ...}
in example, numbers in list indicating position of segment in different paths). now, can take most-common length-2-paths , see if of those, next element same all occurrences of path. in case have most-common length-3-path equally common prior length-2 path (it can not more common, obviously). can repeat length-4, length-5, etc. until can no longer expanded without making path "less common". part requires work of n*k each expansion, n being number of candidates left , k how appear.
(this assumes frequency beats length, i.e. if there length-2 path appearing 3 times, prefer on length-3 path appearing twice. same apprach can used different starting length, e.g. requiring @ least length-3 paths, without changing basic algorithm or complexity.)
here's simple example implementation in python demonstrate algorithm. goes length-3, extended length-4 , beyond loop. also, not check edge-cases (array-out-of-bounds etc.)
# example data data = [[1,2, 4,5,6,7, 9], [1,2,3,4,5,6, 8,9], [1,2, 4,5,6,7,8 ]] # step one: count how , each pair appears collections import defaultdict pairs = defaultdict(list) i, lst in enumerate(data): k, pair in enumerate(zip(lst, lst[1:])): pairs[pair].append((i,k)) # step two: find common pair , filter = max([len(lst) lst in pairs.values()]) pairs = {k: v k, v in pairs.items() if len(v) == most} print(pairs) # {(1, 2): [(0, 0), (1, 0), (2, 0)], (4, 5): [(0, 2), (1, 3), (2, 2)], (5, 6): [(0, 3), (1, 4), (2, 3)]} # step three: expand pairs triplets, triplets quadruples, etc. triples = [k + (data[v[0][0]][v[0][1]+2],) k, v in pairs.items() if len(set(data[i][k+2] (i,k) in v)) == 1] print(triples) # [(4, 5, 6)]
Comments
Post a Comment