algorithm - Most common subpath in a collection of paths -

July 15, 2011

there numerous literature on web longest common subsequence problem have different problem , wondering if knows of fast algorithm.

say, have collection of paths:

[1,2,3,4,5,6,7], [2,3,4,9,10], [3,4,6,7], ...

we see subpath [3,4] common.

know of neat algorithm find this? case there tens of thousands of paths!

assuming "path" has encompass @ least 2 elements, common path have 2 elements (although there path more 2 elements that's equally common -- more on later). can iterate lists , count how each pair of consecutive numbers appears in different lists , remember pairs appear often. requires iterating each list once, minimum amount you'd have in case.

if interested in longest common path, can start same way, finding common 2-segment-paths, additionally counts, record position of each of segments (e.g. {(3,4): [2, 1, 0], ...} in example, numbers in list indicating position of segment in different paths). now, can take most-common length-2-paths , see if of those, next element same all occurrences of path. in case have most-common length-3-path equally common prior length-2 path (it can not more common, obviously). can repeat length-4, length-5, etc. until can no longer expanded without making path "less common". part requires work of n*k each expansion, n being number of candidates left , k how appear.

(this assumes frequency beats length, i.e. if there length-2 path appearing 3 times, prefer on length-3 path appearing twice. same apprach can used different starting length, e.g. requiring @ least length-3 paths, without changing basic algorithm or complexity.)

here's simple example implementation in python demonstrate algorithm. goes length-3, extended length-4 , beyond loop. also, not check edge-cases (array-out-of-bounds etc.)

# example data data = [[1,2,  4,5,6,7,  9],         [1,2,3,4,5,6,  8,9],         [1,2,  4,5,6,7,8  ]]  # step one: count how , each pair appears collections import defaultdict pairs = defaultdict(list) i, lst in enumerate(data):     k, pair in enumerate(zip(lst, lst[1:])):         pairs[pair].append((i,k))  # step two: find common pair , filter = max([len(lst) lst in pairs.values()]) pairs = {k: v k, v in pairs.items() if len(v) == most} print(pairs) # {(1, 2): [(0, 0), (1, 0), (2, 0)], (4, 5): [(0, 2), (1, 3), (2, 2)], (5, 6): [(0, 3), (1, 4), (2, 3)]}  # step three: expand pairs triplets, triplets quadruples, etc. triples = [k + (data[v[0][0]][v[0][1]+2],)             k, v in pairs.items()             if len(set(data[i][k+2] (i,k) in v)) == 1] print(triples) # [(4, 5, 6)]

Search This Blog

Insert

algorithm - Most common subpath in a collection of paths -

Comments

Post a Comment

Popular posts from this blog

php - Vagrant up error - Uncaught Reflection Exception: Class DOMDocument does not exist -

vue.js - Create hooks for automated testing -

Add new key value to json node in java -