loops - Running top k elements in a sorted, fixed-size list / Python
I am trying to keep a list of the top k elements of a large set of tuples. Since keeping everything in memory is impossible, I want to use a fixed-size list that keeps only the top k values (with their keys). I have tried using a min-heap, but Python's heap is terrible for this because it lets non-unique keys be inserted, which is a huge problem. So I thought I could use a sorted list/dict instead (the tuples have unique keys). I am using a sketch function to retrieve the number of times a substring has appeared in the whole text (in O(1) time). I am beginning to think something is wrong with my loops, or with the pops and assignments, because the min-heap version has a similar problem: only some of the true top k show up in the 25-element list, and the rest have rather low counts (when in fact there are higher ones).
for line in lines[1::4]:
    startidx = 0
    while startidx + k <= (len(line) - k):
        kmer = line[startidx:(startidx + k)]
        count = randint(1, 250)
        if count > 2:
            if len(topkmerdict.keys()) < topcount:
                topkmerdict[kmer] = count
            else:
                kmermin = (sorted(topkmerdict, reverse=False, key=lambda x: x[1]))
                if count > topkmerdict[kmermin[0]]:
                    topkmerdict.pop(kmermin[0])
                    topkmerdict[kmer] = count
        startidx += 1
    linesprocessed += 1
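Please try changing the line:
kmermin = (sorted(topkmerdict, reverse=False, key=lambda x: x[1]))
to:
kmermin = (sorted(topkmerdict, reverse=False))
The previous line was sorting on the second character of the string keys: iterating over a dict yields its keys, so x[1] is a character of the k-mer, not its count. Note that without a key function the keys are sorted alphabetically; to order them by count, pass key=topkmerdict.get.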
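For what it's worth, here is a minimal sketch of how the fixed-size dict could evict its lowest-count entry by comparing counts rather than key characters. The names `top_k_kmers` and `count_of` are hypothetical and not from the question; `count_of` stands in for the sketch lookup. The sketch also assumes every k-length window of a line should be visited and leaves out the `count > 2` filter, so adjust those to match your data.

```python
def top_k_kmers(lines, k, top_count, count_of):
    """Keep the top_count k-mers with the highest counts in a fixed-size dict.

    count_of is a hypothetical callable returning the count for a k-mer
    (it stands in for the sketch lookup described in the question).
    """
    top = {}  # k-mer -> count, never holds more than top_count entries
    for line in lines:
        # Assumption: visit every window of length k in the line.
        for start in range(len(line) - k + 1):
            kmer = line[start:start + k]
            count = count_of(kmer)
            if kmer in top:
                # Keys stay unique: refresh the count, do not add a duplicate.
                top[kmer] = count
            elif len(top) < top_count:
                top[kmer] = count
            else:
                # Find the key whose *count* is smallest.
                weakest = min(top, key=top.get)
                if count > top[weakest]:
                    del top[weakest]
                    top[kmer] = count
    return top


# Small usage example with made-up counts.
demo_counts = {"AAA": 5, "AAC": 1, "ACG": 9, "CGT": 7}
result = top_k_kmers(["AAACGT"], k=3, top_count=2,
                     count_of=lambda s: demo_counts.get(s, 0))
print(result)  # {'ACG': 9, 'CGT': 7}
```

`min(top, key=top.get)` scans all entries on each eviction, which is cheap for a 25-element dict; for a much larger k, a heap kept alongside the dict would probably be the better choice.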