loops - Running top k elements in a sorted, fixed-size list / python


I'm trying to keep a list of the top k elements of a large set of tuples. Since keeping them all in memory is impossible, I want to use a fixed-size list to keep the top k values (with their keys). I have tried to use a min heap, but Python's heap is terrible here because it lets non-unique keys be inserted, which is a huge problem. I thought I could use a sorted list/dict instead (the tuples have unique keys). I am using a sketch function to retrieve the number of times a substring has appeared in the whole text (in O(1) time). I'm beginning to think I'm doing something wrong in the loops, or in the pops and assignments, because the min heap had a similar problem: some of the top k show up in the 25-element list, and the rest have rather low counts (when in fact higher counts exist).
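One way to stop a min heap from holding the same key twice is to track the keys it currently contains in a set. The sketch below assumes that a key always maps to the same count (as with a whole-text count returned by the sketch), so a key that is already tracked can simply be skipped; `top_k_unique` is a hypothetical name, and `heapq` is the standard-library heap module.

```python
import heapq


def top_k_unique(pairs, k):
    """Keep the k (count, key) pairs with the highest counts, each key at most once.

    Assumes a key always arrives with the same count, so a repeated key
    can be skipped instead of being pushed again.
    """
    heap = []        # min-heap of (count, key); heap[0] holds the smallest kept count
    in_heap = set()  # keys currently stored in the heap
    for key, count in pairs:
        if key in in_heap:
            continue  # duplicate key: already tracked with the same count
        if len(heap) < k:
            heapq.heappush(heap, (count, key))
            in_heap.add(key)
        elif count > heap[0][0]:
            # Pop the current minimum and push the new pair in one step.
            _, evicted = heapq.heapreplace(heap, (count, key))
            in_heap.discard(evicted)
            in_heap.add(key)
    # Highest counts first.
    return sorted(heap, reverse=True)


print(top_k_unique([("AC", 5), ("GT", 9), ("AC", 5), ("TT", 1), ("CG", 7)], 2))
```

An evicted key cannot sneak back in later: it was the minimum when it left, and the heap's minimum only grows, so its fixed count never again exceeds `heap[0][0]`.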

    from random import randint

    for line in lines[1::4]:
        startidx = 0
        while startidx + k <= (len(line) - k):
            kmer = line[startidx:(startidx + k)]
            count = randint(1, 250)  # stand-in for the sketch count lookup
            if count > 2:
                if len(topkmerdict) < topcount:
                    topkmerdict[kmer] = count
                else:
                    kmermin = sorted(topkmerdict, reverse=False, key=lambda x: x[1])
                    if count > topkmerdict[kmermin[0]]:
                        topkmerdict.pop(kmermin[0])
                        topkmerdict[kmer] = count
            startidx += 1
        linesprocessed += 1
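The upper bound of the sliding window in the loop above is also worth checking: a window starting at `startidx` fits in the line while `startidx + k <= len(line)`, whereas the condition `startidx + k <= len(line) - k` stops k positions earlier. A minimal sketch of a generator that yields every k-mer of a line (`kmers` is a hypothetical name):

```python
def kmers(line, k):
    """Yield every length-k substring (k-mer) of line, left to right."""
    # The window [i, i + k) fits while i + k <= len(line), i.e. i <= len(line) - k.
    for i in range(len(line) - k + 1):
        yield line[i:i + k]


print(list(kmers("ACGTA", 3)))  # ['ACG', 'CGT', 'GTA']
```

For `"ACGTA"` with `k = 3`, the condition `startidx + k <= len(line) - k` is `3 <= 2` on the first iteration, so that loop visits no k-mers at all.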

Please try changing this line:

    kmermin = sorted(topkmerdict, reverse=False, key=lambda x: x[1])

to:

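    kmermin = sorted(topkmerdict, key=topkmerdict.get)

The previous line sorts the string keys by their second character, not by their counts. Passing `topkmerdict.get` as the key sorts the keys by their stored counts, so `kmermin[0]` is the key with the lowest count.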
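To see the difference concretely, compare `key=lambda x: x[1]` with sorting the keys by their counts. The sketch below also wraps the eviction step in a hypothetical helper, `offer`, which adds one assumption not in the question's loop: a kmer that is already stored is skipped, so seeing a stored key again cannot evict another entry.

```python
def lowest_buggy(d):
    # key=lambda x: x[1] looks at the second character of each key string;
    # the counts stored in d are never consulted.
    return sorted(d, key=lambda x: x[1])[0]


def lowest(d):
    # key=d.get sorts the keys by their stored counts (ascending),
    # so the first key is one with the lowest count.
    return sorted(d, key=d.get)[0]


def offer(topkmerdict, kmer, count, topcount):
    """Keep kmer if its count belongs among the topcount highest counts.

    Hypothetical helper: skips a kmer that is already stored.
    """
    if kmer in topkmerdict:
        return
    if len(topkmerdict) < topcount:
        topkmerdict[kmer] = count
        return
    # min() with the same key finds the lowest-count entry in one O(n) pass.
    kmermin = min(topkmerdict, key=topkmerdict.get)
    if count > topkmerdict[kmermin]:
        del topkmerdict[kmermin]
        topkmerdict[kmer] = count


counts = {"AA": 10, "CT": 1, "GG": 5}
print(lowest_buggy(counts))  # AA
print(lowest(counts))        # CT
```

With `counts` as above, the buggy version picks `AA`, the entry with the *highest* count, as the one to evict, because its second character `"A"` sorts before `"G"` and `"T"`. Using `min` instead of `sorted` avoids sorting the whole dict on every eviction when only the smallest entry is needed.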

