Elasticsearch - Looking for a performant way of must_not with ids


I've got the following situation:

We've realized our product search with a commercial solution. I'm playing around with Elasticsearch to see whether it can replace our current product search, and basically it works. But we've got one speciality: our product catalog has about 1 million products, and not every customer is allowed to buy every product. There are many rules defining whether a product can be bought by a customer.

It's not just:

Customer A is not allowed to buy products of vendor A

or:

Customer B is not allowed to buy products of category B of vendor B.

It's not that easy.

To determine which products a customer is not allowed to buy, we implemented a microservice/webservice years ago. This webservice returns a product blacklist, i.e. a list of product numbers.

The problem is: if I run a query in Elasticsearch that ignores these blacklisted products, I get products the customer is not allowed to buy. If I query the top 10 search hits, it can happen that I'm not allowed to show some of them, because the customer is not allowed to buy them. And if I'm using aggregations, I get vendors and/or categories the customer is not allowed to buy from.

What did I do in my prototype?

Before querying Elasticsearch, I request the product blacklist of the customer (and cache it, of course). After I've received the blacklist, I run a query like this:

    {
      "query" : {
        "bool" : {
          "must_not" : [
            {
              "ids" : {
                "values" : [
                  // numbers of blacklisted products, can be thousands!
                  1234567,
                  1234568,
                  1234569,
                  1234570,
                  ...
                ]
              }
            }
          ],
          "should" : [
            {
              "query" : {
                ...
              }
            }
          ]
        }
      },
      "aggregations" : {
        ...
      }
    }

This works well, but we've got customers with thousands of blacklisted products. So on the one hand I'm afraid the network traffic is too high, and on the other hand I've noticed that the complete Elasticsearch request is remarkably slower. How much slower depends on the number of blacklisted products.
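To make the size concern concrete, here is a minimal Java sketch that builds the prototype's query body with every blacklisted number inlined. `InlineBlacklistQuery` is a hypothetical helper, not part of any Elasticsearch client, and it quotes the IDs because `_id` values are strings. With 7-digit product numbers each entry adds about 10 bytes, so a few thousand entries add a few tens of kilobytes to every request.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds the JSON body of the prototype query, inlining
// every blacklisted product number into a must_not ids clause.
final class InlineBlacklistQuery {
    private InlineBlacklistQuery() {}

    static String body(List<Long> blacklist) {
        // _id values are strings, so each product number is quoted.
        String values = blacklist.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"must_not\":[{\"ids\":{\"values\":["
                + values
                + "]}}]}}}";
    }

    // Request size in bytes: grows linearly with the length of the blacklist.
    static int sizeInBytes(List<Long> blacklist) {
        return body(blacklist).getBytes(StandardCharsets.UTF_8).length;
    }
}
```

The body has to be rebuilt and sent with every search, which is exactly the overhead the custom query builder tries to avoid.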

My next approach was to develop my own Elasticsearch query builder plugin, which handles the blacklist inside of Elasticsearch. The blacklist query extends AbstractQueryBuilder and uses a TermInSetQuery. The query builder requests the blacklist of a given customer once, caches it, and builds a TermInSetQuery with all blacklisted product numbers.

Now my query looks like this:
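The "request once, cache it" part of such a query builder can be sketched like this, assuming the webservice call can be wrapped in a `Function<Long, Set<String>>`. `BlacklistCache` and the fetcher are hypothetical; turning the cached set into a `TermInSetQuery` is left out because the exact Lucene/Elasticsearch constructors depend on the version in use.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-customer blacklist cache: the blacklist webservice is called
// at most once per customer, and the result is reused for later queries.
final class BlacklistCache {
    private final Map<Long, Set<String>> cache = new ConcurrentHashMap<>();
    // Stand-in for the call to the blacklist webservice (assumption).
    private final Function<Long, Set<String>> fetcher;

    BlacklistCache(Function<Long, Set<String>> fetcher) {
        this.fetcher = fetcher;
    }

    Set<String> blacklistFor(long customerId) {
        // computeIfAbsent invokes the fetcher only on a cache miss.
        return cache.computeIfAbsent(customerId, fetcher);
    }

    // Drop a customer's entry, e.g. after the customer's rules changed.
    void invalidate(long customerId) {
        cache.remove(customerId);
    }
}
```

In a real plugin the cache would also need some expiry policy, otherwise a changed blacklist is never picked up.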

    {
      "query" : {
        "bool" : {
          "must_not" : [
            {
              "blacklist" : {    // <-- own query builder
                "customer" : 1234567
              }
            }
          ],
          "should" : [
            {
              "query" : {
                ...
              }
            }
          ]
        }
      },
      "aggregations" : {
        ...
      }
    }

This solution is faster, and I don't have to send the whole list of blacklisted product numbers with each query, so there is no network overhead. But the query is still remarkably slower than without the blacklist. I profiled the query, and I'm not surprised to see that the blacklist query takes 80-90% of the runtime.

I think TermInSetQuery performs badly in my case, because I guess the matching process in Elasticsearch (or rather Lucene) does quite a bit more than this:

    if (blacklistSet.contains(id)) {
        continue; // ignore the current search hit
    }
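The simple model described above, written out as a hypothetical Java helper (`NaiveBlacklistFilter` is illustrative only): walk the ranked hits and skip blacklisted ones until k allowed hits are collected. It also shows why this can't just be done on the client: the engine would have to return more than k hits, and aggregations would still count the blacklisted products.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "skip blacklisted hits" model: walk the ranked
// hits and drop every product on the customer's blacklist.
final class NaiveBlacklistFilter {
    private NaiveBlacklistFilter() {}

    static List<String> topAllowed(List<String> rankedHits, Set<String> blacklistSet, int k) {
        List<String> result = new ArrayList<>(Math.max(k, 0));
        for (String id : rankedHits) {
            if (result.size() >= k) {
                break; // enough allowed hits collected
            }
            if (blacklistSet.contains(id)) {
                continue; // ignore the current search hit
            }
            result.add(id);
        }
        return result;
    }
}
```

For example, with hits `1..6` and the blacklist `{2, 4}`, asking for 3 hits consumes five ranked hits to return `1, 3, 5`.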

Does anyone have a hint for me on how to implement such a blacklist mechanism more performantly?

Is there a way to intercept the Elasticsearch/Lucene query process? Maybe I could write my own real Lucene query instead of using TermInSetQuery.

Thanks in advance.

Christian
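One possible direction for such a custom query, sketched with plain `java.util.BitSet`. This is an assumption about a design, not an existing Elasticsearch feature: resolve the blacklist to document ordinals once, keep the result as a bitset, and make the per-document check a single bit test. `BlacklistBits` and the `docOrdinalByProduct` map are hypothetical. In Lucene, doc IDs are per segment and change when segments merge, so a real implementation would need one such bitset per segment.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resolve the blacklist to document ordinals once and keep
// the result as a BitSet, so excluding a candidate document is one bit test.
// Assumes a stable productNumber -> doc ordinal mapping (not true across
// Lucene segment merges).
final class BlacklistBits {
    private final BitSet excluded;

    BlacklistBits(Set<String> blacklist, Map<String, Integer> docOrdinalByProduct, int maxDoc) {
        excluded = new BitSet(maxDoc);
        for (String product : blacklist) {
            Integer doc = docOrdinalByProduct.get(product);
            if (doc != null) { // products missing from the index are ignored
                excluded.set(doc);
            }
        }
    }

    boolean isAllowed(int doc) {
        return !excluded.get(doc);
    }

    int excludedCount() {
        return excluded.cardinality();
    }
}
```

The cost moves from every query to the moment the bitset is built, which only pays off if the bitset can be reused across many queries of the same customer.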

This is not a solution, but maybe an approach.

First of all, here is an older post that might interest you. As far as I know, more recent versions of Elasticsearch did not introduce or change anything that would be better or more suitable.

If you follow the link in that answer to the terms query documentation page, you'll find a simple example.

Now, instead of caching the blacklists, you could create an index and store the blacklist of each customer there. Then you can use a terms query and reference the blacklist in that other index (= your blacklist cache).

I don't know how often these blacklists are updated; maybe that's an issue. Also, you'd have to be careful not to get out of sync. Worth mentioning: by default, inserts/updates to an index are not visible to search immediately. You might need to force a refresh when indexing/updating blacklists.

As I said, this may not be the solution. But if it sounds feasible to you, it may be worth a try to compare it with other solutions.
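A minimal sketch of the terms lookup variant of the query. The index name `blacklists`, one document per customer with the customer number as its ID, the array field `products` and the product field `product_number` are all assumptions; the `terms` lookup keys themselves (`index`, `id`, `path`) are the documented ones. `TermsLookupQuery` is hypothetical and only builds the JSON body.

```java
// Hypothetical builder for a terms lookup query: the blacklist is read from a
// document in another index instead of being sent with every request.
// Index name, document ID scheme, field and path are assumptions.
final class TermsLookupQuery {
    private TermsLookupQuery() {}

    static String body(long customerId) {
        return "{\"query\":{\"bool\":{\"must_not\":[{\"terms\":{"
                + "\"product_number\":{"
                + "\"index\":\"blacklists\","
                + "\"id\":\"" + customerId + "\","
                + "\"path\":\"products\"}}}]}}}";
    }
}
```

Keep in mind that a terms query is limited by the `index.max_terms_count` setting (65,536 terms by default), which matters for customers with very large blacklists.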
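A sketch of how a blacklist update could be written so that it is searchable once the call returns, using `refresh=wait_for` on the index request. The host, the index name `blacklists` and the `products` field are assumptions, and `BlacklistIndexRequest` is hypothetical; the request is only built with `java.net.http` (Java 11+), not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: store (or overwrite) one customer's blacklist document and
// ask Elasticsearch to return only once the change is visible to search.
// Host, index name and document layout are assumptions; nothing is sent here.
final class BlacklistIndexRequest {
    private BlacklistIndexRequest() {}

    static HttpRequest upsert(String baseUrl, long customerId, List<String> products) {
        String json = products.stream()
                .map(p -> "\"" + p + "\"")
                .collect(Collectors.joining(",", "{\"products\":[", "]}"));
        return HttpRequest.newBuilder(
                        URI.create(baseUrl + "/blacklists/_doc/" + customerId + "?refresh=wait_for"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. `wait_for` waits for the next scheduled refresh instead of forcing one; `refresh=true` forces an immediate refresh at a higher cost.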

