How to use Solr to calculate the PageRank of a node?


I index a Wikipedia dump file into Solr. The dump has this format:

    <page>
      <title>bruce willis</title>
      <ns>0</ns>
      <id>64673</id>
      <revision>
        <id>789709463</id>
        <parentid>789690745</parentid>
        <timestamp>2017-07-09t02:27:39z</timestamp>
        <contributor>
          <username>materialscientist</username>
          <id>7852030</id>
        </contributor>
        <comment>imdb not reliable source</comment>
        <model>wikitext</model>
        <format>text/x-wiki</format>
        <text xml:space="preserve" bytes="57375">{{use mdy dates|date=march 2012}}
    {{infobox person
     | name = bruce willis
     | image = bruce willis gage skidmore.jpg
     | caption = willis @ 2010 [[san diego comic-con]].
     | birth_name = walter bruce willis
     | birth_date = {{birth date , age|1955|3|19}}
     |
     | birth_place = [[idar-oberstein]], west germany
     | nationality = [[american people|american]]
     | residence = [[los angeles]], [[california]], u.s.

And this is the schema file of the core:

    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="title" type="text_wiki" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="revision_text" type="text_wiki" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="revision_timestamp" type="date" indexed="true" stored="true" multiValued="true"/>
    <field name="contributor_id" type="int" indexed="true" stored="true" multiValued="true"/>
    <field name="contributor_username" type="string" indexed="true" docValues="true" stored="true" multiValued="true"/>

    <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
    <uniqueKey>id</uniqueKey>

I did not post the full content of schema.xml. I know I can use the Solr score or similarity. The similarity is calculated based on (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)). But I think PageRank is based on the number of incoming and outgoing links of a page, and with these field types I cannot retrieve the incoming and outgoing pages.
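For reference, the formula above can be sketched in Python. This is only a minimal sketch: the function name `bm25_tf_norm` is hypothetical, and the defaults k1 = 1.2 and b = 0.75 are assumed here as the usual BM25 defaults. It shows that the value depends only on term frequency and field length, not on links between pages.

```python
def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """Term-frequency part of BM25, as written in the formula above.

    freq             -- how often the term occurs in the field
    field_length     -- number of terms in this document's field
    avg_field_length -- average field length over the whole index
    k1, b            -- assumed defaults, not taken from the schema above
    """
    length_norm = 1 - b + b * field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * length_norm)


if __name__ == "__main__":
    # A field of exactly average length; only the term frequency varies.
    for freq in (1, 2, 5, 20):
        print(freq, round(bm25_tf_norm(freq, 100, 100), 3))
```

Nothing in the inputs refers to other documents, which is why this score alone cannot express PageRank.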

So I don't know how to calculate PageRank using Solr. Did I understand something wrong? Can you give me advice if you know how to do this? Thanks.

It depends on how advanced you want your PageRank to be. If you only want to consider the number of inbound links, you can calculate that by extracting the list of pages each page links to when indexing. Then iterate over the stored pages and, for each one, select the count of documents that link to the page you're looking at, storing that number in a new field on the page. You can then sort by that score (or use boosting, etc.) to affect the list of results returned.
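The inbound-link approach can be sketched in Python roughly as follows. This is a minimal sketch under several assumptions: the regex only handles plain `[[Target]]` and `[[Target|label]]` wikilinks, titles are compared case-insensitively as a simplification, and the field name `inbound_links` and all function names are hypothetical (the field would have to be added to the schema). The last function only builds documents in Solr's atomic-update JSON shape; sending them to the core's update handler is left out.

```python
import re
from collections import Counter

# Captures the target of [[Target]] or [[Target|label]]; stops at '|', '#' or ']'.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")


def normalize(title):
    """Simplified title normalization (assumption: case-insensitive match)."""
    return title.strip().lower()


def extract_links(wikitext):
    """Return the set of normalized page titles that this wikitext links to."""
    targets = (normalize(m.group(1)) for m in WIKILINK.finditer(wikitext))
    return {t for t in targets if t}


def inbound_counts(pages):
    """pages maps title -> wikitext; returns inbound link counts per title.

    Each linking page is counted once per target, and self-links are ignored.
    """
    counts = Counter()
    for title, text in pages.items():
        for target in extract_links(text):
            if target != normalize(title):
                counts[target] += 1
    return counts


def atomic_updates(solr_ids, counts, field="inbound_links"):
    """Build atomic-update documents that set the count on each page.

    solr_ids maps title -> Solr document id; `field` is a hypothetical name.
    """
    return [
        {"id": doc_id, field: {"set": counts.get(normalize(title), 0)}}
        for title, doc_id in solr_ids.items()
    ]


if __name__ == "__main__":
    pages = {
        "Bruce Willis": "born in [[Idar-Oberstein]], lives in [[Los Angeles]]",
        "California": "largest city: [[Los Angeles]] ([[Los Angeles|LA]])",
        "Los Angeles": "a city in [[California]]",
    }
    counts = inbound_counts(pages)
    print(atomic_updates({"Los Angeles": "1", "California": "2"}, counts))
```

Once the counts are stored, the new field can be used for sorting or as a boost at query time. A full PageRank would additionally weight each inbound link by the rank of the linking page, which requires iterating over the whole link graph outside Solr.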

