The Wayback Machine - https://web.archive.org/web/20210102141803/https://github.com/shmsi/document-ranking
Document Ranking With Word Embeddings

Traditional document ranking methods can only retrieve documents that contain the query words. For example, "smart" and "intelligence" are completely different words, so a search for "smart" will not find documents that contain only "intelligence". However, a document can be relevant to a query without sharing any words with it. This method combines word embeddings, keyword extraction and TF-IDF document ranking to handle such cases. The method has two steps. In the first step, the documents and the query are enriched, as described by the figure below. In the second step, the cosine similarity is computed between the TF-IDF vectors of the enriched query and each enriched document, and the top N most similar documents are retrieved.
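The second step (TF-IDF vectors plus cosine similarity, then top N) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the repository's code: the function names `tfidf_vectors`, `cosine` and `rank_documents`, the whitespace tokenizer, the smoothed IDF formula and the toy documents are all assumptions made for the example. It also computes IDF over the documents and the query together, which is a simplification. In practice scikit-learn's `TfidfVectorizer` and `cosine_similarity` could replace the hand-written parts.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict of term -> weight) per text."""
    tokenized = [text.lower().split() for text in texts]
    n_texts = len(tokenized)
    # Document frequency: number of texts that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
        vectors.append({
            term: count * (math.log((1.0 + n_texts) / (1.0 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_documents(enriched_query, enriched_docs, top_n=3):
    """Return (doc_index, score) pairs for the top_n most similar documents."""
    # Simplification: documents and query share one TF-IDF fit.
    vectors = tfidf_vectors(enriched_docs + [enriched_query])
    query_vec, doc_vecs = vectors[-1], vectors[:-1]
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy, already-enriched texts (hypothetical data).
docs = [
    "smart intelligent clever student",
    "road traffic accident report",
    "intelligent transport system",
]
ranking = rank_documents("smart intelligent", docs, top_n=2)
print(ranking)
```

Here the first document should rank highest because it shares both query terms, and the third comes second because it shares one. The second document shares none and scores zero.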

Enrichment

You can read more about the enrichment procedure in chapter 3.3 of the following document: Topic Modeling and Clustering for Analysis of Road Traffic Accidents.
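As a rough illustration of what enriching a text with word embeddings could look like, the sketch below appends the nearest embedding neighbours of each word to the text. This is an assumed, simplified scheme and not necessarily the procedure from the referenced chapter. It treats every token as a keyword instead of running keyword extraction, and it uses hand-written 2-D vectors (`TOY_EMBEDDINGS`) in place of a trained model. With gensim, the neighbours could instead come from a trained model's `most_similar` method. The names `nearest_words` and `enrich` are hypothetical.

```python
import math

# Hypothetical 2-D embeddings, hand-written for illustration only.
TOY_EMBEDDINGS = {
    "smart": (0.9, 0.1),
    "intelligence": (0.8, 0.2),
    "clever": (0.85, 0.15),
    "road": (0.1, 0.9),
    "traffic": (0.2, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(word, embeddings, topn=2):
    """Return the topn words closest to `word`, or [] if it has no vector."""
    if word not in embeddings:
        return []
    scored = [
        (cosine(embeddings[word], vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:topn]]


def enrich(text, embeddings, topn=2):
    """Append embedding neighbours of each token to the text (assumed scheme)."""
    tokens = text.lower().split()
    extra = []
    for token in tokens:
        for neighbour in nearest_words(token, embeddings, topn):
            # Skip words already present so the text is not padded with repeats.
            if neighbour not in tokens and neighbour not in extra:
                extra.append(neighbour)
    return " ".join(tokens + extra)


enriched = enrich("smart student", TOY_EMBEDDINGS, topn=2)
print(enriched)
```

After enrichment, a query containing "smart" also contains "intelligence", so a plain TF-IDF comparison can match documents that use only the related word.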

Dependencies

Python 2.7.12

gensim

scikit-learn
