riya-joshi-401 / Query-Likelihood-Retrieval-Model

Query Likelihood Retrieval Model using Jelinek-Mercer Smoothing technique.

Query-Likelihood-Retrieval-Model

  • In the query likelihood retrieval model, we rank documents by the probability that the query text could be generated by the document language model.
  • We calculate the probability that we could pull the query words out of the “bucket” of words representing the document.
  • This is a model of topical relevance,in the sense that the probability of query generation is the measure of how likely it is that a document is about the same topic as the query.
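A minimal sketch of the unsmoothed model, assuming naive whitespace tokenization and treating each document's "bucket" as its maximum-likelihood unigram distribution. The function name `ml_query_likelihood` and the sample documents are hypothetical, not taken from this repository. Query words are assumed independent, so the query probability is the product of the per-word probabilities.

```python
from collections import Counter


def ml_query_likelihood(query: str, document: str) -> float:
    """Unsmoothed P(Q|D): draw each query word from the document's "bucket".

    Query words are treated as independent, so P(Q|D) is the product of the
    maximum-likelihood estimates f(q_i, D) / |D|.
    """
    doc_terms = document.lower().split()  # naive whitespace tokenization
    if not doc_terms:
        return 0.0
    counts = Counter(doc_terms)
    prob = 1.0
    for term in query.lower().split():
        prob *= counts[term] / len(doc_terms)  # 0 if the term is absent from D
    return prob


docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
}
for doc_id, text in docs.items():
    print(doc_id, ml_query_likelihood("cat mat", text))
```

Here d1 scores (1/6)·(1/6) = 1/36 ≈ 0.028, while d2 scores exactly 0 because "mat" never occurs in it, even though it does contain "cat". This zero-probability problem is what smoothing addresses.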

Jelinek-Mercer Smoothing

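  • Smoothing adjusts the maximum likelihood estimate of the document language model to compensate for data sparseness: without it, any query word that does not appear in a document gets probability zero, which makes the probability of the whole query zero.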
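  • Jelinek-Mercer smoothing is a linear interpolation of the document language model and the collection language model, where the coefficient λ sets the balance between the two:

      $$ p(q_i \mid D) = (1 - \lambda)\,\frac{f_{q_i,D}}{|D|} + \lambda\,\frac{c_{q_i}}{|C|} $$

    where f_{q_i,D} is the number of times query word q_i occurs in document D, |D| is the number of words in D, c_{q_i} is the number of times q_i occurs in the whole collection, and |C| is the total number of words in the collection.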
  • The optimal λ differs between queries. With λ as the weight on the collection model, experiments have shown that a small value, around 0.1, works well for short queries, while a larger value, around 0.7, works better for long queries.
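A minimal sketch of Jelinek-Mercer scoring and ranking, assuming naive whitespace tokenization, a collection made up of exactly the documents being ranked, and `lam` as the weight on the collection model (as in the interpolation formula). The names `jm_score` and `rank` and the sample documents are hypothetical, not this repository's code. Scores are sums of log probabilities, which rank documents in the same order as the product P(Q|D) while avoiding floating-point underflow on long queries.

```python
import math
from collections import Counter


def jm_score(query_terms, doc_counts, doc_len, coll_counts, coll_len, lam):
    """Return log P(Q|D) under Jelinek-Mercer smoothing.

    Assumed convention: lam weights the collection model,
        p(q|D) = (1 - lam) * f(q, D) / |D| + lam * c(q) / |C|
    """
    score = 0.0
    for term in query_terms:
        if coll_counts[term] == 0:
            # Unseen in the whole collection: p(q|D) would be 0 for every
            # document, so the term cannot change the ranking; skip it
            # instead of taking log(0).
            continue
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len
        score += math.log((1 - lam) * p_doc + lam * p_coll)
    return score


def rank(query, docs, lam=0.1):
    """Rank documents by smoothed query log-likelihood, best first."""
    if not 0 < lam < 1:
        raise ValueError("lam must lie strictly between 0 and 1")
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    coll_counts = Counter()
    for terms in tokenized.values():
        coll_counts.update(terms)
    coll_len = sum(coll_counts.values())
    query_terms = query.lower().split()
    scores = {
        doc_id: jm_score(query_terms, Counter(terms), len(terms),
                         coll_counts, coll_len, lam)
        for doc_id, terms in tokenized.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "dogs and cats",
}
for doc_id, score in rank("cat mat", docs, lam=0.1):
    print(doc_id, round(score, 3))
```

With `lam=0.1` (an arbitrary example value) the order is d1, d2, d3: d2 now receives a finite score despite lacking "mat", but still ranks below d1, which contains both query words. Without stemming, d3's "cats" does not match "cat", so d3 is scored only through the collection model.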

Workflow

Data Acquisition

Methodology

Performance Metrics
