Unsupervised Semantic Retrieval via Mutual Information Estimation

This is a PyTorch implementation of the paper Unsupervised Semantic Retrieval via Mutual Information Estimation.

Requirements

python=3.9.13
numpy=1.23.1
tqdm=4.64.1
pytorch=1.12.1
transformers=4.18.0
tensorflow=2.8.2
tensorflow-hub=0.12.0
sentence-transformers=2.2.2
json=2.0.9

Preprocess

Arrange datasets in json files as following form:

[
  {
    "id": "<id for sample>", 
    "query": "<query text>",
    "candidates": [
      {
          "cid": "<id for candidate>",
          "order": "<rank sequence>",
          "label": "<0 or 1>",
          "subject": "<Subject of the candidate or empty>",
          "body": "<Body of the candidate>"
      }, ...
    ]
  }, ...
]

Place datasets in the data folder as described in preprocess.py/CORPUS2PATH

formalize the datasets by:

python preprocess.py -fc <corpus name> -dd   <dump path> -dc -dqac

Calculate the results of source domain function by:

python preprocess.py -cm <source domain   function type> -c <corpus name> -mp <model   path>

Calculate the metrics for the results of source domain function by:

python preprocess.py -cmx <MAP, MRR or F1> -pp   <path of results> -cr <cal range>

Training

Prepare a configuration file like the sample in conf/wikiqa/wikiqa.yaml

Train our model by:

python USRMIE.py -t <qa or wsd> -cp <path to   configuration file> -dtr -dte -g <gpus to use>   -r <local rank> -wf <path to save tensorboard   information> -s <seed>

About

A PyTorch implementation of the paper "Unsupervised Semantic Retrieval via Mutual Information Estimation"

MIT License

Languages

Language:Python 100.0%