🥷 MuLan: A Study of Fact Mutability in Language Models

Setup

conda create -n fact_mutability
conda activate fact_mutability
pip install torch==2.0.0
pip install git+https://github.com/huggingface/transformers
conda install matplotlib

# (Optional) For the iPython kernel
conda install -c anaconda ipykernel
python -m ipykernel install --user --name=fact_mutability

Data

🥷 MuLan queries: https://huggingface.co/datasets/coastalcph/fm_queries

Aliases: https://huggingface.co/datasets/coastalcph/fm_aliases

Run Experiments

Correctness and confidence of predictions

inference.py: Passes a set of queries (one query per line) through a language model and stores the model's predictions and softmax scores in predictions.json. Predictions are generated with beam search (which reduces to greedy decoding when a single beam is used); options set the number of beams and the instructions given to the model.
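
As a rough illustration of the scores stored with each prediction, the sketch below turns per-step logits into softmax probabilities and derives two summary scores from them: the probability of the first generated token and the perplexity of the whole sequence. The function names and the toy logits are assumptions made for this example; they are not taken from inference.py.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_scores(step_logits, token_ids):
    """Probability assigned to each generated token at its decoding step."""
    return [softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)]


def first_token_score(scores):
    """Softmax probability of the first generated token."""
    return scores[0]


def perplexity(scores):
    """exp of the mean negative log-probability of the generated tokens."""
    return math.exp(-sum(math.log(p) for p in scores) / len(scores))


# Toy example (hypothetical values): a 3-word vocabulary, 2 decoding steps.
step_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
token_ids = [0, 1]
scores = token_scores(step_logits, token_ids)
```

A higher first-token score and a lower perplexity both indicate a more confident prediction, so a selection rule based on either one has to sort in the matching direction.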

evaluation.py: SQuAD-style F1 evaluation. The user specifies, via prediction_mode, whether the best prediction is selected by perplexity or by first-token score. It takes the predictions generated by inference.py together with the queries and aliases datasets, and scores each prediction against its best match among the possible answers and their aliases.
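
The scoring described above can be sketched as follows. This is a minimal version of the usual SQuAD-style token-overlap F1, taking the maximum over an answer and its aliases; the function names and example strings are illustrative assumptions and may differ from what evaluation.py does.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, ground_truth):
    """Token-level F1 between one prediction and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction, answers_with_aliases):
    """Best F1 over the gold answer and all of its aliases."""
    return max(f1_score(prediction, answer) for answer in answers_with_aliases)


# Hypothetical example: the prediction matches one alias exactly.
score = best_f1("The United States", ["USA", "United States of America", "United States"])
```

Taking the maximum over aliases means a prediction is not penalised for using a different but valid surface form of the answer.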

Probe classifier

classifier/mdl_classifier.py: Code to train the classifiers for MDL computation.

classifier/compute_mdl.py: Script to compute MDL results using the outputs from the mdl_classifier code.

classifier/classifier_eval.py: Inference on a given set of relations.
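
This README does not say which MDL estimator the scripts use. Assuming the online (prequential) code that is common in probing work, the codelength is the cost of sending the first block of labels with a uniform code, plus the cross-entropy each successive probe incurs on the next block. The sketch below shows that arithmetic only; the function names, arguments, and block layout are assumptions and are not taken from compute_mdl.py.

```python
import math


def online_codelength(first_block_size, num_classes, block_nll_nats):
    """Online codelength in bits.

    first_block_size: number of examples sent with a uniform code.
    num_classes: number of label classes K.
    block_nll_nats: summed cross-entropy (in nats) on each later block,
        measured with a probe trained on all preceding blocks.
    """
    uniform_bits = first_block_size * math.log2(num_classes)
    model_bits = sum(block_nll_nats) / math.log(2)  # convert nats to bits
    return uniform_bits + model_bits


def compression(num_examples, num_classes, codelength_bits):
    """Ratio of the uniform codelength to the achieved codelength."""
    return num_examples * math.log2(num_classes) / codelength_bits


# Hypothetical numbers: 16 binary-labelled examples, 8 in the first block.
bits = online_codelength(8, 2, [4 * math.log(2)])
ratio = compression(16, 2, bits)
```

A shorter codelength (higher compression) means the labels are easier to extract from the representations with little training data.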

Updates

inference_updates.py: Code to compute the effectiveness of in-context updates for each mutability type.
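
One simple way to quantify update effectiveness is the share of queries for which the model, after seeing the update in its context, predicts the new answer. The sketch below computes that rate per mutability type. The prompt template, record fields, type labels, and the exact-match criterion are all assumptions for illustration, and the records are toy data; inference_updates.py may measure this differently.

```python
from collections import defaultdict


def build_update_prompt(update_statement, query):
    """Prepend the updated fact to the query (hypothetical template)."""
    return f"{update_statement} {query}"


def update_success_by_type(records):
    """Fraction of records per type whose prediction equals the new answer."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        totals[record["type"]] += 1
        predicted = record["prediction"].strip().lower()
        target = record["new_answer"].strip().lower()
        if predicted == target:
            hits[record["type"]] += 1
    return {t: hits[t] / totals[t] for t in totals}


# Toy records (made up for this example, not results from the study).
records = [
    {"type": "immutable", "prediction": "Lyon", "new_answer": "Lyon"},
    {"type": "immutable", "prediction": "Paris", "new_answer": "Lyon"},
    {"type": "mutable", "prediction": "Alice", "new_answer": "alice"},
    {"type": "mutable", "prediction": "Bob", "new_answer": "Bob"},
]
rates = update_success_by_type(records)
prompt = build_update_prompt("Imagine that X is now Y.", "X is")
```

Exact match is the strictest criterion; an F1 comparison against the new answer and its aliases would be a softer alternative.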
