Sentence Embedding Benchmark

Installation

The benchmarks download many models and datasets from Hugging Face, so you need to create a `.env` file and add your Hugging Face token. Without it, downloading the required models and datasets might fail.
```
HF_TOKEN="<your_huggingface_token>"
```
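
Before starting a long run, it can help to confirm that the token is actually readable. The following is a minimal sketch using only the standard library. The helper names `read_env_file` and `has_hf_token` are hypothetical and not part of this repository, and the notebooks may load the file differently (for example with a dotenv package).

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; not part of this repository.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in HF_TOKEN="..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_hf_token(path=".env"):
    """Return True if the file exists and defines a non-placeholder HF_TOKEN."""
    if not Path(path).is_file():
        return False
    token = read_env_file(path).get("HF_TOKEN", "")
    # Treat the template value "<your_huggingface_token>" as missing.
    return bool(token) and not token.startswith("<")


if __name__ == "__main__":
    print("HF_TOKEN found" if has_hf_token() else "HF_TOKEN missing or still a placeholder")
```

Running the script from the directory that contains `.env` reports whether the placeholder has been replaced with a real token.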
To run a benchmark, open the notebook for the task you want, e.g. `clustering_benchmarks.ipynb`, and run all cells.

Display the results and create all associated plots by running the matching plot notebook, e.g. `clustering_plots.ipynb`. The available tasks and plots are:

| Task | Plot |
| --- | --- |
| `clustering_benchmarks.ipynb` | `clustering_plots.ipynb` |
| `clustering_benchmarks_cutoff.ipynb` | `clustering_plots_cutoff.ipynb` |
| `retrieval_benchmark_cqa.ipynb` | `retrieval_plots.ipynb` |
| `retrieval_benchmark_nqa_chunking.ipynb` | `retrieval_plots.ipynb` |
| `retrieval_benchmark_nqa_seq.ipynb` | `retrieval_plots.ipynb` |
| `sts_benchmarks.ipynb` | `sts_plots.ipynb` |
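
If you prefer to run a task and its plot notebook without opening Jupyter, the pairing in the table can be scripted. This is only a sketch under stated assumptions: it assumes `jupyter nbconvert` is installed and that the notebooks sit in the current directory, and the names `PLOT_NOTEBOOK`, `nbconvert_command`, and `run_task` are hypothetical, not part of this repository.

```python
import subprocess

# Task notebook -> plot notebook, copied from the table above.
PLOT_NOTEBOOK = {
    "clustering_benchmarks.ipynb": "clustering_plots.ipynb",
    "clustering_benchmarks_cutoff.ipynb": "clustering_plots_cutoff.ipynb",
    "retrieval_benchmark_cqa.ipynb": "retrieval_plots.ipynb",
    "retrieval_benchmark_nqa_chunking.ipynb": "retrieval_plots.ipynb",
    "retrieval_benchmark_nqa_seq.ipynb": "retrieval_plots.ipynb",
    "sts_benchmarks.ipynb": "sts_plots.ipynb",
}


def nbconvert_command(notebook):
    """Build a command that executes a notebook in place (assumes nbconvert is installed)."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_task(task, dry_run=True):
    """Run a task notebook followed by its plot notebook.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [nbconvert_command(task), nbconvert_command(PLOT_NOTEBOOK[task])]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failing notebook
    return commands


if __name__ == "__main__":
    for command in run_task("sts_benchmarks.ipynb"):
        print(" ".join(command))
```

The default `dry_run=True` only prints the commands, which makes it easy to check the pairing before committing to a long benchmark run.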

Analyzing the impact of sentence length on embedding model performance
