mesolitica / embedding-benchmarks

Benchmarking embedding models for the Malaysian context.

embedding-benchmarks

Benchmarking RAG embedding models for the Malaysian context. The leaderboard is hosted as a HuggingFace Space at https://huggingface.co/spaces/mesolitica/Malaysian-Embedding-Leaderboard

📈 We evaluate models on 2 datasets:

  1. Research papers found on Crossref with the keyword "melayu", https://huggingface.co/datasets/mesolitica/malaysian-ultrachat/resolve/main/ultrachat-crossref-melayu-malay.jsonl
  2. PDF files from lom.agc.gov.my, https://huggingface.co/datasets/mesolitica/malaysian-ultrachat/resolve/main/ultrachat-lom-agc.jsonl
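
As a rough illustration of how a retrieval benchmark over JSONL files like these could be scored, here is a minimal sketch. It is not the repository's actual evaluation code. The field names inside the JSONL rows are not documented here, so turning rows into (query, passage) pairs is left to the caller. `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `top_k_accuracy` assumes each query has exactly one correct passage.

```python
import json
import math
from collections import Counter


def read_jsonl(path):
    """Read one JSON object per non-empty line of a local JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: sparse word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_accuracy(pairs, embed, k=1):
    """Fraction of queries whose own passage ranks in the top k.

    pairs: non-empty list of (query, passage); passage i is assumed to be
    the only correct answer for query i.
    """
    passages = [embed(passage) for _, passage in pairs]
    hits = 0
    for i, (query, _) in enumerate(pairs):
        q = embed(query)
        ranked = sorted(
            range(len(passages)),
            key=lambda j: cosine(q, passages[j]),
            reverse=True,
        )
        hits += i in ranked[:k]
    return hits / len(pairs)
```

To try a real model, replace `toy_embed` with a function that returns that model's vectors, and adapt `cosine` to dense vectors.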

Languages

Jupyter Notebook 100.0%