embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark

Home Page: https://arxiv.org/abs/2210.07316

Convert SIB200Classification to SIB200Clustering

KennethEnevoldsen opened this issue · comments

We currently have quite reasonable coverage of classification tasks in MTEB, but only subpar coverage of clustering tasks. I would suggest that we convert SIB200Classification to a clustering task. @x-tabdeveloping, what are your thoughts here? @jankounchained, might this be something you could do? (We should probably combine train/val/test into one dataset.)
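To make the "combine train/val/test into one dataset" idea concrete, here is a minimal sketch in plain Python. It is not MTEB's task API: `merge_splits` is a hypothetical helper, the field names `text` and `category` are assumptions about the SIB-200 schema, and the toy rows are invented purely for illustration.

```python
from collections import Counter


def merge_splits(splits, split_order=("train", "validation", "test"),
                 text_key="text", label_key="category"):
    """Pool all splits into one list of texts and a parallel list of labels.

    Clustering is evaluated without a training phase, so the split
    boundaries carry no meaning and every example can be pooled.
    `text_key` and `label_key` are assumed field names, not confirmed ones.
    """
    texts, labels = [], []
    for name in split_order:
        # Skip any split that is absent rather than failing.
        for row in splits.get(name, []):
            texts.append(row[text_key])
            labels.append(row[label_key])
    return texts, labels


# Toy data standing in for the real splits (invented for illustration).
toy_splits = {
    "train": [
        {"text": "The team won the final.", "category": "sports"},
        {"text": "A new vaccine was approved.", "category": "health"},
    ],
    "validation": [
        {"text": "Parliament passed the bill.", "category": "politics"},
    ],
    "test": [
        {"text": "The striker was injured before the match.", "category": "sports"},
    ],
}

texts, labels = merge_splits(toy_splits)
label_counts = Counter(labels)
print(len(texts), dict(label_counts))
```

The topic labels that served as classification targets become the gold cluster assignments, so the same annotations can be reused for clustering evaluation without relabelling.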

Sounds very reasonable to me. Topics are better suited for clustering than classification in my opinion anyway.

Just opened a PR ☝️ if anyone wants to take a look.