Yannael / multilingual-embeddings

OpenAI vs open-source multilingual embeddings models

This notebook provides example code to assess which embedding model works best for your data. The example task is a retrieval task (as in RAG, retrieval-augmented generation) on multilingual data. See the associated Medium article here.
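To make the retrieval task concrete, here is a minimal sketch of one common way to score an embedding model: embed questions and text chunks, rank chunks by cosine similarity, and count how often the expected chunk appears in the top k results. The function names, the toy vectors, and the hit-rate metric are illustrative assumptions, not necessarily what the notebook uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def top_k_indices(query_vec, chunk_vecs, k):
    """Indices of the k chunks most similar to the query, best first."""
    scores = [cosine_similarity(query_vec, c) for c in chunk_vecs]
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def hit_rate_at_k(question_vecs, chunk_vecs, expected_chunk_ids, k=1):
    """Fraction of questions whose expected chunk is among the top k results."""
    hits = 0
    for query_vec, expected in zip(question_vecs, expected_chunk_ids):
        if expected in top_k_indices(query_vec, chunk_vecs, k):
            hits += 1
    return hits / len(question_vecs)


# Toy 2-D vectors standing in for real embeddings (hypothetical data).
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question_vecs = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
expected_chunk_ids = [0, 1, 2]  # question i should retrieve chunk expected_chunk_ids[i]

print(hit_rate_at_k(question_vecs, chunk_vecs, expected_chunk_ids, k=1))
print(hit_rate_at_k(question_vecs, chunk_vecs, expected_chunk_ids, k=2))
```

Running the same scoring function over embeddings produced by each candidate model, on the same questions and chunks, gives directly comparable numbers per model and per language.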

The data source is based on the European AI Act, and the models cover some of the latest OpenAI and open-source embedding models (as of 02/2024) for multilingual data:

OpenAI released two models in January 2024:

  • text-embedding-3-small (released 25/01/2024)
  • text-embedding-3-large (released 25/01/2024)
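As a rough sketch of how embeddings could be requested from either of these models, the helper below wraps the embeddings endpoint of the official `openai` Python client (v1-style interface). The helper name `embed_texts` and the example sentences are assumptions made for illustration; the notebook may organize this differently.

```python
def embed_texts(client, texts, model="text-embedding-3-small"):
    """Return one embedding (a list of floats) per input text, in input order.

    `client` is expected to expose `embeddings.create(model=..., input=...)`,
    as the `openai.OpenAI` client does.
    """
    response = client.embeddings.create(model=model, input=texts)
    # Each returned item carries the index of the input it belongs to;
    # sorting on it keeps outputs aligned with `texts`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


# Usage (needs the `openai` package and an OPENAI_API_KEY environment variable):
# from openai import OpenAI
# client = OpenAI()
# vectors = embed_texts(
#     client,
#     ["What is the AI Act?", "Qu'est-ce que l'AI Act ?"],
#     model="text-embedding-3-large",
# )
```

Passing the client in as an argument keeps the helper easy to test with a stub and makes it simple to switch between the small and large models via the `model` parameter.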

We compare with the following open-source models
