SciPhi-AI / R2R

The Supabase for RAG - R2R lets you build, scale, and manage user-facing Retrieval-Augmented Generation applications in production.

Home Page: https://r2r-docs.sciphi.ai/

Failure on data ingestion into "qdrant" using "text-embedding-ada-002" embedding

avibathula opened this issue

Describe the bug
Failure on data ingestion into qdrant using text-embedding-ada-002 embedding

BadRequestError: Error code: 400 - {'error': {'message': 'This model does not support specifying dimensions.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

The issue seems to be in the OpenAIEmbeddingProvider.get_embedding method in r2r/embeddings/openai/openai_base.py, which always passes the dimensions parameter. However, per https://platform.openai.com/docs/api-reference/embeddings/create:

dimensions integer Optional

  • The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

So for the "text-embedding-ada-002" model, the code shouldn't send the dimensions value.
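One possible shape for such a guard is sketched below. This is not R2R's actual code or its eventual fix: build_embedding_kwargs and DIMENSION_AWARE_PREFIX are hypothetical names, and the prefix check is an assumption based on the documentation quoted above (it would need updating for any later model families that also accept dimensions).

```python
from typing import Any, Dict, Optional

# Assumption: models whose names start with this prefix accept `dimensions`,
# per the OpenAI documentation quoted in this issue.
DIMENSION_AWARE_PREFIX = "text-embedding-3"


def build_embedding_kwargs(
    model: str, text: str, dimensions: Optional[int] = None
) -> Dict[str, Any]:
    """Build keyword arguments for an embeddings request (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"model": model, "input": text}
    # Forward `dimensions` only to models that are documented to support it.
    if dimensions is not None and model.startswith(DIMENSION_AWARE_PREFIX):
        kwargs["dimensions"] = dimensions
    return kwargs


if __name__ == "__main__":
    # ada-002: `dimensions` is dropped, so the request should not trigger the 400.
    print(build_embedding_kwargs("text-embedding-ada-002", "hello", 1536))
    # text-embedding-3-*: `dimensions` is kept.
    print(build_embedding_kwargs("text-embedding-3-small", "hello", 512))
```

With the official openai Python client, the resulting dictionary could then be passed along as client.embeddings.create(**kwargs).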

To Reproduce
Steps to reproduce the behavior:
Use the following config:

{
  "app": {
    "max_logs": 100,
    "max_file_size_in_mb": 50
  },
  "completions": {
    "provider": "openai"
  },
  "embedding": {
    "provider": "openai",
    "search_model": "text-embedding-ada-002",
    "search_dimension": 1536,
    "batch_size": 128,
    "text_splitter": {
      "type": "recursive_character",
      "chunk_size": 512,
      "chunk_overlap": 20
    },
    "rerank_model": "None"
  },
  "eval": {
    "provider": "local",
    "llm": {
      "model": "gpt-4o",
      "provider": "openai"
    },
    "sampling_fraction": 1.0
  },
  "ingestion": {
    "selected_parsers": {
      "csv": "default",
      "docx": "default",
      "html": "default",
      "json": "default",
      "md": "default",
      "pdf": "default",
      "pptx": "default",
      "txt": "default",
      "xlsx": "default",
      "gif": "default",
      "png": "default",
      "jpg": "default",
      "jpeg": "default",
      "svg": "default"
    }
  },
  "logging": {
    "provider": "local",
    "log_table": "logs",
    "log_info_table": "log_info"
  },
  "prompt": {
    "provider": "local"
  },
  "vector_database": {
    "provider": "qdrant",
    "collection_name": "blahblahblah"
  }
}

Then ingest any data files.

Expected behavior
Data files vectorized and uploaded to qdrant

Additional context
I installed the r2r package, programmatically provided a list of files, and called r2r.aingest_files; that call triggers the error.

Thank you. I will investigate this issue and report back with our findings.