Failure on data ingestion into "qdrant" using "text-embedding-ada-002" embedding
avibathula opened this issue
Describe the bug
Ingesting data into qdrant with the text-embedding-ada-002 embedding model fails with:
BadRequestError: Error code: 400 - {'error': {'message': 'This model does not support specifying dimensions.', 'type': 'invalid_request_error', 'param': None, 'code': None}}
The issue seems to be in the OpenAIEmbeddingProvider.get_embedding method in r2r/embeddings/openai/openai_base.py, which always passes the dimensions parameter. However, per https://platform.openai.com/docs/api-reference/embeddings/create:
dimensions (integer, optional): The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
So for the "text-embedding-ada-002" model, the code shouldn't send the dimensions value.
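A minimal sketch of the kind of guard that would avoid the error, assuming the request arguments are assembled before the API call. The helper name build_embedding_kwargs and the prefix check are hypothetical and are not taken from the r2r code base; the prefix list is only one reading of "text-embedding-3 and later models".

```python
from typing import Any, Dict, Optional

# Assumption: models whose names start with one of these prefixes accept
# `dimensions`. The API reference only says "text-embedding-3 and later models",
# so this tuple may need extending.
DIMENSION_CAPABLE_PREFIXES = ("text-embedding-3",)


def build_embedding_kwargs(
    model: str, text: str, dimensions: Optional[int] = None
) -> Dict[str, Any]:
    """Build keyword arguments for an embeddings request (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"model": model, "input": text}
    # Only forward `dimensions` when the model is expected to support it.
    if dimensions is not None and model.startswith(DIMENSION_CAPABLE_PREFIXES):
        kwargs["dimensions"] = dimensions
    return kwargs


# text-embedding-ada-002: `dimensions` is dropped from the request.
ada_kwargs = build_embedding_kwargs("text-embedding-ada-002", "hello", 1536)
# text-embedding-3-small: `dimensions` is kept.
v3_kwargs = build_embedding_kwargs("text-embedding-3-small", "hello", 256)
```

With the openai Python client (v1 or later), the resulting dictionary could then be passed as client.embeddings.create(**kwargs), so the ada-002 request no longer carries the unsupported parameter.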
To Reproduce
Steps to reproduce the behavior:
Use a config of
{
  "app": {
    "max_logs": 100,
    "max_file_size_in_mb": 50
  },
  "completions": {
    "provider": "openai"
  },
  "embedding": {
    "provider": "openai",
    "search_model": "text-embedding-ada-002",
    "search_dimension": 1536,
    "batch_size": 128,
    "text_splitter": {
      "type": "recursive_character",
      "chunk_size": 512,
      "chunk_overlap": 20
    },
    "rerank_model": "None"
  },
  "eval": {
    "provider": "local",
    "llm": {
      "model": "gpt-4o",
      "provider": "openai"
    },
    "sampling_fraction": 1.0
  },
  "ingestion": {
    "selected_parsers": {
      "csv": "default",
      "docx": "default",
      "html": "default",
      "json": "default",
      "md": "default",
      "pdf": "default",
      "pptx": "default",
      "txt": "default",
      "xlsx": "default",
      "gif": "default",
      "png": "default",
      "jpg": "default",
      "jpeg": "default",
      "svg": "default"
    }
  },
  "logging": {
    "provider": "local",
    "log_table": "logs",
    "log_info_table": "log_info"
  },
  "prompt": {
    "provider": "local"
  },
  "vector_database": {
    "provider": "qdrant",
    "collection_name": "blahblahblah"
  }
}
and ingest any data files.
Expected behavior
Data files are vectorized and uploaded to qdrant.
Additional context
I installed the r2r package, programmatically provided a list of files, and called r2r.aingest_files to hit the issue.
Thank you. I will investigate this issue and report back with our findings.