huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models

https://huggingface.co/docs/text-embeddings-inference/quick_tour

huggingface/text-embeddings-inference Issues

CUDA_ERROR_OUT_OF_MEMORY
Closed in 4 hours, 3 comments
Multi GPU usage in SageMaker Inference endpoints
Updated 2 days ago
Numerical issues with gte-large-en-v1.5
Updated 2 days ago, 8 comments
rerank model large batch request only use single cpu(for tokenizer?)
Closed 5 days ago, 4 comments
[cpu][python backend]crash in python backend
Closed 6 days ago, 3 comments
cpu-1.5.0: TEI doesn't download all needed ONNX Files
Closed 7 days ago
total time not equals to inference + queue+ tokenizer time
Closed 7 days ago, 1 comment
Support for Qdrant bm42 document encoder
Updated 11 days ago, 1 comment
Images Embeddings (ex. CLIP model)
Updated 13 days ago, 2 comments
Could you help with the question :unable to start container process: exec: "--model-id": executable file not found in $PATH: unknown.
Updated 13 days ago
GPU memory usage is limited
Closed 14 days ago, 4 comments
how to deploy bge-reranker-v2-m3 on Text-embeddings-inference
Closed 14 days ago
How to deploy bge-reranker-v2-m3 for multiple threads？
Closed 14 days ago
Support ALTS on gRPC interface
Updated 16 days ago
Unknown variant Qwen2
Closed 17 days ago, 2 comments
Support tokenized input
Updated 17 days ago, 2 comments
get (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/config.json) error
Closed 18 days ago, 1 comment
'ptxas' died due to signal 11 (Invalid memory reference)
Updated 18 days ago, 3 comments
Support gte-Qwen1.5-7B-instruct
Updated 19 days ago, 1 comment
Flash attention is not installed.
Closed 20 days ago, 5 comments
Dockerized text-embeddings-inference:cpu-1.0 /embed endpoint issue
Updated 20 days ago, 1 comment
Support nvidia/NV-Embed-v1
Closed 20 days ago, 2 comments
Adding a cache layer
Closed a month ago, 3 comments
Deberta V3 not supported
Updated 22 days ago, 1 comment
Model is downloaded each time I run the container
Updated 23 days ago, 1 comment
Improve documentation about rerankers: which ones are supported?
Updated 24 days ago, 3 comments
Model Request: long context gte models
Closed 24 days ago, 2 comments
Support for e5-mistral-7b-instruct
Closed 25 days ago
error when deploy jina-embeddings-v2-base-code
Closed 25 days ago, 2 comments
CPU Image: High memory usage on startup
Updated a month ago, 1 comment
Add `encoding_format` support to OpenAI compatible route
Closed a month ago, 3 comments
Support for jinaai/jina-embeddings-v2-base-code
Closed a month ago, 2 comments
Add Environment Variable for OTLP Service Name
Closed a month ago
Support left truncation with TEI
Closed a month ago
The "payload limit" parameter seems to have no effect?
Closed a month ago, 2 comments
multilingual-e5-large exported by recent sentence-transformers version cannot be loaded
Closed a month ago, 4 comments
Support BAAI/bge-reranker-v2-minicpm-layerwise
Updated a month ago
Too much cpu memory consumption
Closed a month ago, 2 comments
Support NER models
Closed a month ago, 1 comment
Connection Error
Closed a month ago
Download the model at build time (not run time)
Closed a month ago, 2 comments
Model downloads just *hang*
Closed a month ago, 1 comment
Multiple Model Endpoint support
Closed a month ago, 1 comment
Missing CONTRIBUTING.md
Closed a month ago, 2 comments
splade is not supported for BAAI/bge-m3
Updated a month ago
Suggest supporting thenlper/gte-small-zh
Updated 2 months ago, 1 comment
Mean Pooling Not Consistent with `embed_all` outputs
Closed 2 months ago, 1 comment
very high cardinality metrics
Closed 2 months ago, 2 comments
Error: Could not start backend: Runtime compute cap 70 is not compatible with compile time compute cap 80
Updated 2 months ago, 1 comment
Call for benchmark
Updated 3 months ago