mudler / LocalAI

:robot: The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many more model architectures. Generates text, audio, video, and images, with voice-cloning capabilities.

Home Page: https://localai.io

Unable to use BAAI/bge-reranker-base model for reranking

shizidushu opened this issue

LocalAI version:
localai/localai:v2.16.0-cublas-cuda12

Environment, CPU architecture, OS, and Version:
Linux LAPTOP-LENOVO 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug
LocalAI loads mixedbread-ai/mxbai-rerank-base-v1 instead of the expected BAAI/bge-reranker-base.

To Reproduce
Use the following YAML model configuration.

name: bge-reranker-base
backend: rerankers

parameters:
  model: cross-encoder
  pipeline_type: zh

After reading https://github.com/AnswerDotAI/rerankers/blob/e53b2714a935937561c9045326ed19bfa5082129/rerankers/reranker.py#L12-L16 and the backend line

kwargs['lang'] = request.PipelineType

I expected that setting pipeline_type: zh would make BAAI/bge-reranker-base the model to load. But LocalAI loads mixedbread-ai/mxbai-rerank-base-v1 instead.
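
As a side note, here is a minimal sketch of how the rerankers library appears to resolve its default checkpoint from the model name and the lang kwarg, based on the DEFAULTS mapping linked above. The exact dictionary entries below are assumed for illustration, not copied from the source. Note also that in the gRPC load options in the logs below, PipelineType: is empty, which would make such a lookup fall back to the English default.

# Illustrative sketch only: the authoritative mapping is in
# rerankers/reranker.py (linked above); the entries here are assumed.
DEFAULTS = {
    "cross-encoder": {
        "en": "mixedbread-ai/mxbai-rerank-base-v1",  # English default
        "zh": "BAAI/bge-reranker-base",              # expected pick for lang="zh"
    },
}

def resolve_default(model_name: str, lang: str = "en") -> str:
    # Fall back to the English default when the language key is missing or empty.
    per_lang = DEFAULTS.get(model_name, {})
    return per_lang.get(lang, per_lang.get("en", model_name))

print(resolve_default("cross-encoder", "zh"))  # BAAI/bge-reranker-base
print(resolve_default("cross-encoder", ""))    # mixedbread-ai/mxbai-rerank-base-v1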

Logs
Here is the output from docker compose up

localai-api-1  | 8:48AM DBG Extracting backend assets files to /tmp/localai/backend_data
localai-api-1  | 8:48AM DBG processing api keys runtime update
localai-api-1  | 8:48AM DBG processing external_backends.json
localai-api-1  | 8:48AM DBG external backends loaded from external_backends.json
localai-api-1  | 8:48AM INF core/startup process completed!
localai-api-1  | 8:48AM DBG No configuration file found at /tmp/localai/upload/uploadedFiles.json
localai-api-1  | 8:48AM DBG No configuration file found at /tmp/localai/config/assistants.json
localai-api-1  | 8:48AM DBG No configuration file found at /tmp/localai/config/assistantsFile.json
localai-api-1  | 8:48AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
localai-api-1  | 8:48AM DBG Request for model: cross-encoder
localai-api-1  | 8:48AM INF Loading model 'cross-encoder' with backend rerankers
localai-api-1  | 8:48AM DBG Loading model in memory from file: /models/cross-encoder
localai-api-1  | 8:48AM DBG Loading Model cross-encoder with gRPC (file: /models/cross-encoder) (backend: rerankers): {backendString:rerankers model:cross-encoder threads:0 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0002006c8 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-api-1  | 8:48AM DBG Loading external backend: /build/backend/python/rerankers/run.sh
localai-api-1  | 8:48AM DBG Loading GRPC Process: /build/backend/python/rerankers/run.sh
localai-api-1  | 8:48AM DBG GRPC Service for cross-encoder will be running at: '127.0.0.1:35699'
localai-api-1  | 8:48AM DBG GRPC Service state dir: /tmp/go-processmanager1405612238
localai-api-1  | 8:48AM DBG GRPC Service Started
localai-api-1  | 8:48AM DBG GRPC(cross-encoder-127.0.0.1:35699): stdout Initializing libbackend for build
localai-api-1  | 8:48AM DBG GRPC(cross-encoder-127.0.0.1:35699): stdout virtualenv activated
localai-api-1  | 8:48AM DBG GRPC(cross-encoder-127.0.0.1:35699): stdout activated virtualenv has been ensured
localai-api-1  | 8:48AM DBG GRPC(cross-encoder-127.0.0.1:35699): stderr /build/backend/python/rerankers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
localai-api-1  | 8:48AM DBG GRPC(cross-encoder-127.0.0.1:35699): stderr   warnings.warn(
localai-api-1  | 8:48AM DBG GRPC(cross-encoder-127.0.0.1:35699): stderr Server started. Listening on: 127.0.0.1:35699
localai-api-1  | 8:48AM DBG GRPC Service Ready
localai-api-1  | 8:48AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:cross-encoder ContextSize:512 Seed:2117973773 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/cross-encoder Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
localai-api-1  | 8:48AM DBG GRPC(cross-encoder-127.0.0.1:35699): stderr /build/backend/python/rerankers/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
localai-api-1  | 8:48AM DBG GRPC(cross-encoder-127.0.0.1:35699): stderr   warnings.warn(
localai-api-1  | 8:48AM ERR Server error error="could not load model (no success): Unexpected err=OSError(\"We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like mixedbread-ai/mxbai-rerank-base-v1 is not the path to a directory containing a file named config.json.\\nCheckout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.\"), type(err)=<class 'OSError'>" ip=172.23.0.1 latency=22.039144583s method=POST status=500 url=/v1/rerank
localai-api-1  | 8:49AM INF Success ip=127.0.0.1 latency="31.413µs" method=GET status=200 url=/readyz
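
For reference, this is the kind of request that produced the POST /v1/rerank call in the log. The field names follow the Jina-style rerank API that LocalAI exposes; the query and documents here are placeholder values, and the host/port assume the default docker-compose setup.

import requests

# Hypothetical reproduction request; adjust host/port to your deployment.
resp = requests.post(
    "http://localhost:8080/v1/rerank",
    json={
        "model": "bge-reranker-base",  # the `name` from the YAML above
        "query": "Which reranker does pipeline_type select?",
        "documents": [
            "pipeline_type is meant to pick the reranker language.",
            "Unrelated filler text.",
        ],
        "top_n": 1,
    },
    timeout=60,
)
# Until the model resolves correctly this returns the 500/OSError seen above.
print(resp.status_code, resp.text)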

Additional context