nicknochnack / FakeServer

An end-to-end walkthrough of the llama.cpp server.



Run sick LLM apps hyper fast on your local machine for funzies.

See it live and in action πŸ“Ί

Startup πŸš€

  1. git clone https://github.com/ggerganov/llama.cpp
  2. Run the make commands:
  • Mac: cd llama.cpp && make
  • Windows (from here):
    1. Download the latest Fortran version of w64devkit.
    2. Extract w64devkit on your pc.
    3. Run w64devkit.exe.
    4. Use the cd command to reach the llama.cpp folder.
    5. From here you can run:
      make
  3. pip install openai 'llama-cpp-python[server]' pydantic instructor streamlit
  4. Start the server (see the Python client sketches after this list):
  • Single Model Chat
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf
  • Single Model Chat with GPU Offload
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1
  • Single Model Function Calling with GPU Offload
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1 --chat_format functionary
  • Multiple Model Load with Config
    python -m llama_cpp.server --config_file config.json
  • Multi Modal Models
    python -m llama_cpp.server --model models/llava-v1.5-7b-Q4_K.gguf --clip_model_path models/llava-v1.5-7b-mmproj-Q4_0.gguf --n_gpu_layers -1 --chat_format llava-1-5
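
Once a server is running, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1). Here is a minimal sketch of querying it with the openai client installed above; it assumes the default address, and the model name is only a label for the local server, so match it to whichever GGUF you loaded.

  from openai import OpenAI

  # Point the OpenAI client at the local llama-cpp-python server.
  client = OpenAI(
      base_url="http://localhost:8000/v1",
      api_key="not-needed",  # the local server does not validate the key
  )

  response = client.chat.completions.create(
      model="mistral-7b-instruct-v0.1.Q4_0.gguf",  # label only; match your loaded model
      messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
  )
  print(response.choices[0].message.content)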
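
For the function-calling server (--chat_format functionary), one way to get structured outputs is with the instructor and pydantic packages installed above. This is a rough sketch using instructor's patch helper; the UserDetail schema and the prompt are purely illustrative, not from this repo.

  import instructor
  from openai import OpenAI
  from pydantic import BaseModel

  # Illustrative schema: instructor turns it into a function/tool call for the model.
  class UserDetail(BaseModel):
      name: str
      age: int

  client = instructor.patch(
      OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
  )

  user = client.chat.completions.create(
      model="mistral-7b-instruct-v0.1.Q4_0.gguf",  # label only
      response_model=UserDetail,
      messages=[{"role": "user", "content": "Jason is 30 years old."}],
  )
  print(user)  # UserDetail(name='Jason', age=30)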

Models Used πŸ€–

Who, When, Why?

πŸ‘¨πŸΎβ€πŸ’» Author: Nick Renotte
πŸ“… Version: 1.x
πŸ“œ License: This project is licensed under the MIT License
