mpwang / llama-cpp-windows-guide

how to run llama.cpp on windows

I'm not familiar with Windows development; these are just some notes that I hope will help.

For details, please refer to the llama.cpp repository.


  • Visual Studio Community installed, with the "Desktop development with C++" workload selected during installation
  • Chocolatey (a package manager for Windows) installed
  • CMake installed
  • Python 3 installed
  • LLaMA models downloaded (dalai can help)
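The prerequisites above can be checked from a script before going further. The following is a minimal sketch, assuming each tool is reachable on PATH under the executable names `choco`, `cmake`, and `python`; the function name `find_missing_tools` is hypothetical, and a CMake bundled with Visual Studio may not be on PATH even when it is installed.

```python
import shutil


def find_missing_tools(tools=("choco", "cmake", "python"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests;
    by default it is shutil.which from the standard library.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

This only checks that an executable with that name exists; it does not verify versions or the Visual Studio C++ workload.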


install make

Open PowerShell as an administrator and run the following command:

choco install make


If Python is not installed, you can install it via Chocolatey:

choco install python

clone llama.cpp

Clone the repository using Git, or download it as a ZIP file and extract it to a directory on your machine.


build llama.cpp

Use Visual Studio to open the llama.cpp directory.

Select "View" and then "Terminal" to open a command prompt within Visual Studio. Type the following command:

cmake .

In the right-hand side panel:

Right-click the file quantize.vcxproj and select "Build".
This outputs .\Debug\quantize.exe

Right-click ALL_BUILD.vcxproj and select "Build".
This outputs .\Debug\llama.exe
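Before moving on, it can help to confirm that both executables were actually produced. Here is a small sketch, assuming the build used the Debug configuration as described above; the function name `missing_build_outputs` is hypothetical.

```python
from pathlib import Path


def missing_build_outputs(repo_dir, names=("quantize.exe", "llama.exe"), config="Debug"):
    """Return the expected executables that are absent from <repo_dir>/<config>."""
    base = Path(repo_dir) / config
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Run from the llama.cpp directory.
    missing = missing_build_outputs(".")
    if missing:
        print("Missing build outputs:", ", ".join(missing))
    else:
        print("Both executables were found in the Debug directory.")
```

If the projects were built with a different configuration, pass its directory name as `config`.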

create a python virtual environment

Back in the PowerShell terminal, cd to the llama.cpp directory. The commands below assume the LLaMA models have been downloaded to the models directory.

python -m venv venv

.\venv\Scripts\pip.exe install torch torchvision torchaudio sentencepiece numpy

.\venv\Scripts\python.exe convert-pth-to-ggml.py models/7B/ 1

.\Debug\quantize.exe ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

.\Debug\llama.exe -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
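The convert, quantize, and run steps above can also be driven from one script. This is a rough sketch rather than a definitive workflow: it assumes the conversion script is llama.cpp's `convert-pth-to-ggml.py`, that the virtual environment lives in `venv`, and that the executables are in `Debug`. The function names `build_commands` and `run_all` are hypothetical, and the trailing arguments `1` and `2` are passed through unchanged from the commands above.

```python
import subprocess
from pathlib import Path


def build_commands(model_dir="models/7B", threads=8, n_predict=128,
                   convert_script="convert-pth-to-ggml.py"):
    """Build the three command lines (convert, quantize, run) as argument lists."""
    model_dir = Path(model_dir)
    f16_model = model_dir / "ggml-model-f16.bin"
    q4_model = model_dir / "ggml-model-q4_0.bin"
    python_exe = Path("venv") / "Scripts" / "python.exe"
    debug_dir = Path("Debug")
    return [
        # Convert the original weights; "1" is taken as-is from the guide.
        [str(python_exe), convert_script, str(model_dir), "1"],
        # Quantize the f16 model; "2" is taken as-is from the guide.
        [str(debug_dir / "quantize.exe"), str(f16_model), str(q4_model), "2"],
        # Run inference: -t is the thread count, -n the number of tokens to predict.
        [str(debug_dir / "llama.exe"), "-m", str(q4_model),
         "-t", str(threads), "-n", str(n_predict)],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_all(build_commands()) to execute them.
    for cmd in build_commands():
        print(" ".join(cmd))
```

Building the commands as lists avoids shell quoting problems, and `check=True` stops the sequence if conversion or quantization fails, so the later steps never run against a missing model file.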
