mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

Home Page: https://mit-han-lab.github.io/TinyChatEngine/

Windows CUDA Make chat problem

M0rtale opened this issue

I am trying to use this solution on Windows with CUDA (compute capability 8.6). I am running into an issue where the function LLaVAGenerate is not resolved during linking, as shown in the screenshot below:
[screenshot: linker error reporting an unresolved LLaVAGenerate symbol]
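For context on the failure mode: an "unresolved external symbol" error at link time means a function was declared (and called) somewhere, but no object file handed to the linker actually contains its definition. A minimal illustration, deliberately incomplete and not TinyChatEngine code:

```cpp
#include <iostream>

// Declaration only: this satisfies the compiler, but the linker still needs a
// definition from some compiled translation unit (a .cc/.cpp/.cu object file).
void LLaVAGenerate();

int main() {
    // If the file that defines LLaVAGenerate() is never compiled into the
    // build, MSVC stops at link time with "unresolved external symbol".
    LLaVAGenerate();
    std::cout << "done" << std::endl;
    return 0;
}
```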

Steps to replicate:

Environment:

  • cl.exe from Visual Studio 2022 (MSVC toolset 14.29.30133)
  • CUDA Toolkit 12
  • LLaMA2 13B AWQ int4 model, downloaded with python tools/download_model.py --model LLaMA2_13B_chat_awq_int4 --QM QM_CUDA
  • pthread package from vcpkg by directly linking the include and lib files in the project
  • PATH: /c/CUDA/v12/libnvvp:/c/CUDA/v12/bin:/c/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64:/ucrt64/bin:/usr/local/bin:/usr/bin:/bin:/c/Windows/System32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0/:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl

I fixed a few issues with NUM_THREAD and tanhf not being defined, then built using the command make chat -j.
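For reference, the patches were roughly along these lines; the exact files, the fallback value, and the helper below are my own sketch rather than upstream code:

```cpp
#include <cmath>   // or <math.h>; makes tanhf visible on MSVC

// NUM_THREAD was referenced without a definition on this toolchain, so provide
// a fallback for builds that do not pass -DNUM_THREAD=<n> on the command line:
#ifndef NUM_THREAD
#define NUM_THREAD 8   // assumed default; set to the number of CPU cores
#endif

// Example use of tanhf once <cmath> is included
// (a typical tanh-based GELU approximation):
static inline float gelu_approx(float x) {
    const float c = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + tanhf(c * (x + 0.044715f * x * x * x)));
}
```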

My guess is that the only definition of LLaVAGenerate is in the non_cuda directory, so maybe it is being omitted from the CUDA build (see the sketch below the screenshot)? Note that compiling with the CPU flag works fine and I can get output from the LLM:

[screenshot: chat output from the CPU build]
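If that guess is right, the CUDA target probably never compiles the translation unit that provides LLaVAGenerate. A hedged sketch of the kind of pattern that would produce this; the guard name and signature are illustrative, not the project's actual source:

```cpp
// Illustrative only. If the definition of LLaVAGenerate sits in a CPU-only
// file (or behind a guard like the one below), a CUDA build that skips that
// file emits no LLaVAGenerate symbol, and every call site fails at link time
// exactly as in the first screenshot.
#ifndef QM_CUDA            // hypothetical guard name
void LLaVAGenerate() {
    // CPU-only implementation, compiled only for the non-CUDA target
}
#endif
```

The same symptom appears if the Makefile's CUDA build simply leaves the non_cuda source file out of its object list; either way, the fix would be to compile that file (or a CUDA counterpart of it) when building make chat with CUDA.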