
LLama-Droid

PoC for running Alpaca on-device, using a modified version of alpaca.cpp to perform inference.
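
For context, an Android app typically reaches native code like this through a JNI bridge. Below is a hypothetical sketch of what that boundary could look like; the package, class, and function names are illustrative, not taken from this repo:

```c
#include <jni.h>

// Hypothetical JNI entry point. The Kotlin side would declare a matching
// `external fun runInference(modelPath: String, prompt: String): String`
// in a class such as org.example.llamadroid.Inference and load the native
// library with System.loadLibrary(). This function forwards both strings
// to the modified alpaca.cpp loop and returns the generated text.
JNIEXPORT jstring JNICALL
Java_org_example_llamadroid_Inference_runInference(JNIEnv *env, jobject thiz,
                                                   jstring model_path,
                                                   jstring prompt) {
    const char *path = (*env)->GetStringUTFChars(env, model_path, NULL);
    const char *text = (*env)->GetStringUTFChars(env, prompt, NULL);

    // ... call into the modified alpaca.cpp here and collect its output ...
    const char *output = "stub output";

    (*env)->ReleaseStringUTFChars(env, model_path, path);
    (*env)->ReleaseStringUTFChars(env, prompt, text);
    return (*env)->NewStringUTF(env, output);
}
```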

Requirements

  • A device with >8GB of storage and enough RAM to load the model into memory (>6GB). Download ggml-alpaca-7b-q4.bin, rename it to alpaca7b.bin, and upload it to the device under models in the application's files directory (a sanity check for this step is sketched after this list).
  • Time, since on-device inference is currently incredibly slow.
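
For reference, a minimal C sketch of a pre-load sanity check for the model file (model_looks_valid is a hypothetical helper, not code from this repo; the path would be resolved on the Android side and passed down to native code):

```c
#include <stdio.h>
#include <sys/stat.h>

// Hypothetical pre-load check: the path is expected to point at
// <app files dir>/models/alpaca7b.bin. ggml-alpaca-7b-q4.bin is roughly
// 4 GB, so a much smaller file usually means a truncated upload.
// On 32-bit ABIs, build with -D_FILE_OFFSET_BITS=64 so st_size can
// represent files larger than 2 GB.
int model_looks_valid(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        perror("stat");
        return 0;
    }
    return st.st_size > 3LL * 1024 * 1024 * 1024;
}
```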

Future work

  • What does GPU-accelerated inference look like?
  • mmap? How, and does it make a difference? (See the sketch after this list.)
  • Consider using a smaller model to improve speed.
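
On the mmap question: the idea would be to replace the loader's read-everything-into-heap approach with demand paging, so startup does not pay for a full copy of the weights and the kernel can drop clean pages under memory pressure. A minimal sketch of what that could look like in the native code (map_model is a hypothetical helper, not something from this repo's modified alpaca.cpp):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map the model weights read-only instead of reading them into allocated
// buffers. Pages are faulted in on first access and can be evicted when
// the device is under memory pressure, which matters on a RAM-constrained
// Android phone.
void *map_model(const char *path, size_t *out_size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return NULL; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return NULL; }

    void *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (data == MAP_FAILED) { perror("mmap"); return NULL; }

    *out_size = (size_t)st.st_size;
    return data;
}
```

Whether this actually helps is the open question: the first pass over the weights still faults every page in, but repeated runs should start faster since already-resident pages are reused.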

About

PoC for running alpaca.cpp for on-device, offline inference on Android.


Languages

  • C 79.4%
  • C++ 16.0%
  • Kotlin 1.6%
  • Python 1.2%
  • Makefile 1.2%
  • CMake 0.6%
  • Shell 0.1%