pHaeusler / llama-lol

Llama LOL

An experiment in making a funny LLM

Blog: https://philliphaeusler.com/posts/llama_lol/

Train

Training requires a large GPU with more than 20 GB of VRAM.

python3 train.py

Sample

To generate new jokes, run:

python3 sample.py

More Data

You can scrape more data from YouTube by running:

python3 yt.py

To scrape more videos, add them to yt.py.
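The contents of yt.py aren't described here, so the following is only a sketch of one way to fetch audio for the whisper.cpp step below. It uses the third-party yt-dlp package (pip install yt-dlp) as a stand-in for whatever yt.py actually does; VIDEOS, ydl_options and download_audio are hypothetical names, and the URL is a placeholder.

```python
from pathlib import Path

# Hypothetical list: put the YouTube URLs you want to scrape here.
VIDEOS = [
    "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder, not a real ID
]


def ydl_options(out_dir: Path) -> dict:
    """yt-dlp options: prefer an audio-only stream, save it named by video ID."""
    return {
        "format": "bestaudio/best",
        "outtmpl": str(out_dir / "%(id)s.%(ext)s"),
    }


def download_audio(urls: list[str], out_dir: Path = Path("audio")) -> None:
    """Download the audio for each URL into out_dir (hypothetical helper)."""
    import yt_dlp  # third-party; imported here so the helpers above work without it

    out_dir.mkdir(parents=True, exist_ok=True)
    with yt_dlp.YoutubeDL(ydl_options(out_dir)) as ydl:
        ydl.download(urls)
```

Calling download_audio(VIDEOS) would write files such as audio/VIDEO_ID.webm; the extension depends on which audio stream YouTube serves.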

For higher-quality transcription, download and build whisper.cpp:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
bash ./models/download-ggml-model.sh large
make

Copy the .txt transcripts into /data.
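The transcribe-and-copy step might look roughly like this. It assumes the older whisper.cpp layout that make produces (a main binary in the whisper.cpp root and models/ggml-large.bin from the download script), that ffmpeg is installed, and that data is the target directory; wav_path_for, ffmpeg_cmd, whisper_cmd and transcribe_to_data are hypothetical helpers, not part of this repo.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed layout after "make" in an older whisper.cpp checkout; adjust as needed.
WHISPER_DIR = Path("whisper.cpp")
WHISPER_BIN = WHISPER_DIR / "main"
MODEL = WHISPER_DIR / "models" / "ggml-large.bin"


def wav_path_for(src: Path) -> Path:
    """Use a separate name so an input that is already .wav is never overwritten."""
    return src.with_name(src.stem + ".16k.wav")


def ffmpeg_cmd(src: Path, wav: Path) -> list[str]:
    """Convert to the 16 kHz, mono, 16-bit PCM WAV that whisper.cpp expects."""
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", str(wav)]


def whisper_cmd(wav: Path, model: Path = MODEL, binary: Path = WHISPER_BIN) -> list[str]:
    """-otxt writes a plain-text transcript next to the input as <wav>.txt."""
    return [str(binary), "-m", str(model), "-f", str(wav), "-otxt"]


def transcribe_to_data(src: Path, data_dir: Path = Path("data")) -> Path:
    """Transcribe one audio file and copy its transcript into data_dir."""
    wav = wav_path_for(src)
    subprocess.run(ffmpeg_cmd(src, wav), check=True)
    subprocess.run(whisper_cmd(wav), check=True)
    transcript = Path(str(wav) + ".txt")
    data_dir.mkdir(parents=True, exist_ok=True)
    dest = data_dir / (src.stem + ".txt")
    shutil.copy(transcript, dest)
    return dest
```

For example, transcribe_to_data(Path("audio/VIDEO_ID.webm")) would leave the transcript at data/VIDEO_ID.txt. The commands are built by small pure functions so they can be checked or printed before anything is run.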

Clean up the data

  • Split jokes (comedy bits) onto individual lines
  • Remove extra Whisper annotations
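Part of this cleanup can be scripted. A rough sketch, assuming Whisper's non-speech tags appear in square brackets or parentheses (such as [Music] or (laughter)) and that, while reviewing a transcript, you have marked the boundary between bits with a blank line. Both are assumptions, and deciding where one bit ends still needs a human; the function names are hypothetical.

```python
import re
from pathlib import Path

# Bracketed or parenthesised spans, e.g. "[Music]" or "(audience laughing)".
# Blunt heuristic: it also drops any genuine parenthetical speech.
ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_annotations(text: str) -> str:
    """Remove annotation spans and collapse leftover runs of spaces/tabs."""
    return re.sub(r"[ \t]+", " ", ANNOTATION.sub("", text))


def bits_to_lines(text: str) -> list[str]:
    """Return one bit per entry, assuming bits are separated by blank lines."""
    bits = re.split(r"\n\s*\n", strip_annotations(text))
    # Join each bit's wrapped transcript lines into a single line.
    return [" ".join(bit.split()) for bit in bits if bit.strip()]


def clean_file(path: Path) -> None:
    """Rewrite a transcript in place with one bit per line."""
    lines = bits_to_lines(path.read_text(encoding="utf-8"))
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Because the annotation regex is greedy about anything in brackets, it is worth spot-checking a few cleaned files before training.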
