TrelisResearch / code-llama-32k

Run code-llama with up to 50k tokens of context using Flash Attention and BetterTransformer


code-llama-32k

Run code-llama with 32k tokens of context using Flash Attention and BetterTransformer

Basic Jupyter Notebook (works only on Nvidia GPUs, not on Mac).

Option 1 - Google Colab:

  • Download the ipynb notebook
  • Select a GPU
    • An A100 with 40 GB allows up to a 25k-token context length

Option 2 - Run on a server (e.g. AWS or RunPod (affiliate link))

  • Spin up an A100 80 GB server
  • Run the notebook and select a 50,000-token context length
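The GPU sizes above line up with a rough fp16 memory estimate. The sketch below is a back-of-the-envelope calculation, assuming the standard CodeLlama-7B shape (32 layers, hidden size 4096) and a dense fp16 KV cache; these figures are not stated in this repo, and real usage adds activation memory and framework overhead on top.

```python
def kv_cache_bytes(n_tokens, n_layers=32, hidden_size=4096, dtype_bytes=2):
    # Keys and values: one hidden-size vector each, per layer, per token.
    return 2 * n_layers * hidden_size * dtype_bytes * n_tokens

def total_gib(n_tokens, n_params=7e9, dtype_bytes=2):
    weights = n_params * dtype_bytes  # fp16 weights of a 7B model, ~14 GB
    return (weights + kv_cache_bytes(n_tokens)) / 2**30

for ctx in (25_000, 50_000):
    print(f"{ctx:>6} tokens -> ~{total_gib(ctx):.0f} GiB")
```

Under these assumptions, 25k tokens lands around 25 GiB (leaving headroom on a 40 GB A100) and 50k tokens around 37 GiB (comfortable on an 80 GB card), consistent with the two options above.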

PRO Notebooks

  • Allow saving and re-loading of conversations
  • Allow uploading and analysis of documents
  • Works on Google Colab or on a Server (e.g. AWS, Azure, RunPod)
  • Purchase here

About


License: MIT License


Languages

Language: Jupyter Notebook 100.0%