TrelisResearch / code-llama-32k

Run code-llama with up to 50k tokens of context using Flash Attention and BetterTransformer


code-llama-32k

Run code-llama with 32k tokens of context using Flash Attention and BetterTransformer

Basic Jupyter Notebook (works only on Nvidia GPUs, not on Mac).

Option 1 - Google Colab:

  • Download the ipynb notebook
  • Select a GPU
    • An A100 with 40 GB allows up to a 25k-token context length

Option 2 - Run on a server (e.g. AWS or RunPod (affiliate link))

  • Spin up an A100 80 GB server
  • Run the notebook and select a 50,000-token context length
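The GPU sizes above line up with a rough fp16 memory estimate. The sketch below is a back-of-the-envelope calculation, assuming the standard CodeLlama-7B shape (32 layers, hidden size 4096) and a dense fp16 KV cache; these figures are not stated in this repo, and real usage adds activation memory and framework overhead on top.

```python
def kv_cache_bytes(n_tokens, n_layers=32, hidden_size=4096, dtype_bytes=2):
    # Keys and values: one hidden-size vector each, per layer, per token.
    return 2 * n_layers * hidden_size * dtype_bytes * n_tokens

def total_gib(n_tokens, n_params=7e9, dtype_bytes=2):
    weights = n_params * dtype_bytes  # fp16 weights of a 7B model, ~14 GB
    return (weights + kv_cache_bytes(n_tokens)) / 2**30

for ctx in (25_000, 50_000):
    print(f"{ctx:>6} tokens -> ~{total_gib(ctx):.0f} GiB")
```

Under these assumptions, 25k tokens lands around 25 GiB (leaving headroom on a 40 GB A100) and 50k tokens around 37 GiB (comfortable on an 80 GB card), consistent with the two options above.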

PRO Notebooks

  • Allow saving and re-loading of conversations
  • Allow uploading and analysis of documents
  • Works on Google Colab or on a Server (e.g. AWS, Azure, RunPod)
  • Purchase here

About


License: MIT License


Languages

Language: Jupyter Notebook 100.0%