calhounpaul / GPT-NeoX-20B-8bit-inference

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This repo contains a notebook that can be used to run inference on a GPT-NeoX-20B model via pretrained weights downloaded from EleutherAI, using Facebook's Bitsandbytes + Huggingface's Accelerate modules to reduce the size of the model and split the inference load between GPUs in multi-GPU setups. It is designed to run locally on two RTX3090s with >=60GB of CPU RAM. It should also run on Colab Pro without modification.

This notebook demos sentiment analysis, summarization, keyword extraction, and conversational chat:

index

About

License:MIT License


Languages

Language:Jupyter Notebook 100.0%