sujitvasanth/streaming-LLM-chat

transformers based streaming chat for GPTQ models


Streaming-LLM-chat

(sample chat screenshot)

This application uses the Hugging Face transformers library to let you choose a local LLM and run streaming inference on the GPU.

It uses:

  • Python: 3.8.10
  • transformers library: 4.36.2
  • transformers_stream_generator library
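A setup sketch based on the dependency list above (these exact commands are an assumption, not taken from the repo; the `auto-gptq`/`optimum` line reflects the backend that transformers 4.36 generally needs for GPTQ checkpoints):

```shell
# Pin the versions listed in this README
pip install transformers==4.36.2 transformers_stream_generator

# GPTQ checkpoints additionally need a quantization backend
# (assumed here: the AutoGPTQ integration via optimum)
pip install auto-gptq optimum
```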

The models are assumed to be in the oobabooga text-generation-webui models folder.
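A minimal sketch of how the model path could be resolved, assuming a default text-generation-webui install location (`~/text-generation-webui` and the helper `model_path` are hypothetical names, not from this repo):

```python
import os

# Assumed location of the oobabooga text-generation-webui checkout;
# adjust to your own setup.
TEXTGEN_DIR = os.path.expanduser("~/text-generation-webui")

def model_path(name: str) -> str:
    """Build the path to a model folder inside the webui's models directory."""
    return os.path.join(TEXTGEN_DIR, "models", name)

print(model_path("TheBloke_openchat-3.5-0106-GPTQ"))
```

Loading would then be the usual transformers call, e.g. `AutoModelForCausalLM.from_pretrained(model_path("TheBloke_openchat-3.5-0106-GPTQ"), device_map="auto")`.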

The openchat model is available on Hugging Face:

  • https://huggingface.co/TheBloke/openchat-3.5-0106-GPTQ
  • https://huggingface.co/sujitvasanth/TheBloke-openchat-3.5-0106-GPTQ
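Streaming chat with transformers is typically wired as follows: generation runs in a background thread while the main thread consumes tokens as they arrive (with a real model this is `TextIteratorStreamer` passed to `model.generate`). The sketch below shows that producer/consumer pattern with a stand-in streamer so it runs without a GPU or model download; `FakeStreamer` and `fake_generate` are illustrative names, not this repo's code:

```python
from threading import Thread
from queue import Queue

class FakeStreamer:
    """Stand-in for transformers.TextIteratorStreamer (assumed API shape)."""
    def __init__(self):
        self.q = Queue()
    def put(self, text):
        self.q.put(text)        # called from the generation thread
    def end(self):
        self.q.put(None)        # sentinel: generation finished
    def __iter__(self):
        while (tok := self.q.get()) is not None:
            yield tok

def fake_generate(streamer, reply):
    # With a real model this would be model.generate(**inputs, streamer=streamer)
    for token in reply.split():
        streamer.put(token + " ")
    streamer.end()

streamer = FakeStreamer()
Thread(target=fake_generate, args=(streamer, "Hello from the streaming demo")).start()

chunks = []
for chunk in streamer:          # tokens arrive incrementally, as in a live chat
    chunks.append(chunk)
print("".join(chunks))
```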


