Alpaca-LoRA as a service

This project demonstrates Alpaca-LoRA as a chatbot service built with Alpaca-LoRA and Gradio. Main features include:

enables batch inference by aggregating incoming requests until the previous requests have finished
achieves context awareness by keeping the chat history in the following string format:

f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Input: {input} # Surrounding information to AI
### Instruction: {prompt1} # First instruction/prompt given by user
### Response: {response1} # First response on the first prompt by AI
### Instruction: {prompt2} # Second instruction/prompt given by user
### Response: {response2} # Second response on the second prompt by AI
....
"""
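
As an illustration of the format above, the following is a minimal sketch of how a prompt could be assembled from the stored chat history. The function name `build_prompt`, its parameters, and the trailing open `### Response:` line for the model to complete are assumptions made for this example, not necessarily how the application implements it.

```python
from typing import List, Tuple

PREAMBLE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)


def build_prompt(context: str, history: List[Tuple[str, str]], new_instruction: str) -> str:
    """Assemble a context-aware prompt from prior (instruction, response) pairs.

    Hypothetical helper: names and the final open "### Response:" are assumptions.
    """
    lines = [PREAMBLE, f"### Input: {context}"]
    for instruction, response in history:
        lines.append(f"### Instruction: {instruction}")
        lines.append(f"### Response: {response}")
    # The newest instruction ends with an empty response slot for the model to fill.
    lines.append(f"### Instruction: {new_instruction}")
    lines.append("### Response:")
    return "\n".join(lines)
```

Calling `build_prompt("", [("Hi", "Hello!")], "Tell me a joke")` would yield a prompt that replays the first exchange before the new instruction, which is what lets the model see earlier turns.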

provides two additional helper buttons:
- the continue button lets the AI finish a previously incomplete response. It simply sends a continue message to the model, and the continue message is omitted in the post-processing phase.
- the summarize button lets the AI summarize the conversation so far in three sentences. There might be a better prompt for generating the summary, and this should be explored.
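
One way the continue button's post-processing could work is sketched below. This is an assumption-laden illustration: the marker text `CONTINUE_MESSAGE`, the function `merge_continuations`, and the choice to append the continued text to the preceding response are all hypothetical; the description above only states that the continue message itself is omitted.

```python
from typing import List, Tuple

CONTINUE_MESSAGE = "continue"  # hypothetical marker text sent by the continue button


def merge_continuations(history: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop 'continue' turns and fold their responses into the previous turn.

    Assumption: continued text is joined to the earlier response with a space.
    """
    merged: List[Tuple[str, str]] = []
    for instruction, response in history:
        if instruction == CONTINUE_MESSAGE and merged:
            prev_instruction, prev_response = merged[-1]
            merged[-1] = (prev_instruction, prev_response + " " + response)
        else:
            # Regular turns (or a 'continue' with nothing before it) are kept as-is.
            merged.append((instruction, response))
    return merged
```

With this approach the displayed conversation never shows the continue message, only a longer response to the original instruction.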
provides an additional script to run various configurations and see how they affect generation quality and speed
currently supports the following Alpaca-LoRA checkpoints:
- tloen/alpaca-lora-7b: the original 7B Alpaca-LoRA checkpoint by tloen
- chansung/alpaca-lora-13b: the 13B Alpaca-LoRA checkpoint trained by me (chansung) with the same script used to tune the original 7B model
- chansung/koalpaca-lora-13b: the 13B Alpaca-LoRA checkpoint trained by me (chansung) on the Korean dataset created by Beomi's KoAlpaca project. It works for English (user) to Korean (AI) conversations.
- chansung/alpaca-lora-30b: the 30B Alpaca-LoRA checkpoint trained by me (chansung) with the same script used to tune the original 7B model
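
The batch inference feature listed above (aggregating requests while earlier ones are still running) can be pictured with the following minimal sketch. It uses only the Python standard library; `drain_batch`, `process_next_batch`, `max_batch_size`, and the `generate_batch` callable standing in for the model call are hypothetical names, and the actual application may rely on a different mechanism.

```python
import queue
from typing import Callable, List, Tuple


def drain_batch(pending: "queue.Queue[str]", max_batch_size: int = 8) -> List[str]:
    """Block for one request, then take whatever else queued up in the meantime."""
    batch = [pending.get()]  # wait until at least one request is available
    while len(batch) < max_batch_size:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break  # nothing else is waiting; run with what we have
    return batch


def process_next_batch(
    pending: "queue.Queue[str]",
    generate_batch: Callable[[List[str]], List[str]],  # stand-in for the model call
    max_batch_size: int = 8,
) -> List[Tuple[str, str]]:
    """Run one batched generation and pair each prompt with its output."""
    batch = drain_batch(pending, max_batch_size)
    return list(zip(batch, generate_batch(batch)))
```

While one call to `generate_batch` is running, new prompts pile up in `pending`, so the next call to `process_next_batch` naturally picks them up together as a single batch.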

Instructions

Prerequisites

Note that the code only works with Python >= 3.9.

$ conda create -n alpaca-serve python=3.9
$ conda activate alpaca-serve

Install dependencies

$ pip install -r requirements.txt

Run Gradio application

$ BASE_URL=decapoda-research/llama-7b-hf
$ FINETUNED_CKPT_URL=tloen/alpaca-lora-7b
$
$ python app.py --base_url $BASE_URL --ft_ckpt_url $FINETUNED_CKPT_URL --port 6006

Screenshots

Acknowledgements

I am thankful to Jarvislabs.ai, who generously provided free GPU resources to experiment with Alpaca-LoRA deployment and to share it with the community to try out.
