This project demonstrates a chatbot service built with Alpaca-LoRA and Gradio. Main features include:
- enables batch inference by aggregating requests until the previous requests are finished
- achieves context awareness by keeping the chat history in the following string format:
f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Input: {input} # Surrounding information to AI
### Instruction: {prompt1} # First instruction/prompt given by user
### Response: {response1} # First response to the first prompt by AI
### Instruction: {prompt2} # Second instruction/prompt given by user
### Response: {response2} # Second response to the second prompt by AI
....
"""
- provides two additional helper buttons
  - continue: this button lets the AI finish a previously incomplete response. It simply sends a continue message to the model, and that message is omitted in the post-processing phase.
  - summarize: this button lets the AI summarize the conversation so far in three sentences. A better prompt for generating summaries may exist, and this should be explored.
- provides an additional script for running various configurations to see how they affect generation quality and speed
- currently supports the following Alpaca-LoRA checkpoints:
  - tloen/alpaca-lora-7b: the original 7B Alpaca-LoRA checkpoint by tloen
  - chansung/alpaca-lora-13b: the 13B Alpaca-LoRA checkpoint by me (chansung), trained with the same script used to tune the original 7B model
  - chansung/koalpaca-lora-13b: the 13B Alpaca-LoRA checkpoint by me (chansung), trained on the Korean dataset created by Beomi's KoAlpaca project. It works for English (user) to Korean (AI) conversations.
  - chansung/alpaca-lora-30b: the 30B Alpaca-LoRA checkpoint by me (chansung), trained with the same script used to tune the original 7B model
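As an illustration of the chat-history format listed above, the following is a minimal sketch of how such a prompt could be assembled from past turns. The names `build_prompt` and `record_turn` are hypothetical and not taken from this repository. Merging a continue turn into the previous response is only one plausible reading of "omitted in the post-processing phase"; the actual code may handle it differently.

```python
from typing import List, Tuple

# Fixed preamble copied from the prompt format described above.
PREAMBLE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)

# One chat turn: (instruction from the user, response from the AI).
Turn = Tuple[str, str]


def build_prompt(context: str, history: List[Turn], new_instruction: str) -> str:
    """Join the preamble, context, past turns, and the new instruction."""
    parts = [PREAMBLE, f"### Input: {context}"]
    for instruction, response in history:
        parts.append(f"### Instruction: {instruction}")
        parts.append(f"### Response: {response}")
    parts.append(f"### Instruction: {new_instruction}")
    # Leave the final response open so the model completes it.
    parts.append("### Response:")
    return "\n".join(parts)


def record_turn(history: List[Turn], instruction: str, response: str) -> List[Turn]:
    """Return a new history with the finished turn added.

    Assumption: a "continue" turn is not stored as its own entry; its
    response is appended to the previous response instead.
    """
    if instruction.strip().lower() == "continue" and history:
        prev_instruction, prev_response = history[-1]
        merged = (prev_instruction, prev_response + " " + response)
        return history[:-1] + [merged]
    return history + [(instruction, response)]
```

With this layout, each new request resends the whole conversation, so prompt length grows with every turn; a real service would likely need to truncate or summarize older turns.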
- Prerequisites
Note that the code only works with Python >= 3.9.
$ conda create -n alpaca-serve python=3.9
$ conda activate alpaca-serve
- Install dependencies
$ pip install -r requirements.txt
- Run Gradio application
$ BASE_URL=decapoda-research/llama-7b-hf
$ FINETUNED_CKPT_URL=tloen/alpaca-lora-7b
$
$ python app.py --base_url $BASE_URL --ft_ckpt_url $FINETUNED_CKPT_URL --port 6006
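For readers wondering how the flags in the command above might be consumed, here is a hypothetical sketch using `argparse` from the Python standard library. The flag names come from the command above; the help texts, the `parse_args` helper, and the default port are assumptions for illustration and may not match the real app.py.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the three flags shown in the run command."""
    parser = argparse.ArgumentParser(
        description="Serve an Alpaca-LoRA chatbot with Gradio"
    )
    parser.add_argument(
        "--base_url", required=True,
        help="identifier of the base LLaMA model",
    )
    parser.add_argument(
        "--ft_ckpt_url", required=True,
        help="identifier of the fine-tuned Alpaca-LoRA checkpoint",
    )
    # Assumption: default to the port used in the example command.
    parser.add_argument("--port", type=int, default=6006)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.base_url, args.ft_ckpt_url, args.port)
```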
I am thankful to Jarvislabs.ai, who generously provided free GPU resources to experiment with Alpaca-LoRA deployment and to share it with the community to try out.