Lemur: The State-of-the-art Open Pretrained Large Language Models Balancing Text and Code Capabilities
Open large language models (LLMs) have traditionally been tailored for either textual or code-related tasks, with limited ability to effectively balance both. However, many complex language applications, particularly language model agents, demand systems with a multifaceted skill set encompassing understanding, reasoning, planning, coding, and context grounding.
In this work, we introduce Lemur-70B-v1 and Lemur-70B-chat-v1, the state-of-the-art open pretrained and supervised fine-tuned large language models balancing text and code intelligence.
This release includes model weights and starter code for using Lemur models, and we will continue to release more models and code.
This repository is a minimal example showing how to load Lemur models, run inference, and initialize them for further fine-tuning. Check the Hugging Face Hub for a more detailed usage recipe.
- [23 August, 2023]: We release the weights of `OpenLemur/lemur-70b-v1` and `OpenLemur/lemur-70b-chat-v1`! Check them out on the Hugging Face Hub.
First, install the libraries listed in `requirements.txt`:

```shell
pip install -r requirements.txt
```
Model cards are published on the Hugging Face Hub.

You can run inference like this:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model (8-bit quantized, sharded across available GPUs)
tokenizer = AutoTokenizer.from_pretrained("OpenLemur/lemur-70b-v1")
model = AutoModelForCausalLM.from_pretrained("OpenLemur/lemur-70b-v1", device_map="auto", load_in_8bit=True)

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```
The model is initialized from LLaMA-2 70B and further trained on ~100B tokens of text and code data (more details coming soon!). It should be prompted so that the expected answer is the natural continuation of the prompt.
Here is a simple example of using our pretrained model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the 8-bit quantized pretrained model
tokenizer = AutoTokenizer.from_pretrained("OpenLemur/lemur-70b-v1")
model = AutoModelForCausalLM.from_pretrained("OpenLemur/lemur-70b-v1", device_map="auto", load_in_8bit=True)

# Text Generation Example
prompt = "The world is "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

# Code Generation Example: the model completes the partial function
prompt = """
def factorial(n):
    if n == 0:
        return 1
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_length=200, num_return_sequences=1)
generated_code = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_code)
```
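For reference, the natural continuation of the factorial prompt above is the recursive case. A minimal sketch of the complete function the model is expected to produce (written by hand here, not actual model output):

```python
def factorial(n):
    # Base case: 0! is defined as 1
    if n == 0:
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)

print(factorial(5))  # 120
```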
The model is initialized from `lemur-70b-v1` and further trained on supervised fine-tuning data.
Here is a simple example of using our supervised fine-tuned model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the 8-bit quantized chat model
tokenizer = AutoTokenizer.from_pretrained("OpenLemur/lemur-70b-chat-v1")
model = AutoModelForCausalLM.from_pretrained("OpenLemur/lemur-70b-chat-v1", device_map="auto", load_in_8bit=True)

# Text Generation Example
prompt = "What's lemur's favorite fruit?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

# Code Generation Example
prompt = "Write a Python function to merge two sorted lists into one sorted list without using any built-in sort functions."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_length=200, num_return_sequences=1)
generated_code = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_code)
```
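For comparison, a hand-written solution to the merge prompt above might look like the following (a minimal sketch, not actual model output; the function name `merge_sorted_lists` is our choice):

```python
def merge_sorted_lists(a, b):
    """Merge two sorted lists into one sorted list without built-in sort functions."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two current head elements
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # Append whatever remains of the list that was not exhausted
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged

print(merge_sorted_lists([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```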
The Lemur project is an open collaborative research effort between XLang Lab and Salesforce Research. We thank the following institutions for their gift support: