kerthcet / llmlite

A library that helps you communicate with all kinds of LLMs consistently.

llmlite


🌵 llmlite is a library that helps you communicate with all kinds of LLMs consistently.

Features

  • State-of-the-art LLMs support
  • Continuous Batching via vLLM
  • Quantization (issue#37)
  • Loading specific adapters (issue#51)
  • Streaming (issue#52)

Model Support

Model       State        System Prompt   Note
ChatGPT     Done ✅      Yes
Llama-2     Done ✅      Yes
CodeLlama   Done ✅      Yes
ChatGLM2    Done ✅      No
Baichuan2   Done ✅      Yes
ChatGLM3    WIP ⏳       Yes
Claude-2    RoadMap 📋   -               issue#7
Falcon      RoadMap 📋   -               issue#8
StableLM    RoadMap 📋   -               issue#11

Backend Support

Backend       State
huggingface   Done ✅
vLLM          Done ✅

How to install

pip install llmlite==0.0.15

How to use

Chat

from llmlite import ChatLLM, ChatMessage

chat = ChatLLM(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",  # required
    task="text-generation",
)

result = chat.completion(
  messages=[
    ChatMessage(role="system", content="You're an honest assistant."),
    ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
  ]
)

# Output: Oh my goodness, a llama in your garden?! 😱 That's quite a surprise! 😅 As an honest assistant, I must inform you that llamas are not typically known for their gardening skills, so it's possible that the llama in your garden may have wandered there accidentally or is seeking shelter. 🐮 ...

Continuous Batching

Continuous batching is supported via vLLM; you can enable it by setting the backend accordingly.

from llmlite import ChatLLM, ChatMessage

chat = ChatLLM(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
    backend="vllm",
)

results = chat.completion(
    messages=[
        [
            ChatMessage(role="system", content="You're an honest assistant."),
            ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
        ],
        [
            ChatMessage(role="user", content="What's the population of the world?"),
        ],
    ],
    max_tokens=2048,
)

for result in results:
    print(f"RESULT: \n{result}\n\n")

llmlite also supports other generation parameters, such as temperature, max_length, do_sample, top_k, and top_p, to help control the length, randomness, and diversity of the generated text.
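These parameters are passed as keyword arguments to completion(), just like max_tokens in the batching example above. A hypothetical configuration (parameter names taken from the list above; exact support may vary by backend and version):

```python
# Sketch of common sampling settings; the values here are illustrative.
generation_kwargs = dict(
    temperature=0.7,  # < 1.0 makes output more focused and deterministic
    top_p=0.9,        # nucleus sampling: keep the top 90% probability mass
    top_k=50,         # sample only from the 50 most likely tokens
    do_sample=True,   # sample instead of greedy decoding
    max_length=512,   # upper bound on the generated sequence length
)

# Then unpack them into the call, e.g.:
# results = chat.completion(messages=messages, **generation_kwargs)
```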

See examples for reference.

Prompting

You can use llmlite to help you generate full prompts, for instance:

from llmlite import ChatLLM, ChatMessage

messages = [
    ChatMessage(role="system", content="You're a honest assistant."),
    ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
]

ChatLLM.prompt("meta-llama/Llama-2-7b-chat-hf", messages)

# Output:
# <s>[INST] <<SYS>>
# You're a honest assistant.
# <</SYS>>

# There's a llama in my garden, what should I do? [/INST]
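The Llama-2 chat template shown above can also be reproduced with plain string formatting. This sketch is based on the documented output, not on llmlite's actual implementation:

```python
def llama2_prompt(system: str, user: str) -> str:
    # Llama-2 chat format: the system prompt is wrapped in <<SYS>> tags,
    # and the whole first turn is wrapped in [INST] ... [/INST].
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

print(llama2_prompt(
    "You're an honest assistant.",
    "There's a llama in my garden, what should I do?",
))
```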

Logging

Set the environment variable LOG_LEVEL to configure logging. It defaults to INFO; other values such as DEBUG and WARNING are also supported.

Contributions

🚀 All kinds of contributions are welcome! Please follow Contributing.

Contributors

🎉 Thanks to all these contributors.

About

A library that helps you communicate with all kinds of LLMs consistently.

License: MIT License


Languages

Python 98.6%, Makefile 1.4%