DBRX is a large language model trained by Databricks, and made available under an open license. This repository contains the minimal code and examples to run inference, as well as a collection of resources and links for using DBRX.
- Founder's Blog, DBRX Technical Blog
- HuggingFace: https://huggingface.co/collections/databricks/
- LLM Foundry: https://github.com/mosaicml/llm-foundry
A reference model code can be found in this repository at modeling_dbrx.py.
Note: this model code is supplied for references purposes only, please see the HuggingFace repository for the official supported version.
DBRX is a Mixture-of-Experts (MoE) model with 132B total parameters and 36B live parameters. We use 16 experts, of which 4 are active during training or inference. DBRX was pre-trained for 12T tokens of text. DBRX has a context length of 32K tokens.
The following models are open-sourced:
Model | Description |
---|---|
DBRX Base | Pre-trained base model |
DBRX Instruct | Finetuned model for instruction following |
The model was trained using optimized versions of our open source libraries Composer, LLM Foundry, MegaBlocks and Streaming.
For the instruct model, we used the ChatML format. Please see the DBRX Instruct model card for more information on this.
To download the weights and tokenizer, please first visit the DBRX HuggingFace page and accept the license. Note: access to the Base model requires manual approval.
We recommend having at least 320GB of memory to run the model.
Then, run:
pip install -r requirements.txt # Or requirements-gpu.txt to use flash attention on GPU(s)
huggingface-cli login # Add your Hugging Face token in order to access the model
python generate.py # See generate.py to change the prompt and other settings
For more advanced usage, please see LLM Foundry (chat script, batch generation script)
If you have any package installation issues, we recommend using our Docker image: mosaicml/llm-foundry:2.2.1_cu121_flash2-latest
Both TensorRT-LLM and vLLM can be used to run optimized inference with DBRX. We have tested both libraries on NVIDIA A100 and H100 systems. To run inference with 16-bit precision, a minimum of 4 x 80GB multi-GPU system is required.
DBRX support is being added to TensorRT-LLM library: Pending PR
After merging, instructions to build and run DBRX TensorRT engines will be found at: README
Please see the vLLM docs for instructions on how to run DBRX with the vLLM engine.
An example script to finetune DBRX can be found in our open source library LLM Foundry
The model cards can be found at:
DBRX is available on the Databricks platform through:
The same tools used to train high quality MoE models such as DBRX are available for Databricks customers. Please reach out to us at https://www.databricks.com/company/contact if you are interested in pre-training, finetuning, or deploying your own DBRX models!
For issues with model output, or community discussion, please use the Hugging Face community forum (instruct, base)
For issues with LLM Foundry, or any of the underlying training libraries, please open an issue on the relevant GitHub repository.
Our model weights and code are licensed for both researchers and commercial entities. The Databricks Open Source License can be found at LICENSE, and our Acceptable Use Policy can be found here.