mosaicml / llm-foundry

LLM training code for Databricks foundation models

Home Page: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm


Triton attention patch for Mistral

germanjke opened this issue

Hi!

I can use the Triton attention patch for LLaMA like this:

# Model
model:
  name: hf_causal_lm
  model_type: llama
  # patches the HF Llama attention module with the Triton implementation
  attention_patch_type: triton
  pretrained_model_name_or_path: /Llama-2-7B
  pretrained: true

Can I use it with Mistral, or is it not supported yet?

It's not supported; I recommend using Flash Attention 2 instead. Please see https://github.com/mosaicml/llm-foundry/tree/main/scripts/train#flashattention.
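
For reference, here is a minimal sketch of what a Mistral config using Flash Attention 2 could look like, based on the `use_flash_attention_2` option described in the FlashAttention section of the linked train README; the checkpoint path (`mistralai/Mistral-7B-v0.1`) is just an illustrative placeholder:

# Model
model:
  name: hf_causal_lm
  # placeholder checkpoint; substitute your local or HF Hub path
  pretrained_model_name_or_path: mistralai/Mistral-7B-v0.1
  pretrained: true
  # requires the flash-attn 2 package to be installed;
  # see scripts/train#flashattention for install instructions
  use_flash_attention_2: true

Note that, unlike the Triton patch, this goes through Hugging Face's native Flash Attention 2 integration, so no `attention_patch_type` or `model_type` field is needed.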