ise-uiuc / magicoder

Magicoder: Source Code Is All You Need

Home Page: https://arxiv.org/abs/2312.02120

Use Dilated Attention instead of vanilla attention in the Llama model and fine-tune the model

younesselbrag opened this issue · comments

I want to ask whether I can replace the vanilla attention used in the base model with Dilated Attention and then fine-tune. The idea behind this is to reduce the complexity of attention and increase the context window. Does DeepSeek use Llama 2 as its base model, i.e. the same architecture? If so, can I load the checkpoint weights of the other layers, such as the layer norms and feed-forward layers, or do I need to re-implement the LLM from scratch?
Or is there a method to adapt the weights or share weights?
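
To make the question concrete, here is a minimal sketch of what I have in mind, assuming the base checkpoint follows the standard Llama architecture in `transformers` (the `BASE_MODEL` name and the `DilatedSelfAttention` class are placeholders of mine, not code from this repo):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Hypothetical base checkpoint; swap in whichever Llama-style model is used.
BASE_MODEL = "deepseek-ai/deepseek-coder-6.7b-base"

class DilatedSelfAttention(nn.Module):
    """Placeholder for a dilated-attention block (e.g. LongNet-style).

    The projections mirror LlamaAttention's shapes so the pretrained
    q/k/v/o weights can be copied over; the actual dilated-attention
    forward pass still has to be implemented."""

    def __init__(self, config):
        super().__init__()
        head_dim = config.hidden_size // config.num_attention_heads
        kv_heads = getattr(config, "num_key_value_heads", config.num_attention_heads)
        self.q_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=False)
        self.k_proj = nn.Linear(config.hidden_size, kv_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(config.hidden_size, kv_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=False)

    def forward(self, hidden_states, **kwargs):
        # Dilated/sparse attention logic (and the return signature expected
        # by the host decoder layer) would go here.
        raise NotImplementedError

# 1) Load the pretrained base model with its vanilla attention.
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# 2) Replace the attention module in every decoder layer, copying the
#    matching q/k/v/o projection weights so they are not re-initialized.
for layer in model.model.layers:
    new_attn = DilatedSelfAttention(model.config)
    new_attn.load_state_dict(layer.self_attn.state_dict(), strict=False)
    layer.self_attn = new_attn

# 3) Fine-tune as usual: embeddings, layer norms, and feed-forward (MLP)
#    blocks keep their pretrained weights, so nothing is rebuilt from scratch.
```

The point of the sketch is that only the attention modules would be swapped, while everything else is loaded straight from the existing checkpoint; is that the right way to go about it, or is there a recommended way to adapt the weights?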