EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Home Page: https://www.eleuther.ai/


Support for custom model architecture

itsnamgyu opened this issue

I'm building a custom architecture involving multiple existing architectures as sub-components (Pythia, RoBERTa, T5, etc.).

Does this library support custom architectures? If not, could someone give me some pointers on how to approach it? (e.g., use a different library, re-build the architecture using provided model components)

I'm planning to run pre-training from scratch up to 7B params. I'm mainly interested in using this library for its FlashAttention support and ease of multi-node training.

Hey there! Yes, I think this is doable, but it would take some effort to add the new architectures, given that only GPT-style architectures are supported here right now.

In terms of approach: since we're a Megatron-based framework, and these architectures have already been added to other Megatron-based frameworks, I'd recommend porting those implementations into our https://github.com/EleutherAI/gpt-neox/tree/main/megatron/model directory.

For T5, for example, there was a gpt-neox effort at https://github.com/EleutherAI/gpt-neox/tree/t5-shared-params that you could start from. T5 is now also implemented in upstream Megatron (https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/t5_model.py).
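To give a concrete (if greatly simplified) picture of the pieces involved, here is a plain-PyTorch sketch of a T5-style encoder-decoder LM. It is not the gpt-neox/Megatron API; it only illustrates which components (shared embeddings, bidirectional encoder stack, causal decoder stack with cross-attention, LM head) would have to be mapped onto the parallel layers under megatron/model when porting:

```python
import torch
import torch.nn as nn


class TinyEncoderDecoderLM(nn.Module):
    """Illustrative only: a T5-style encoder-decoder built from stock PyTorch modules."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # shared input embedding
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, src_ids, tgt_ids):
        # Decoder side is causal; encoder side attends bidirectionally.
        tgt_len = tgt_ids.size(1)
        causal_mask = torch.triu(
            torch.full((tgt_len, tgt_len), float("-inf"), device=tgt_ids.device),
            diagonal=1,
        )
        hidden = self.transformer(
            self.embed(src_ids), self.embed(tgt_ids), tgt_mask=causal_mask
        )
        return self.lm_head(hidden)


# Smoke test on random token ids.
model = TinyEncoderDecoderLM()
logits = model(torch.randint(0, 32000, (2, 16)), torch.randint(0, 32000, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 32000])
```

In a NeoX port, each of these stock modules would be replaced by the corresponding tensor/pipeline-parallel building block, which is where most of the effort in the t5-shared-params branch went.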

I would be happy to discuss this with you along the way and help on the effort if you go for it!

Thanks, I'll check them out!

@itsnamgyu This might be helpful... here's an example where I use lm-eval in another, unrelated repo with custom models:
foundation-model-stack/foundation-model-stack#154
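As a rough sketch of what such an integration can look like with the current lm-evaluation-harness (v0.4+) API: subclass lm_eval.api.model.LM and register it. The class name, registry key, and placeholder return values below are hypothetical, not taken from the linked repo:

```python
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model


@register_model("my-custom-lm")  # hypothetical registry key
class MyCustomLM(LM):
    def __init__(self, my_model=None, **kwargs):
        super().__init__()
        self.model = my_model  # your custom PyTorch module (placeholder)

    def loglikelihood(self, requests):
        # Each request's .args is (context, continuation); return a
        # (logprob, is_greedy) tuple per request, scored by your model.
        results = []
        for req in requests:
            context, continuation = req.args
            # TODO: replace this placeholder with your model's actual scoring.
            results.append((0.0, True))
        return results

    def loglikelihood_rolling(self, requests):
        # Full-sequence log-likelihoods (used by perplexity-style tasks).
        return [0.0 for _ in requests]  # TODO: replace placeholder

    def generate_until(self, requests):
        # Each request's .args is (context, gen_kwargs); return generated strings.
        return ["" for _ in requests]  # TODO: replace placeholder
```

With the class registered and importable, `lm_eval.simple_evaluate(model=MyCustomLM(), tasks=["lambada_openai"])` should run the standard tasks against it; the linked issue shows a fuller, real-world version of the same idea.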

@nairbv Thanks a lot!

Hey, this sounds interesting. I'm planning to recreate a model that's written in PyTorch using this library. Given that it's a custom architecture, what do I need to consider and plan for so that I can take advantage of GPT-NeoX for distributed training? Any pointers or guidance would help.

I also looked at the T5 model implementation on the t5-shared-params branch. Is it only required to create a model file similar to gpt2_model.py in the models directory, or do I need to make changes to the Megatron code as well? It would be helpful if you could give me an idea of what changes are required to incorporate a custom model architecture.

@JDRanpariya I've actually decided to use the Hugging Face implementation of GPTNeoX with DeepSpeed and FlashAttention-2 for now. I'm not working with T5 or RoBERTa at the moment.
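For reference, a minimal sketch of that route: pre-training a GPT-NeoX-style model from scratch with Hugging Face Transformers, DeepSpeed, and FlashAttention-2. The model sizes, dataset, and "ds_config.json" path are placeholders, not from this thread, and the attn_implementation kwarg assumes a recent transformers release with flash-attn installed:

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    GPTNeoXConfig,
    Trainer,
    TrainingArguments,
)

# Reuse an existing NeoX tokenizer rather than training one from scratch.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
tokenizer.pad_token = tokenizer.eos_token

config = GPTNeoXConfig(
    vocab_size=len(tokenizer),
    hidden_size=2048,            # placeholder sizes; scale toward 7B as needed
    num_hidden_layers=16,
    num_attention_heads=16,
    intermediate_size=8192,
    max_position_embeddings=2048,
)
model = AutoModelForCausalLM.from_config(
    config,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # assumes flash-attn + recent transformers
)

# Tiny placeholder corpus just to keep the sketch self-contained.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=dataset.column_names,
).filter(lambda ex: len(ex["input_ids"]) > 1)

args = TrainingArguments(
    output_dir="neox-scratch",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed="ds_config.json",  # placeholder path to a ZeRO config
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

For multi-node runs, the same script would typically be launched with the deepspeed or torchrun launcher and a hostfile; the ZeRO stage and batch sizes in ds_config.json are where most of the tuning happens.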

OP has decided to pursue a different approach rather than modifying this library.

Yep, got it! I guess people who want to do this will do it anyhow, but I think this issue is a good starting point. Is it possible to move it to Discussions? It might help people who want to do something similar in the future.