What H/W do you need to fine-tune Codegen?

I would like to fine tune the Codegen model.

What H/W would you need to fine tune a Codegen model?

What are the GPU requirements?

Not a comprehensive answer, but I’ll share my experience.

I fine-tuned the 350M model on a single A100 with 40 GB of memory, using a batch size of 10 and an input length of 512 tokens.

That used 80-90% of the GPU memory.
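For a rough sense of where that memory goes, here is a back-of-the-envelope sketch. It assumes plain fp32 training with Adam (about 16 bytes per parameter for weights, gradients, and the two optimizer moments) and ignores activations, which grow with batch size and sequence length. The function name and the nominal parameter counts are illustrative, not measurements from this thread.

```python
def estimate_training_state_gb(num_params, bytes_per_param=16):
    """Rough memory (in GB) for weights, gradients, and Adam state.

    Assumption: fp32 weights (4 bytes) + fp32 gradients (4 bytes)
    + two fp32 Adam moments (8 bytes) = 16 bytes per parameter.
    Activations and framework overhead are NOT included.
    """
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts for the published Codegen sizes (rounded).
NOMINAL_SIZES = {"350M": 350e6, "2B": 2e9, "6B": 6e9, "16B": 16e9}

for name, num_params in NOMINAL_SIZES.items():
    gb = estimate_training_state_gb(num_params)
    print(f"{name}: ~{gb:.1f} GB before activations")
```

Under these assumptions the 350M model needs about 5.6 GB before activations, so most of the usage on the 40 GB A100 would come from activations and overhead at batch size 10 and 512 tokens. The same arithmetic gives about 256 GB for the 16B model, which would not fit on a single 40 GB GPU without sharding, offloading, or a parameter-efficient method.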

@alecsharpie thanks for sharing. I would like to do the same for a new programming language, but I'm having difficulty using the jaxformer implementation. If you have any examples to share, they would be welcome! Did you use the DeepSpeed library?

@alecsharpie thanks for sharing.

Has anyone attempted to fine-tune the 16B model, and if so, what kind of resources did you use?

@alecsharpie were you able to generate any proper code from a plain English prompt? If yes, how are you doing that? I am running the code on Kaggle, but it doesn't seem to be doing anything at all.

Very inconsistently with the 350M model. Even code generated from code prompts isn't consistent for me at this number of parameters.
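For anyone trying to reproduce this, a minimal sketch of prompting the model from an English description might look like the following. It assumes the Hugging Face `transformers` library and the `Salesforce/codegen-350M-mono` checkpoint; neither the checkpoint nor the prompt format was specified in this thread, and `build_prompt` and `generate_code` are hypothetical helper names. Putting the description in a comment above a function signature is just one common way to prompt code models.

```python
def build_prompt(description, signature):
    """Wrap an English description in comment lines above a function signature."""
    comment = "\n".join(f"# {line}" for line in description.strip().splitlines())
    return f"{comment}\n{signature}\n"


def generate_code(prompt, model_name="Salesforce/codegen-350M-mono", max_new_tokens=64):
    """Greedy-decode a completion for the prompt (downloads the model on first use)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, so repeated runs give the same text
        pad_token_id=tokenizer.eos_token_id,  # the tokenizer has no pad token by default
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


prompt = build_prompt("Return the sum of two numbers.", "def add(a, b):")
print(prompt)
```

Calling `generate_code(prompt)` returns the prompt followed by the model's continuation. If nothing seems to happen, it may be worth checking that `max_new_tokens` is set and that the returned string is actually printed.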

@Extremys I used Hugging Face.
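The exact script was not shared here, so the following is only a sketch of what a Hugging Face fine-tuning run with the settings mentioned above (batch size 10, 512-token inputs) could look like. It assumes the `transformers` and `datasets` libraries, the `Salesforce/codegen-350M-mono` checkpoint, and a plain-text training file with one example per line. The helper names, output directory, epoch count, and learning rate are all placeholders.

```python
MODEL_NAME = "Salesforce/codegen-350M-mono"  # assumption: checkpoint not named in the thread
MAX_LENGTH = 512   # input length reported in the thread
BATCH_SIZE = 10    # batch size reported in the thread


def training_kwargs(output_dir="codegen-finetuned"):
    """Keyword arguments for TrainingArguments; values other than batch size are guesses."""
    return {
        "output_dir": output_dir,
        "per_device_train_batch_size": BATCH_SIZE,
        "num_train_epochs": 1,
        "learning_rate": 5e-5,
    }


def fine_tune(train_file):
    """Fine-tune the model on a text file with one training example per line."""
    # Imported lazily so the config helpers work without the libraries installed.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    tokenizer.pad_token = tokenizer.eos_token  # no pad token by default
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    raw = load_dataset("text", data_files={"train": train_file})["train"]
    raw = raw.filter(lambda ex: len(ex["text"].strip()) > 0)  # drop empty lines

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=MAX_LENGTH)

    tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

    # mlm=False gives causal language modelling: labels are copied from the inputs.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(**training_kwargs()),
        train_dataset=tokenized,
        data_collator=collator,
    )
    trainer.train()
    trainer.save_model()
```

Running `fine_tune("train.txt")` on a GPU machine would start training and write the result to the output directory. For a new programming language, the training file would hold code samples in that language.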