salesforce / CodeGen

CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.

What H/W do you need to fine-tune CodeGen?

smith-co opened this issue

I would like to fine-tune the CodeGen model.

What H/W would you need to fine-tune a CodeGen model?

What are the GPU requirements?

Not a comprehensive answer, but I’ll share my experience.

I fine-tuned the 350M model on a single A100 with 40 GB of GPU memory, using a batch size of 10 and an input length of 512 tokens.

This used 80-90% of the GPU memory.
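
For a rough sense of how this scales, here is a back-of-the-envelope sketch, not a measurement. It assumes full fine-tuning with Adam in mixed precision, which the thread does not confirm, and uses the common rule of thumb of about 16 bytes per parameter for weights, gradients, and optimizer state. Activation memory, which depends on batch size and input length, is not included, so real usage will be higher. The function name `training_state_gb` is made up for illustration.

```python
# Rule-of-thumb bytes per parameter for mixed-precision Adam training
# (an assumption; the thread does not say which setup was used):
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + Adam first moment (4) + Adam second moment (4) = 16 bytes.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4

# Nominal parameter counts for the two sizes mentioned in this thread.
MODEL_SIZES = {"350M": 350e6, "16B": 16e9}


def training_state_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Estimate memory (in GB) for weights, gradients, and optimizer state only."""
    return num_params * bytes_per_param / 1e9


for name, num_params in MODEL_SIZES.items():
    gb = training_state_gb(num_params)
    print(f"CodeGen {name}: roughly {gb:.1f} GB before activations")
```

Under these assumptions the 16B model needs far more than a single 40 GB card for the training state alone, so it would have to be sharded across devices or trained with a memory-saving method.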

@alecsharpie thanks for sharing. I would like to do the same for a new programming language, but I am having difficulty using the jaxformer implementation. If you have some examples to share, they would be welcome! Did you use the DeepSpeed library?

@alecsharpie thanks for sharing.

Has anyone attempted to fine-tune the 16B model, and if so, what kind of resources were used?

@alecsharpie were you able to generate any proper code from a plain-English prompt? If so, how are you doing it? I am running the code on Kaggle, but it doesn't seem to be doing anything at all.

@SubhajitC-Hexaware
Very inconsistently with the 350M model. Even code generated from code prompts isn't consistent for me at this number of parameters.

@Extremys I used Hugging Face.
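
For anyone looking for a starting point, here is a minimal sketch of what fine-tuning with the Hugging Face `transformers` library might look like. It is not the exact script used above. The checkpoint name `Salesforce/codegen-350M-mono`, the `fine_tune` function, the `output_dir` value, the single epoch, and the "list of source-code strings" data format are all assumptions for illustration. Only the batch size of 10 and the input length of 512 come from this thread.

```python
# Assumption: the thread does not say which 350M checkpoint was used.
MODEL_NAME = "Salesforce/codegen-350M-mono"
MAX_LENGTH = 512  # input length reported in the thread
BATCH_SIZE = 10   # batch size reported in the thread


def fine_tune(texts, output_dir="codegen-finetuned"):
    """Fine-tune on a list of source-code strings (hypothetical data format)."""
    # Imported here so the file can be loaded without the library installed.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # The CodeGen tokenizer has no padding token by default; reuse end-of-sequence.
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    # Tokenize each example, truncating to the chosen input length.
    train_dataset = [
        {"input_ids": tokenizer(text, truncation=True, max_length=MAX_LENGTH)["input_ids"]}
        for text in texts
    ]

    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=BATCH_SIZE,
        num_train_epochs=1,  # assumption: the epoch count is not given in the thread
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        # mlm=False selects causal language modeling: labels are copied from the inputs.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

Calling `fine_tune(list_of_code_strings)` downloads the checkpoint and trains on whatever GPU is available. For a new programming language, the list would hold source files or snippets in that language.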