What H/W do you need to fine-tune CodeGen?

Question

smith-co opened this issue 2 years ago · comments

smith-coding commented 2 years ago

I would like to fine-tune the CodeGen model.

What H/W would you need to fine-tune a CodeGen model?

What are the GPU requirements?

Alec Sharp · Answer 1 · Sun Nov 27 2022 18:44:25 GMT+0800 (China Standard Time)

Not a comprehensive answer, but I’ll share my experience.

I fine-tuned the 350M model on a single A100 with 40 GB of GPU memory, with a batch size of 10 and an input length of 512 tokens.

It used 80-90% of the memory.
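
As a rough cross-check on these numbers, here is a minimal sketch of the arithmetic for the memory taken by weights, gradients, and optimizer state. It assumes roughly 16 bytes per parameter, a commonly quoted rule of thumb for Adam with mixed precision, and nominal parameter counts of 350 million and 16 billion. The function name and defaults are illustrative and not from this thread. Activation memory, which grows with batch size and input length, is left out.

```python
def estimate_training_memory_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """Rough lower bound for weights + gradients + Adam optimizer state, in GB.

    Assumption: ~16 bytes per parameter (fp16 weights and gradients plus
    fp32 master weights, momentum, and variance). Activations are NOT
    included; they depend on batch size and sequence length.
    """
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts (assumed) for the two sizes mentioned in this thread.
for name, num_params in [("350M", 350e6), ("16B", 16e9)]:
    gb = estimate_training_memory_gb(num_params)
    print(f"CodeGen {name}: ~{gb:.1f} GB before activations")
```

Under these assumptions the 350M model needs about 5.6 GB before activations and the 16B model about 256 GB, which is more than a single 40 GB card holds. Treat both as order-of-magnitude estimates rather than measurements.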

Extremys · Answer 2 · Sun Nov 27 2022 18:56:52 GMT+0800 (China Standard Time)

@alecsharpie thanks for sharing. I would like to do the same for a new programming language, but I am having difficulty using the jaxformer implementation. If you have any examples to share, they would be welcome! Did you use the DeepSpeed library?

nashid · Answer 3 · Mon Nov 28 2022 11:25:06 GMT+0800 (China Standard Time)

@alecsharpie thanks for sharing.

Has anyone attempted to fine-tune the 16B model, and if so, what kind of resources were used?

SubhajitC-Hexaware · Answer 4 · Tue Nov 29 2022 15:28:45 GMT+0800 (China Standard Time)

@alecsharpie were you able to generate any proper code from a plain-English prompt? If so, how are you doing that? I am running the code on Kaggle, but it doesn't seem to be doing anything at all.

Alec Sharp · Answer 5 · Wed Jan 11 2023 19:03:52 GMT+0800 (China Standard Time)

@SubhajitC-Hexaware
Very inconsistently with the 350M model. Even code generated from code prompts isn't consistent for me at this number of parameters.
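
For anyone who wants to try prompting the 350M model, here is a minimal sketch using the Hugging Face `transformers` library. The checkpoint name `Salesforce/codegen-350M-mono`, the helper names, and the prompt format (a plain-English description as a comment above a function signature) are assumptions for illustration; the thread does not say which variant or prompt style was used.

```python
def build_prompt(description: str, signature: str) -> str:
    """Wrap a plain-English description as a comment above a function signature."""
    return f"# {description}\n{signature}\n"


def generate_code(prompt: str,
                  checkpoint: str = "Salesforce/codegen-350M-mono",  # assumed checkpoint
                  max_new_tokens: int = 64) -> str:
    """Complete `prompt` with a CodeGen checkpoint (downloads weights on first use)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_code(build_prompt("Return the sum of two numbers", "def add(a, b):"))` returns the prompt followed by the model's completion. Results at this model size may well be as inconsistent as described above.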

Alec Sharp · Answer 6 · Wed Jan 11 2023 19:05:43 GMT+0800 (China Standard Time)

@Extremys I used Hugging Face.
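
A fine-tuning run with the Hugging Face `Trainer` might look roughly like the sketch below. Only the batch size of 10 and the input length of 512 tokens come from this thread. The checkpoint name, output directory, epoch count, learning rate, and helper names are placeholders, and this is not necessarily the setup that was actually used.

```python
BATCH_SIZE = 10   # batch size reported in this thread
MAX_LENGTH = 512  # input length in tokens reported in this thread


def training_config() -> dict:
    """Keyword arguments for TrainingArguments (epochs and learning rate are assumed)."""
    return {
        "per_device_train_batch_size": BATCH_SIZE,
        "num_train_epochs": 1,   # placeholder value
        "learning_rate": 5e-5,   # placeholder value
    }


def fine_tune(texts,
              checkpoint: str = "Salesforce/codegen-350M-mono",  # assumed checkpoint
              output_dir: str = "codegen-finetuned"):            # hypothetical path
    """Fine-tune a causal LM on a list of source-code strings."""
    # Imported lazily so training_config can be inspected without transformers installed.
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    if tokenizer.pad_token is None:
        # Reuse the end-of-sequence token for padding if none is defined.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # Truncate every example to the input length used in the thread.
    train_dataset = [
        dict(tokenizer(text, truncation=True, max_length=MAX_LENGTH))
        for text in texts
    ]
    # mlm=False pads each batch and builds next-token (causal LM) labels.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    args = TrainingArguments(output_dir=output_dir, **training_config())
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, data_collator=collator)
    trainer.train()
    trainer.save_model(output_dir)
```

Passing a list of code snippets in the new language to `fine_tune` starts a single-device run. Larger models would likely need extra memory-saving measures that this sketch does not cover.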