An attempt to write a fine-tuning script for CodeGen
vishalsingha opened this issue · comments
Hi,
I am interested in writing a fine-tuning script for this model.
Can anyone tell me in what format I should provide the output to the model during training?
The data is of the form
{"code": "def hello(name): return f'Hello {name}'", "nl": "This function takes a name as input and returns a message saying hello to the person, in the format 'Hello name'."}
We can tokenize the input (either code or text) by creating a tokenizer and passing the text into it.
But what format should I use for training, and how should I compare the loss between the original output and the predicted output?
Thanks
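
For causal-LM fine-tuning, a common approach (not specific to this repo, and not an official CodeGen recipe) is to concatenate the NL description and the code into a single sequence and train with next-token prediction; with HuggingFace Transformers, passing `labels` alongside `input_ids` makes the model compute the cross-entropy loss internally, so there is no manual comparison of predicted vs. original output. If you only want the loss on the code portion, mask the prompt positions in the labels with -100. A minimal sketch of the example construction, using a toy stand-in tokenizer (with the real model you would use something like `AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")` instead):

```python
# Sketch of formatting one (nl, code) pair for causal-LM fine-tuning.
# The tokenizer below is a toy stand-in so the snippet is self-contained.

IGNORE_INDEX = -100  # label value ignored by PyTorch's CrossEntropyLoss


def toy_tokenize(text):
    """Stand-in tokenizer: one fake 'token id' per whitespace-separated word."""
    return [hash(w) % 50000 for w in text.split()]


def build_example(nl, code, eos_id=50256):
    """Concatenate the NL prompt and the code into one sequence.

    The labels copy the input ids, but the prompt positions are masked
    with IGNORE_INDEX so the loss is computed only on the code tokens.
    """
    prompt_ids = toy_tokenize("# " + nl + "\n")
    code_ids = toy_tokenize(code) + [eos_id]
    input_ids = prompt_ids + code_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + code_ids
    return {"input_ids": input_ids, "labels": labels}


ex = build_example(
    "This function takes a name and returns a hello message.",
    "def hello(name): return f'Hello {name}'",
)
assert len(ex["input_ids"]) == len(ex["labels"])
assert ex["labels"][0] == IGNORE_INDEX  # prompt tokens excluded from the loss
```

With a real tokenizer and model, a batch built this way can be passed as `model(input_ids=..., labels=...)`, and the returned loss is the token-level cross-entropy over the non-masked positions, which is what you would backpropagate. The `"# nl\n"` prompt format and the choice to mask the prompt are assumptions, not something documented by the CodeGen authors.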
Hello,
Can anyone give me a small sample (only 4 to 5 examples) of the BIGPYTHON dataset used for training the Codegen-nl mono model, just to get an idea of the training set?
@vishalsingha I'm wondering whether you have made any progress in this regard?
@vishalsingha do you plan to share it?
Sorry, it was done as part of an internship, so per company policy I can't share it.
@vishalsingha got it. It would be awesome if you could share any insight into how you approached developing the fine-tuning script. Also, any references to open-source resources would be much appreciated.