paperswithcode / galai

Model API for GALACTICA

Fine-tuning specific areas

peng06051126 opened this issue

First of all, thank you for your great contribution. I would like to fine-tune Galactica to generate articles from topics. Can you provide training data samples, or do you have any suggestions?

The Galactica models were pretrained on a large number of papers (see our paper for more details), so you should be able to generate articles out of the box, though it depends on your use case.
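
As a rough illustration of the out-of-the-box route, the sketch below prompts a pretrained checkpoint with a topic formatted as a Markdown-style title. It assumes the `galai` package is installed and that `load_model` and `generate` accept the arguments shown, as in the project README at the time of writing. `build_article_prompt` and `generate_article` are hypothetical helper names, and the sampling values are illustrative rather than recommended.

```python
def build_article_prompt(topic: str) -> str:
    """Format a topic as a Markdown-style title followed by a blank line.

    Assumption: a "# Title" prompt is a reasonable way to ask for an
    article-like document; this helper is hypothetical, not part of galai.
    """
    title = topic.strip()
    if not title:
        raise ValueError("topic must not be empty")
    return f"# {title}\n\n"


def generate_article(topic: str, model_name: str = "mini", max_length: int = 200) -> str:
    """Generate article-style text for a topic with a pretrained checkpoint."""
    # Imported lazily so the prompt helper works without galai installed.
    import galai as gal

    # Assumption: "mini" is the smallest checkpoint name accepted by load_model.
    model = gal.load_model(model_name)
    # Assumption: new_doc=True asks the model to treat the prompt as the
    # start of a new document; top_p=0.7 is an illustrative sampling value.
    return model.generate(
        build_article_prompt(topic),
        new_doc=True,
        top_p=0.7,
        max_length=max_length,
    )
```

Calling `generate_article("Multi-Head Attention")` would download the checkpoint on first use. If the output is not close enough to the target style, that is the point at which fine-tuning on your own topic-to-article pairs becomes worth considering.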

Thank you for your reply. How does the model perform on non-English data, and has it been tested on any? Also, what proportion of the pretraining dataset is non-English (for example, Chinese)?

By design, the models are not multilingual, and most of the natural-language documents in the pretraining corpus are written in English. For more, see the Introduction to GALACTICA Models notebook (search for "multi-lingual").