amaliujia / cs324_p2

Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022)

Home Page: https://stanford-cs324.github.io/winter2022

This repository contains the code for project 2 on training language models in the Winter 2022 iteration of CS324: Understanding and Developing Large Language Models at Stanford. This code was developed by Sang Michael Xie and Percy Liang.

  • In the first part of the project, we use fine-tuning/continued pretraining to instill length controllability into GPT-2 small: by prepending length metadata to each training example, we can control the word length of the model's generated response (a minimal sketch of this idea follows the list). The data, preprocessing code, and training code are provided, but some hyperparameters need to be tuned.
  • In the second part of the project, we aim to instill a new capability of your choice into a language model. The project code is built on top of HuggingFace, and the data for the first part is based on OpenWebText. By default, we provide scripts for running the project on CodaLab, a platform for reproducible research, but the scripts and code can be adapted to run locally or on other platforms as well.
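As a rough illustration of the length-metadata idea in the first bullet above, the sketch below prepends a word-count tag to each training example before tokenization. The `<len_N>` tag format, the function names, and the truncation length are assumptions for illustration only; they are not the project's actual preprocessing code.

```python
# Illustrative sketch (not the project's actual preprocessing): prepend a
# word-count tag so a fine-tuned GPT-2 can condition its output length on it.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def add_length_tag(text: str) -> str:
    """Prepend the example's word count as plain-text metadata."""
    n_words = len(text.split())
    return f"<len_{n_words}> {text}"

def preprocess(examples: dict) -> dict:
    """Tokenize length-tagged examples for causal-LM fine-tuning
    (shaped for use with datasets.Dataset.map(batched=True))."""
    tagged = [add_length_tag(t) for t in examples["text"]]
    return tokenizer(tagged, truncation=True, max_length=512)

# At inference time, prompting the fine-tuned model with e.g. "<len_25> "
# asks for a continuation of roughly 25 words.
```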

Useful links:

  • Setup and quickstart for CodaLab:

About


License: MIT License


Languages

  • Python 79.6%
  • Shell 16.8%
  • Dockerfile 3.6%