Anderson University

CPSC 2550 Large Language Models

!INCLUDE "head.md"

Course Catalog Description

This course explores practical applications of Large Language Models (LLMs) such as GPT-4, PaLM, and LLaMA. Students will learn about the architecture and use of LLMs in tasks such as text generation, translation, and sentiment analysis. The course also covers the societal implications of LLMs, including bias and ethical considerations. Through hands-on projects, discussions, and presentations, students will gain practical experience with LLMs and explore their use in modern software systems. This course is intended for both non-majors and majors within the department. PREREQUISITE: Math proficiency.

Resources

There is no required textbook for this course; however, the following materials are made available for reference:

  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
  • Python for Everybody by Charles Severance
  • Neural Machine Translation by Jointly Learning to Align and Translate (2014) by Bahdanau, Cho, and Bengio, https://arxiv.org/abs/1409.0473
  • Attention Is All You Need (2017) by Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, and Polosukhin, https://arxiv.org/abs/1706.03762
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) by Devlin, Chang, Lee, and Toutanova, https://arxiv.org/abs/1810.04805
  • Improving Language Understanding by Generative Pre-Training (2018) by Radford, Narasimhan, Salimans, and Sutskever, https://gwern.net/doc/www/s3-us-west-2.amazonaws.com/d73fdc5ffa8627bce44dcda2fc012da638ffb158.pdf
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2019) by Lewis, Liu, Goyal, Ghazvininejad, Mohamed, Levy, Stoyanov, and Zettlemoyer, https://arxiv.org/abs/1910.13461
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019) by Raffel, Shazeer, Roberts, Lee, Narang, Matena, Zhou, Li, and Liu, https://arxiv.org/abs/1910.10683
  • Fine-Tuning Language Models from Human Preferences (2019) by Ziegler, Stiennon, Wu, Brown, Radford, Amodei, Christiano, and Irving, https://arxiv.org/abs/1909.08593
  • FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022) by Dao, Fu, Ermon, Rudra, and Ré, https://arxiv.org/abs/2205.14135
  • Cramming: Training a Language Model on a Single GPU in One Day (2022) by Geiping and Goldstein, https://arxiv.org/abs/2212.14034
  • Training Compute-Optimal Large Language Models (2022) by Hoffmann, Borgeaud, Mensch, Buchatskaya, Cai, Rutherford, de Las Casas, Hendricks, Welbl, Clark, Hennigan, Noland, Millican, van den Driessche, Damoc, Guy, Osindero, Simonyan, Elsen, Rae, Vinyals, and Sifre, https://arxiv.org/abs/2203.15556
  • Training Language Models to Follow Instructions with Human Feedback (2022) by Ouyang, Wu, Jiang, Almeida, Wainwright, Mishkin, Zhang, Agarwal, Slama, Ray, Schulman, Hilton, Kelton, Miller, Simens, Askell, Welinder, Christiano, Leike, and Lowe, https://arxiv.org/abs/2203.02155
  • Scaling Instruction-Finetuned Language Models (2022) by Chung, Hou, Longpre, Zoph, Tay, Fedus, Li, Wang, Dehghani, Brahma, Webson, Gu, Dai, Suzgun, Chen, Chowdhery, Castro-Ros, Pellat, Robinson, Valter, Narang, Mishra, Yu, Zhao, Huang, Dai, Yu, Petrov, Chi, Dean, Devlin, Roberts, Zhou, Le, and Wei, https://arxiv.org/abs/2210.11416
  • Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning (2023) by Lialin, Deshpande, and Rumshisky, https://arxiv.org/abs/2303.15647

Course objectives

Upon completion of this course, students will be able to:

  • Interact with large language models from a Python environment (see the sketch after this list)
  • Describe the high-level operation of large language models
  • Explain how large language models fit into the history of NLP and ML
  • Apply large language models to creative or analytical tasks
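
To give a flavor of the first objective, here is a minimal sketch of one common way to interact with a language model from Python, using the Hugging Face transformers library. The choice of library and of the small gpt2 checkpoint are illustrative assumptions, not course requirements:

```python
# A minimal sketch of querying a language model from Python.
# Assumes `pip install transformers torch`; the gpt2 checkpoint is an
# illustrative choice, not a course requirement.
from transformers import pipeline

# Build a text-generation pipeline around a pretrained model
generator = pipeline("text-generation", model="gpt2")

# Ask the model to continue a prompt, capping the amount of new text
result = generator("Large language models are", max_new_tokens=20)

print(result[0]["generated_text"])
```

Running this prints the prompt followed by a short model-written continuation.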

Course Overview

Please refer to Canvas for a list of assignments and detailed grading breakdown.

Projects

Students will complete, and be assessed on, significant software projects. They will also give presentations throughout the semester on their project progress.

Quizzes

Regular quizzes will be given covering the lecture material. Students will have two attempts on each quiz, one at the start of class and one at the end. Only the highest score will be kept. The lowest three quiz scores from the semester will not count toward the final grade.

Exams

All exams are cumulative with a focus on more recent material. Exams are typically a mix of multiple choice and essay questions.

!INCLUDE "tail.md"
