This site is under active construction, and contents are subject to change.
The Spring 2024 offering of the course is archived here.

Please fill out this form to apply for enrollment in CS336.
Applications are due March 16 at 11:59pm, and we will notify you of our decision by March 23 at 11:59pm.
Due to the compute requirements for this class, we unfortunately have to limit enrollment.

Course Staff

Percy Liang
Instructor

Logistics

Content

What is this course about?

Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general-purpose system address a range of downstream tasks. As the fields of artificial intelligence (AI), machine learning (ML), and NLP continue to grow, a deep understanding of language models becomes essential for scientists and engineers alike. This course is designed to provide students with a comprehensive understanding of language models by walking them through the entire process of developing their own. Drawing inspiration from operating systems courses that create an entire operating system from scratch, we will lead students through every aspect of language model creation, including data collection and cleaning for pre-training, transformer model construction, model training, and evaluation before deployment.

Prerequisites

Note that this is a 5-unit, very implementation-heavy class, so please allocate enough time for it.


Coursework

Assignments

All (currently tentative) deadlines are listed in the schedule.

Honor code

As in all other classes at Stanford, we take the student Honor Code seriously. Please respect the following policies:

Submitting coursework

Late days

Regrade requests

If you believe that the course staff made an objective error in grading, you may submit a regrade request on Gradescope within 3 days after the grades are released.


Schedule

| # | Date | Description | Course Materials | Events | Deadlines |
| --- | --- | --- | --- | --- | --- |
| 1 | Tues April 1 | Overview, tokenization (Percy) | | Assignment 1 out | |
| 2 | Thurs April 3 | PyTorch, resource accounting (Percy) | | | |
| 3 | Tues April 8 | Architectures, hyperparameters (Percy) | | | |
| 4 | Thurs April 10 | Mixture of experts (Tatsu) | | | |
| 5 | Tues April 15 | GPUs (Tatsu) | | | Assignment 1 due |
| 6 | Thurs April 17 | Kernels, Triton (Tatsu) | | Assignment 2 out | |
| 7 | Tues April 22 | Parallelism (Tatsu) | | | |
| 8 | Thurs April 24 | Parallelism (Percy) | | | |
| 9 | Tues April 29 | Scaling laws (Tatsu) | | | |
| 10 | Thurs May 1 | Scaling laws (Tatsu) | | Assignment 3 out | Assignment 2 due |
| 11 | Tues May 6 | Data (Percy) | | | |
| 12 | Thurs May 8 | Data (Percy) | | | Assignment 3 due |
| | Sat May 10 | | | Assignment 4 out | |
| 13 | Tues May 13 | Data (Percy) | | | |
| 14 | Thurs May 15 | Data (Percy) | | | |
| 15 | Tues May 20 | Alignment (Tatsu) | | | |
| 16 | Thurs May 22 | Alignment (Tatsu) | | | |
| 17 | Tues May 27 | Alignment, evals (Tatsu) | | | Assignment 4 due |
| 18 | Thurs May 29 | Test-time compute, RL | | Assignment 5 out | |
| 19 | Tues June 3 | Guest lecture by TBD | | | |
| 20 | Thurs June 5 | Guest lecture by TBD | | | |