This site is under active construction, and contents are subject to change.
The Spring 2025 offering of the course is archived here.
Please fill out this form to apply for enrollment in CS336.
Applications are due March 15 at 11:59pm, and we will notify you of our decision by March 22 at 11:59pm.
Due to the compute requirements for this class, we unfortunately have to limit enrollment.
Please submit the form using your Stanford email address.

Course Staff

Logistics

Content

What is this course about?

Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm in which a single general-purpose system addresses a range of downstream tasks. As the fields of artificial intelligence (AI), machine learning (ML), and NLP continue to grow, a deep understanding of language models becomes essential for scientists and engineers alike. This course is designed to give students a comprehensive understanding of language models by walking them through the entire process of developing their own. Drawing inspiration from operating systems courses in which students build an entire operating system from scratch, we will lead students through every aspect of language model creation, including data collection and cleaning for pre-training, transformer model construction, model training, and evaluation before deployment.

Prerequisites

Note that this is a 5-unit class. This is a very implementation-heavy class, so please allocate enough time for it.


Coursework

Assignments

All (currently tentative) deadlines are listed in the schedule.

Honor code

Like all other classes at Stanford, this class takes the student Honor Code seriously. Please respect the following policies:

Submitting coursework

Late days

Regrade requests

If you believe that the course staff made an objective error in grading, you may submit a regrade request on Gradescope within 3 days after the grades are released.

Sponsor

We would like to thank Together AI for sponsoring the compute for this class.


Schedule

#   Date          Description                             Course Materials  Deadlines
1   Mon March 30  Overview, tokenization (Percy)                            Assignment 1 out
2   Wed April 1   PyTorch, resource accounting (Percy)
3   Mon April 6   Architectures, hyperparameters (Percy)
4   Wed April 8   Mixture of experts (Tatsu)
5   Mon April 13  GPUs (Tatsu)
6   Wed April 15  Kernels, Triton (Tatsu)                                   Assignment 1 due; Assignment 2 out
7   Mon April 20  Parallelism (Tatsu)
8   Wed April 22  Parallelism (Percy)
9   Mon April 27  Scaling laws (Tatsu)
10  Wed April 29  Scaling laws (Tatsu)                                      Assignment 2 due; Assignment 3 out
11  Mon May 4     Data (Percy)
12  Wed May 6     Data (Percy)                                              Assignment 3 due; Assignment 4 out
13  Mon May 11    Data (Percy)
14  Wed May 13    Data (Percy)
15  Mon May 18    Alignment (Tatsu)
16  Wed May 20    Alignment (Tatsu)                                         Assignment 4 due; Assignment 5 out
    Mon May 25    No class (Memorial Day)
17  Wed May 27    Alignment, evals (Tatsu)
18  Mon June 1    Test-time compute, RL
19  Wed June 3    Guest lecture by TBD                                      Assignment 5 due