This site is under active construction, and contents are subject to
change.
The Spring 2025 offering of the course is
archived here.
What is this course about?
Language models serve as the cornerstone of modern natural language
processing (NLP) applications and open up a new paradigm in which a
single general-purpose system addresses a range of downstream tasks. As
the fields of artificial intelligence (AI), machine learning (ML), and
NLP continue to grow, a deep understanding of language
models becomes essential for scientists and engineers alike. This course
is designed to provide students with a comprehensive understanding of
language models by walking them through the entire process of developing
their own. Drawing inspiration from operating systems courses that
create an entire operating system from scratch, we will lead students
through every aspect of language model creation, including data
collection and cleaning for pre-training, transformer model
construction, model training, and evaluation before deployment.
Prerequisites
- Proficiency in Python
  The majority of class assignments will be in Python. Unlike most
  other AI classes, students will be given minimal scaffolding. The
  amount of code you will write will be at least an order of magnitude
  greater than for other classes. Therefore, being proficient in
  Python and software engineering is paramount.
- Experience with deep learning and systems optimization
  A significant part of the course will involve making neural language
  models run quickly and efficiently on GPUs across multiple machines.
  We expect students to have strong familiarity with PyTorch and to
  know basic systems concepts like the memory hierarchy.
- College Calculus, Linear Algebra (e.g., MATH 51, CME 100)
  You should be comfortable with matrix/vector notation and
  operations.
- Basic Probability and Statistics (e.g., CS 109 or equivalent)
  You should know the basics of probability: Gaussian distributions,
  means, standard deviations, etc.
- Machine Learning (e.g., CS221, CS229, CS230, CS124, CS224N)
  You should be comfortable with the basics of machine learning and
  deep learning.
Note that this is a 5-unit, very implementation-heavy class, so please
allocate enough time for it.
Coursework
Assignments
- Assignment 1: Basics (version from 2025)
  - Implement all of the components (tokenizer, model architecture,
    optimizer) necessary to train a standard Transformer language
    model.
  - Train a minimal language model.
- Assignment 2: Systems (version from 2025)
  - Profile and benchmark the model and layers from Assignment 1
    using advanced tools, and optimize attention with your own Triton
    implementation of FlashAttention-2.
  - Build a memory-efficient, distributed version of the Assignment 1
    model training code.
- Assignment 3: Scaling (version from 2025)
  - Understand the function of each component of the Transformer.
  - Query a training API to fit a scaling law to project model
    scaling (a rough illustration of such a fit follows this list).
- Assignment 4: Data (version from 2025)
  - Convert raw Common Crawl dumps into usable pretraining data.
  - Perform filtering and deduplication to improve model performance.
- Assignment 5: Alignment and Reasoning RL (version from 2025)
  - Apply supervised finetuning and reinforcement learning to train
    LMs to reason when solving math problems.
  - Optional Part 2 (version from 2025): implement and apply safety
    alignment methods such as DPO.
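As a taste of the scaling-law fitting in Assignment 3, here is a
minimal sketch that fits a saturating power law to (compute, loss)
measurements and extrapolates it. The functional form, the data points,
the initial guess, and the use of scipy are all illustrative
assumptions, not the assignment's actual training API or loss model:

    # Minimal sketch: fit loss(C) = a * C^(-b) + floor to hypothetical
    # (compute, loss) points from small training runs, then extrapolate.
    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(compute, a, b, floor):
        """Loss vs. training compute; `floor` is the irreducible loss."""
        return a * np.power(compute, -b) + floor

    compute = np.array([1e15, 1e16, 1e17, 1e18])  # FLOPs (made-up runs)
    loss = np.array([4.2, 3.6, 3.1, 2.8])         # final validation loss

    # Fit the three parameters, starting from a rough initial guess.
    (a, b, floor), _ = curve_fit(power_law, compute, loss,
                                 p0=(350.0, 0.15, 2.0))

    # Project the loss at a 10x larger compute budget.
    print(f"Projected loss at 1e19 FLOPs: {power_law(1e19, a, b, floor):.2f}")

In the assignment itself, the measured points come from runs you launch
against the training API rather than from hard-coded values.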
All (currently tentative) deadlines are listed in the
schedule.
Honor code
As in all other classes at Stanford, we take the student
Honor Code
seriously. Please respect the following policies:
- Collaboration: Study groups are allowed, but students must
  understand and complete their own assignments, and hand in one
  assignment per student. If you worked in a group, please put the
  names of the members of your study group at the top of your
  assignment. Please ask if you have any questions about the
  collaboration policy.
- AI tools: Prompting LLMs such as ChatGPT is permitted for
  low-level programming questions or high-level conceptual questions
  about language models, but using them directly to solve the problems
  is prohibited. We strongly encourage you to disable AI autocomplete
  (e.g., Cursor Tab, GitHub Copilot) in your IDE when completing
  assignments (though non-AI autocomplete, e.g., autocompleting
  function names, is totally fine). We have found that AI autocomplete
  makes it much harder to engage deeply with the content.
- Existing code: Implementations of many of the things you
  will build exist online. The handouts we'll give will be
  self-contained, so you will not need to consult third-party code to
  produce your own implementation. Thus, you should not look at any
  existing code unless otherwise specified in the handouts.
Submitting coursework
- All coursework is submitted via Gradescope by the deadline. Do not
  submit your coursework via email.
- If anything goes wrong, please ask a question in Slack or contact a
  course assistant.
- You can submit as many times as you'd like until the deadline: we
  will only grade the last submission.
- Partial work is better than not submitting any work.
Late days
- Each student has 6 late days to use. A late day extends the
  deadline by 24 hours.
- You can use up to 3 late days per assignment.
Regrade requests
If you believe that the course staff made an objective error in
grading, you may submit a regrade request on Gradescope within 3 days
after the grades are released.