CSE 5539: Reinforcement Learning in the Agentic Age

Course Overview

This seminar explores the intersection of reinforcement learning and the emerging capabilities of large language models and agentic AI systems. We examine how RL techniques enable and enhance autonomous agents while investigating fundamental barriers to self-improvement: exploration, discrimination, and generalization.

Topics include deep reinforcement learning foundations, policy gradient methods, model-based control, and their applications to language model training, robotics, and autonomous agents. Students will read and present recent research papers and complete a semester-long research project.

Course Format

Term: Spring 2026
Instructor: Andrew Perrault
Institution: The Ohio State University, Department of Computer Science and Engineering

Course materials and assignments are available on OSU Canvas for enrolled students.

Prerequisites

Experience with deep learning (neural network architectures, training procedures)
Ability to read and critically analyze machine learning research papers
Mathematical background in probability, linear algebra, and optimization

Assignments

Paper Presentation

Students present and lead discussion on a recent research paper related to reinforcement learning and language models or robotics. Presentations focus on critically evaluating the paper’s research questions, methodology, and contributions.

Research Project

The primary coursework is a semester-long research project exploring reinforcement learning. Projects are conducted in teams of up to 3 students and culminate in a workshop- or conference-style paper.

Project components:

Proposal: Research question, motivation, related work, experimental plan, and resource requirements
Final Report: Complete research paper in LaTeX documenting methodology, experiments, results, and conclusions
Presentation: End-of-semester presentation of findings
Code: Reproducible implementation in Python/PyTorch

Projects typically involve training RL agents or fine-tuning language models with reinforcement learning, requiring GPU compute resources for experiments.
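Because the code component calls for a reproducible implementation, here is one common pattern for pinning random seeds at the start of a Python/PyTorch experiment. This is an illustrative sketch, not a course-provided utility. The helper name `set_seed` is hypothetical, and NumPy and PyTorch are treated as optional imports only so the snippet runs on any machine.

```python
import os
import random

# NumPy and PyTorch are optional here only so the sketch runs anywhere;
# a real project would normally have both installed.
try:
    import numpy as np
except ImportError:
    np = None

try:
    import torch
except ImportError:
    torch = None


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed the RNGs an RL/PyTorch experiment typically uses."""
    random.seed(seed)  # Python's built-in RNG
    if np is not None:
        np.random.seed(seed)  # NumPy's legacy global RNG
    if torch is not None:
        # cuBLAS needs this setting for deterministic results; set it before any CUDA work runs.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.manual_seed(seed)  # seeds the RNG on all devices (CPU and CUDA)
        # Prefer deterministic kernels; warn instead of raising when an op has none.
        torch.use_deterministic_algorithms(True, warn_only=True)
        torch.backends.cudnn.benchmark = False  # autotuning may pick different kernels per run


if __name__ == "__main__":
    set_seed(0)
    first = random.random()
    set_seed(0)
    assert random.random() == first  # same seed, same draw
```

Seeding the libraries is usually not enough on its own for RL experiments. The environment typically needs its own seed too; Gymnasium environments, for instance, accept `env.reset(seed=...)`.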

Course Schedule


1/14: Introduction & Animal Learning

1/28: MDPs & Solving Known MDPs

2/4: Known MDPs & Unknown MDPs

2/11: Unknown MDPs & Policy Gradient

2/18: Policy Gradient (cont.) & MLLM Limits

2/25: Foundations & Agents

3/4: Reasoning #1

3/11: Reasoning #2 & Robotics #1

3/18: Spring Break (no class)

3/25: Robotics #2 & Misc

4/1: Project Presentations #1

4/8: Project Presentations #2

4/15: Project Presentations #3

4/22: Project Presentations #4