CSE 5539: Reinforcement Learning in the Agentic Age

Course Overview

This seminar explores the intersection of reinforcement learning and the emerging capabilities of large language models and agentic AI systems. We examine how RL techniques enable and enhance autonomous agents while investigating fundamental barriers to self-improvement: exploration, discrimination, and generalization.

Topics include deep reinforcement learning foundations, policy gradient methods, model-based control, and their applications to language model training, robotics, and autonomous agents. Students will read and present recent research papers and complete a semester-long research project.

Course Format

Term: Spring 2026
Instructor: Andrew Perrault
Institution: The Ohio State University, Department of Computer Science and Engineering

Course materials and assignments are available on OSU Canvas for enrolled students.

Prerequisites

  • Experience with deep learning (neural network architectures, training procedures)
  • Ability to read and critically analyze machine learning research papers
  • Mathematical background in probability, linear algebra, and optimization

Assignments

Paper Presentation

Students present and lead a discussion of a recent research paper that connects reinforcement learning with language models or robotics. Presentations should critically evaluate the paper’s research questions, methodology, and contributions.

Research Project

The primary coursework is a semester-long research project on reinforcement learning. Projects are conducted in teams of up to three students and culminate in a workshop- or conference-style paper.

Project components:

  • Proposal: Research question, motivation, related work, experimental plan, and resource requirements
  • Final Report: Complete research paper in LaTeX documenting methodology, experiments, results, and conclusions
  • Presentation: End-of-semester presentation of findings
  • Code: Reproducible implementation in Python/PyTorch

Projects typically involve training RL agents or fine-tuning language models with reinforcement learning, so experiments require GPU compute resources.

Course Schedule

1/14: Introduction & Animal Learning

1/28: MDPs & Solving Known MDPs

2/4: Known MDPs & Unknown MDPs

2/11: Unknown MDPs & Policy Gradient

2/18: Policy Gradient (cont.) & MLLM Limits

2/25: Foundations & Agents

3/4: Reasoning #1

3/11: Reasoning #2 & Robotics #1

3/18: Spring Break (no class)

3/25: Robotics #2 & Misc

4/1: Project Presentations #1

4/8: Project Presentations #2

4/15: Project Presentations #3

4/22: Project Presentations #4