CSE 5539: Reinforcement Learning in the Agentic Age
Course Overview
This seminar explores the intersection of reinforcement learning and the emerging capabilities of large language models and agentic AI systems. We examine how RL techniques enable and enhance autonomous agents while investigating fundamental barriers to self-improvement: exploration, discrimination, and generalization.
Topics include deep reinforcement learning foundations, policy gradient methods, model-based control, and their applications to language model training, robotics, and autonomous agents. Students will read and present recent research papers and complete a semester-long research project.
Course Format
Term: Spring 2026
Instructor: Andrew Perrault
Institution: The Ohio State University, Department of Computer Science and Engineering
Course materials and assignments are available on OSU Canvas for enrolled students.
Prerequisites
- Experience with deep learning (neural network architectures, training procedures)
- Ability to read and critically analyze machine learning research papers
- Mathematical background in probability, linear algebra, and optimization
Assignments
Paper Presentation
Students present and lead discussion on a recent research paper related to reinforcement learning and language models or robotics. Presentations focus on critically evaluating the paper’s research questions, methodology, and contributions.
Research Project
The primary coursework is a semester-long research project exploring reinforcement learning. Projects are conducted in teams of up to 3 students and culminate in a workshop- or conference-style paper.
Project components:
- Proposal: Research question, motivation, related work, experimental plan, and resource requirements
- Final Report: Complete research paper in LaTeX documenting methodology, experiments, results, and conclusions
- Presentation: End-of-semester presentation of findings
- Code: Reproducible implementation in Python/PyTorch
Projects typically involve training RL agents or fine-tuning language models with reinforcement learning, requiring GPU compute resources for experiments.
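As a flavor of where projects often start, here is a minimal sketch of policy-gradient learning (REINFORCE with a running baseline) on a toy 3-armed bandit. This is purely illustrative; the arm means, learning rates, and seed are invented for the example, and real projects would use PyTorch and substantially larger environments.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical per-arm expected rewards
theta = np.zeros(3)                     # policy logits over the 3 arms

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.1
baseline = 0.0  # running average reward, reduces gradient variance
for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    r = rng.normal(true_means[a], 0.1)
    baseline += 0.05 * (r - baseline)
    # REINFORCE score-function gradient of log pi(a): e_a - probs
    grad = -probs
    grad[a] += 1.0
    theta += lr * (r - baseline) * grad

best_arm = int(np.argmax(softmax(theta)))
```

After training, the policy should concentrate on the highest-reward arm; the baseline subtraction is the same variance-reduction idea that reappears in advantage estimation for LLM fine-tuning.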
Course Schedule
1/14: Introduction & Animal Learning
1/28: MDPs & Solving Known MDPs
2/4: Known MDPs & Unknown MDPs
2/11: Unknown MDPs & Policy Gradient
2/18: Policy Gradient (cont.) & MLLM Limits
- LLMs Get Lost in Multi-Turn Conversation
- Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance
- Castle-in-the-Air: Evaluating MLLM Visual Abilities on Human Cognitive Benchmarks
2/25: Foundations & Agents
- Asymptotic Analysis of Shallow and Deep Forgetting in Replay with Neural Collapse
- RL’s Razor: Why Online Reinforcement Learning Forgets Less
- Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
- AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
3/4: Reasoning #1
- On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
- Reasoning without Training: Your Base Model is Smarter Than You Think
3/11: Reasoning #2 & Robotics #1
- Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
- π*₀.₆: A VLA That Learns From Experience
- Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data
3/18: Spring Break
No class.
3/25: Robotics #2 & Misc
- R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
- Safe Exploration via Policy Priors
- DiffusionNFT: Online Diffusion Reinforcement with Forward Process
- Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
