Research
Current Research: Using Alignment to Improve Reward Design in RL
Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
In this work, we develop a reward alignment metric, the Trajectory Alignment Coefficient, to evaluate how well a (reward function, discount factor) pair encodes the preferences of a domain expert. The Trajectory Alignment Coefficient quantifies the similarity between a human stakeholder's ranking of trajectory distributions and the ranking induced by a given (reward function, discount factor) pair. We prove that this metric is invariant to potential-based reward shaping and positive linear transformations. Additionally, we demonstrate in a user study of RL practitioners that access to our reward metric during reward selection can make the process easier and lead to more performant reward functions.
Published at RLC 2025.
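For intuition, here is a minimal sketch of how such an alignment score could be computed. The paper defines the Trajectory Alignment Coefficient precisely; this sketch only assumes, for illustration, a Kendall's-tau-style agreement between a human's trajectory ranking and the ordering induced by discounted return under a candidate reward function and discount factor. All function names and types below are hypothetical.

```python
# Illustrative sketch only: the paper defines the Trajectory Alignment
# Coefficient exactly; here we assume a Kendall's-tau-style agreement
# between a human's trajectory ranking and the ordering induced by
# discounted return under a candidate (reward function, discount factor).
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[object, object]  # (state, action); hypothetical types


def discounted_return(trajectory: Sequence[Transition],
                      reward_fn: Callable[[object, object], float],
                      gamma: float) -> float:
    """Discounted return of one trajectory under a candidate reward and gamma."""
    return sum((gamma ** t) * reward_fn(s, a)
               for t, (s, a) in enumerate(trajectory))


def alignment_score(trajectories: List[Sequence[Transition]],
                    human_ranking: List[int],
                    reward_fn: Callable[[object, object], float],
                    gamma: float) -> float:
    """Rank agreement in [-1, 1]: +1 means the candidate orders trajectories
    exactly as the human does, -1 means the ordering is fully reversed."""
    returns = [discounted_return(traj, reward_fn, gamma) for traj in trajectories]
    concordant = discordant = 0
    for i, j in combinations(range(len(trajectories)), 2):
        human_pref = human_ranking[i] - human_ranking[j]  # lower rank = preferred
        return_pref = returns[j] - returns[i]             # higher return = preferred
        if human_pref * return_pref > 0:
            concordant += 1
        elif human_pref * return_pref < 0:
            discordant += 1
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total
```

Under this reading, comparing two candidate reward functions on the same ranked trajectories, the one with the higher score better matches the stakeholder's preferences, which is how such a metric can guide reward selection.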
Past Research: Improving Preference-based RL Algorithms
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
In this work, we propose a simple technique that utilizes reward-free, low-quality data to boost the performance of off-the-shelf preference-based RL algorithms. Most importantly, we validate our approach with a human-user study.
Published at ICLR 2025.
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity
In this paper, we tackle the problem of learning reward models from human preferences in extremely noisy environments. We show that, by leveraging principles of dynamic sparse training, reward models can effectively learn to focus on task-relevant features.
Published at AAMAS 2025 (Extended Abstract).
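For intuition, the sketch below shows the prune-and-regrow cycle at the core of dynamic sparse training, applied to a single layer of a reward model. It assumes PyTorch and illustrates only the general principle, not the exact procedure used in the paper; the class and method names are hypothetical.

```python
# Illustrative sketch only, assuming PyTorch: a prune-and-regrow update on a
# masked linear layer, the general principle behind dynamic sparse training.
import torch
import torch.nn as nn


class SparseLinear(nn.Module):
    """Linear layer whose weights are gated by a binary mask."""

    def __init__(self, in_features: int, out_features: int, sparsity: float = 0.9):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        mask = (torch.rand_like(self.linear.weight) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.linear.weight * self.mask,
                                    self.linear.bias)

    @torch.no_grad()
    def prune_and_regrow(self, drop_fraction: float = 0.1) -> None:
        """Drop the smallest-magnitude active weights and regrow at random
        inactive positions, keeping the number of active connections fixed."""
        active = self.mask.bool()
        n_drop = int(drop_fraction * active.sum().item())
        if n_drop == 0:
            return
        # Prune: deactivate the n_drop active weights with smallest magnitude.
        magnitudes = (self.linear.weight * self.mask).abs()
        magnitudes[~active] = float("inf")
        drop_idx = torch.topk(magnitudes.flatten(), n_drop, largest=False).indices
        self.mask.view(-1)[drop_idx] = 0.0
        # Regrow: reactivate n_drop randomly chosen inactive positions.
        inactive_idx = (self.mask.view(-1) == 0).nonzero(as_tuple=True)[0]
        grow_idx = inactive_idx[torch.randperm(len(inactive_idx))[:n_drop]]
        self.mask.view(-1)[grow_idx] = 1.0
        self.linear.weight.view(-1)[grow_idx] = 0.0  # new connections start at zero
```

Calling `prune_and_regrow` periodically during training lets the active connections migrate toward inputs that actually predict the preference labels, which is the sense in which dynamic sparsity can help a reward model ignore noise.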