Research

Current Research: Using Alignment to Improve Reward Design in RL

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

In this work, we develop a reward alignment metric, the Trajectory Alignment Coefficient, to evaluate how well a (reward function, discount factor) pair encodes the preferences of a domain expert. The Trajectory Alignment Coefficient quantifies the similarity between a human stakeholder's ranking of trajectory distributions and the ranking induced by a given (reward function, discount factor) pair. We prove that this metric is invariant to potential-based reward shaping and positive linear transformations. Additionally, we demonstrate in a user study of RL practitioners that access to our reward alignment metric during reward selection can make the process easier and lead to more performant reward functions.

Published at RLC 2025.
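
The metric itself is defined in the paper; purely as a rough illustration of the kind of rank agreement it measures, the sketch below scores a candidate (reward function, discount factor) pair against a set of human pairwise trajectory preferences. The function names, the pairwise-preference input format, and the Kendall-tau-style rescaling are assumptions for this example, not the paper's definition.

```python
# Illustrative sketch only: a Kendall-tau-style agreement score between a
# stakeholder's pairwise trajectory preferences and the ordering induced by
# discounted returns under a candidate (reward function, discount factor) pair.
# The input format and aggregation here are assumptions, not the paper's definition.

from typing import Callable, List, Tuple

Trajectory = List[Tuple[object, object]]  # a sequence of (state, action) pairs


def discounted_return(trajectory: Trajectory,
                      reward_fn: Callable[[object, object], float],
                      gamma: float) -> float:
    """Discounted return of a trajectory under a candidate reward function and gamma."""
    return sum((gamma ** t) * reward_fn(s, a) for t, (s, a) in enumerate(trajectory))


def alignment_score(preferences: List[Tuple[Trajectory, Trajectory]],
                    reward_fn: Callable[[object, object], float],
                    gamma: float) -> float:
    """Rescaled fraction of human preference pairs (preferred, dispreferred)
    that the candidate pair ranks the same way; +1 is full agreement, -1 none."""
    agree = sum(
        discounted_return(preferred, reward_fn, gamma)
        > discounted_return(dispreferred, reward_fn, gamma)
        for preferred, dispreferred in preferences
    )
    return 2.0 * agree / len(preferences) - 1.0
```

Because only the ordering of returns matters in a score of this kind, it is unaffected by positively rescaling the reward, which is in the spirit of the invariance properties proven in the paper.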

Past Research: Improving Preference-based RL Algorithms

Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

In this work, we propose a simple technique that utilizes reward-free, low-quality data to boost the performance of off-the-shelf preference-based RL algorithms. Most importantly, we validate our approach with a human user study.

Published at ICLR 2025.

In this paper, we tackle the problem of learning reward models from human preferences in extremely noisy environments. We find that by leveraging principles of dynamic sparse training, reward models can effectively learn to focus on task-relevant features.

Published at AAMAS 2025 (Extended Abstract).
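
The paper's training procedure is not reproduced here; purely as a generic illustration of the dynamic sparse training principle it draws on, the sketch below applies a SET-style magnitude-prune-and-random-regrow step to the hidden layer of a small PyTorch reward model. The class name, network shape, and hyperparameters are assumptions made for this example, not the paper's method.

```python
# Generic illustration of dynamic sparse training (SET-style prune-and-regrow)
# applied to a small reward model; this is not the paper's algorithm, and all
# names, shapes, and hyperparameters are assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseRewardModel(nn.Module):
    """Small MLP reward model whose hidden-layer weights are kept sparse."""

    def __init__(self, obs_dim: int, hidden: int = 64, sparsity: float = 0.8):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, 1)
        # Binary mask over fc1 weights; roughly `sparsity` of entries start inactive.
        self.register_buffer("mask", (torch.rand_like(self.fc1.weight) > sparsity).float())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        hidden = torch.relu(F.linear(obs, self.fc1.weight * self.mask, self.fc1.bias))
        return self.fc2(hidden)

    @torch.no_grad()
    def prune_and_regrow(self, drop_fraction: float = 0.1) -> None:
        """One SET-style update: drop the smallest-magnitude active weights,
        then regrow the same number of connections at random inactive positions."""
        active = self.mask.bool()
        n_drop = int(drop_fraction * active.sum().item())
        if n_drop == 0:
            return
        # Prune: zero out the n_drop active weights with the smallest magnitude.
        magnitudes = (self.fc1.weight * self.mask).abs()
        magnitudes[~active] = float("inf")  # ignore already-inactive entries
        drop_idx = torch.topk(magnitudes.flatten(), n_drop, largest=False).indices
        self.mask.view(-1)[drop_idx] = 0.0
        # Regrow: activate n_drop random currently-inactive connections at weight zero.
        inactive_idx = (self.mask.view(-1) == 0).nonzero(as_tuple=True)[0]
        grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_drop]]
        self.mask.view(-1)[grow_idx] = 1.0
        self.fc1.weight.view(-1)[grow_idx] = 0.0
```

Periodically re-allocating connections in this way shifts the sparse layer's capacity toward inputs whose weights keep earning large magnitudes, the intuition being that connections to uninformative features are repeatedly pruned away.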

Publications

You can find my articles on my Google Scholar profile.