A Statistical Framework for Alignment with Biased AI Feedback

Time: Thur., 10:00-11:00 am, May 14, 2026

Venue: C548, Shuangqing Complex Building A

Organizer: Yunan Wu

Speaker: Zhanrui Cai

Statistical Seminar

Organizer: Yunan Wu 吴宇楠 (YMSC)

Speaker: Zhanrui Cai 蔡占锐, HKU Business School, The University of Hong Kong


Abstract:

Modern alignment pipelines are increasingly replacing expensive human preference labels with evaluations from large language models (LLM-as-Judge). However, AI labels can be systematically biased compared to high-quality human feedback datasets. In this paper, we develop two debiased alignment methods within a general framework that accommodates heterogeneous prompt-response distributions and external human feedback sources. Debiased Direct Preference Optimization (DDPO) augments standard DPO with a residual-based correction and density-ratio reweighting to mitigate systematic bias, while retaining DPO's computational efficiency. Debiased Identity Preference Optimization (DIPO) directly estimates human preference probabilities without imposing a parametric reward model. We provide theoretical guarantees for both methods: DDPO offers a practical and computationally efficient solution for large-scale alignment, whereas DIPO serves as a robust, statistically optimal alternative that attains the semiparametric efficiency bound. Empirical studies on sentiment generation, summarization, and single-turn dialogue demonstrate that the proposed methods substantially improve alignment efficiency and recover performance close to that of an oracle trained on fully human-labeled data.
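
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of the general idea of a residual-based correction with reweighting. It is not the speaker's DDPO estimator, which the abstract does not specify. `dpo_loss` is the standard DPO logistic loss for one preference pair; `debiased_loss` is a generic, hypothetical construction that averages the loss under AI labels and adds a weighted residual computed on a smaller set of pairs that also carry human labels. The function names, the data layout, and the `weight` field (standing in for a density ratio) are all assumptions made for illustration.

```python
import math

def dpo_loss(margin, label, beta=0.1):
    """Standard DPO logistic loss for a single pair (y1, y2).

    margin: [log pi(y1|x) - log pi_ref(y1|x)] - [log pi(y2|x) - log pi_ref(y2|x)]
    label:  1 if y1 is preferred, 0 if y2 is preferred
    """
    z = beta * margin if label == 1 else -beta * margin
    # -log(sigmoid(z)), computed stably as softplus(-z)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

def debiased_loss(ai_only, paired, beta=0.1):
    """Hypothetical residual-corrected objective (illustration only).

    ai_only: list of (margin, ai_label) pairs labeled by the AI judge
    paired:  list of (margin, ai_label, human_label, weight) for pairs that
             also have a human label; weight is an assumed density ratio
    """
    # Average loss using the (possibly biased) AI labels.
    base = sum(dpo_loss(m, a, beta) for m, a in ai_only) / len(ai_only)
    # Weighted average of (human-label loss minus AI-label loss).
    resid = sum(
        w * (dpo_loss(m, h, beta) - dpo_loss(m, a, beta))
        for m, a, h, w in paired
    ) / len(paired)
    return base + resid

ai_only = [(2.0, 1), (-1.0, 0), (0.5, 1)]
paired = [(2.0, 1, 0, 1.0), (0.5, 1, 1, 1.0)]  # AI and human disagree on the first pair
print(debiased_loss(ai_only, paired))
```

When the AI and human labels agree on every paired example, the residual term is zero and the objective reduces to the plain AI-labeled DPO loss; disagreement shifts the objective toward what the human labels would have given.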

Date: May 13, 2026
