A Statistical Framework for Alignment with Biased AI Feedback - Qiuzhen College, Tsinghua University

Source: 05-13

Time: Thu., 10:00-11:00 am, May 14, 2026

Venue: C548, Shuangqing Complex Building A

Organizer: Yunan Wu

Speaker: Zhanrui Cai

Statistical Seminar

Organizer: Yunan Wu 吴宇楠 (YMSC)

Speaker:

Zhanrui Cai 蔡占锐

HKU Business School, The University of Hong Kong

Title:

A Statistical Framework for Alignment with Biased AI Feedback

Abstract:

Modern alignment pipelines are increasingly replacing expensive human preference labels with evaluations from large language models (LLM-as-Judge). However, AI labels can be systematically biased compared to high-quality human feedback datasets. In this paper, we develop two debiased alignment methods within a general framework that accommodates heterogeneous prompt-response distributions and external human feedback sources. Debiased Direct Preference Optimization (DDPO) augments standard DPO with a residual-based correction and density-ratio reweighting to mitigate systematic bias, while retaining DPO's computational efficiency. Debiased Identity Preference Optimization (DIPO) directly estimates human preference probabilities without imposing a parametric reward model. We provide theoretical guarantees for both methods: DDPO offers a practical and computationally efficient solution for large-scale alignment, whereas DIPO serves as a robust, statistically optimal alternative that attains the semiparametric efficiency bound. Empirical studies on sentiment generation, summarization, and single-turn dialogue demonstrate that the proposed methods substantially improve alignment efficiency and recover performance close to that of an oracle trained on fully human-labeled data.
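For readers unfamiliar with the ingredients named in the abstract, the following is a minimal sketch of how a residual correction and density-ratio weights can be combined with the standard DPO loss. It is a generic illustration, not the DDPO estimator presented in the talk: the function names (`dpo_loss`, `corrected_loss`), the choice of `beta`, and the assumption that a small human-labeled subset also carries AI labels are all hypothetical choices made for this example.

```python
import numpy as np

def log_sigmoid(t):
    # Numerically stable log(sigmoid(t)) = -log(1 + exp(-t)).
    return -np.logaddexp(0.0, -t)

def dpo_loss(margin, label, beta=0.1):
    """Per-pair DPO loss with a (possibly soft) label in [0, 1].

    margin: log-ratio margin of response 1 over response 2, i.e.
        [log pi(y1|x) - log pi_ref(y1|x)] - [log pi(y2|x) - log pi_ref(y2|x)].
    label: 1.0 if response 1 is preferred, 0.0 if response 2 is preferred.
    """
    margin = np.asarray(margin, dtype=float)
    label = np.asarray(label, dtype=float)
    return -(label * log_sigmoid(beta * margin)
             + (1.0 - label) * log_sigmoid(-beta * margin))

def corrected_loss(margin_ai, z_ai, margin_h, z_ai_h, z_human,
                   weights=None, beta=0.1):
    """Illustrative bias-corrected loss (an assumption, not the talk's method).

    margin_ai, z_ai: margins and AI labels on the large AI-labeled set.
    margin_h, z_ai_h, z_human: margins, AI labels, and human labels on a
        small set that has both kinds of label.
    weights: hypothetical density-ratio weights that map the human-labeled
        sample onto the AI-labeled prompt-response distribution.
    """
    base = dpo_loss(margin_ai, z_ai, beta).mean()
    # Residual: how much the loss changes when AI labels are replaced by
    # human labels on the pairs where both are observed.
    residual = dpo_loss(margin_h, z_human, beta) - dpo_loss(margin_h, z_ai_h, beta)
    if weights is None:
        weights = np.ones_like(residual)
    return base + np.mean(np.asarray(weights, dtype=float) * residual)
```

Because the DPO loss is linear in the label, the residual term in this sketch reduces to `-(z_human - z_ai_h) * beta * margin_h`, so the correction vanishes wherever the AI judge agrees with the human annotator.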
