LLM Safety Alignment is Divergence Estimation in Disguise

This research reframes LLM safety alignment as a divergence estimation problem, introducing the KLDO framework to optimize the separation between safe and harmful distributions.
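To make "divergence estimation" concrete, below is a minimal, illustrative sketch (not the paper's KLDO implementation) of estimating a KL divergence between two sets of representations via the Donsker-Varadhan variational bound, KL(P || Q) >= E_P[T(x)] - log E_Q[exp(T(x))]. The safe/harmful batches, critic architecture, and shapes are all assumptions for illustration.

```python
# Hedged sketch: Donsker-Varadhan KL estimation between two toy
# representation clusters standing in for "safe" (P) and "harmful" (Q).
import math
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Scalar critic T(x); its DV bound lower-bounds KL(P || Q)."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def dv_kl_bound(critic: Critic, p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan estimate: E_P[T] - log E_Q[exp(T)]."""
    return critic(p).mean() - (
        torch.logsumexp(critic(q), dim=0) - math.log(q.shape[0])
    )

# Toy stand-ins for safe vs. harmful representation batches (assumed).
dim = 16
safe = torch.randn(512, dim) + 1.0     # P: "safe" cluster
harmful = torch.randn(512, dim) - 1.0  # Q: "harmful" cluster

critic = Critic(dim)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
for step in range(500):
    opt.zero_grad()
    loss = -dv_kl_bound(critic, safe, harmful)  # maximize the bound
    loss.backward()
    opt.step()

print(f"estimated KL(P || Q) >= {dv_kl_bound(critic, safe, harmful).item():.3f}")
```

A trained critic of this kind both estimates the divergence and, if its bound is used as a training signal, pushes the two distributions apart, which is the sense in which alignment can act as divergence optimization.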

Level: expert

By Rajdeep Haldar, Ziyi Wang, Qifan Song, Guang Lin, Yue Xing

Category: discussion