This research reframes LLM safety alignment as a divergence-estimation problem, introducing the KLDO framework to optimize the separation between safe and harmful...
Level: expert
By Rajdeep Haldar, Ziyi Wang, Qifan Song, Guang Lin, Yue Xing
Category: discussion