Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints

This research introduces Coupled Weight and Activation Constraints (CWAC), a novel approach to prevent safety drift in Large Language Models during fine-tuni...

Level: advanced

By Songping Peng

Category: discussion