Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints
This research introduces Coupled Weight and Activation Constraints (CWAC), a novel approach to prevent safety drift in Large Language Models during fine-tuni...