SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification

Explore SAFEx, a novel framework for decomposing Mixture-of-Experts (MoE) models to identify and mitigate safety-critical vulnerabilities using stability-based sel...

Level: advanced

By Unknown

Category: discussion