Energy-Driven Steering: Reducing False Refusals in Large Language Models

This research introduces Energy-Driven Steering (EDS), a real-time method that uses Energy-Based Models to steer LLM hidden states, significantly reducing false refusals while...

Level: advanced

By Unknown

Category: research