heretic-llm · PyPI

Explore Heretic-LLM, a Python tool implementing directional ablation to decensor LLMs by isolating refusal directions through first-token residual analysis. ...
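
The idea named in the description can be illustrated with a minimal NumPy sketch on synthetic data. This is not Heretic-LLM's API or its actual implementation: the function names `refusal_direction` and `ablate_direction`, the array shapes, and the difference-of-means estimator are all assumptions made for illustration. The sketch estimates a direction from two sets of first-token residual vectors, then projects that direction out of a weight matrix that writes into the residual stream.

```python
import numpy as np


def refusal_direction(refused_resid, answered_resid):
    """Unit vector along the difference of mean first-token residuals.

    Both inputs are hypothetical arrays of shape (n_prompts, d_model).
    """
    diff = refused_resid.mean(axis=0) - answered_resid.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(weight, direction):
    """Remove the component along `direction` from the outputs of `weight`.

    `weight` is assumed to have shape (d_model, d_in) and to write into the
    residual stream, so this applies the projector (I - r r^T) on the left.
    """
    return weight - np.outer(direction, direction @ weight)


# Synthetic stand-ins for residual activations (no real model involved).
rng = np.random.default_rng(0)
d_model = 8
planted = np.zeros(d_model)
planted[0] = 1.0
answered = rng.normal(size=(64, d_model))
refused = rng.normal(size=(64, d_model)) + 3.0 * planted

r = refusal_direction(refused, answered)
W = rng.normal(size=(d_model, d_model))
W_ablated = ablate_direction(W, r)
```

After ablation, `r @ W_ablated` is zero up to floating-point error, meaning the modified matrix can no longer write anything along the estimated direction.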

Level: advanced

By Unknown

Category: discussion