Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours

This research introduces a microsaccade-inspired probing technique using positional encoding perturbations to detect LLM misbehaviors without fine-tuning, of...

Level: advanced

By Unknown

Category: research