Explore how mechanistic interpretability and Layer-Patching reveal the Knobe effect in fine-tuned LLMs, offering a pathway to eliminate social biases without ...
Level: advanced
By Unknown
Category: discussion