GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models

This research explores using sparse autoencoders to decode social biases within GPT-style models trained on literary data, offering a scalable path to enhanc...

Level: advanced

By Unknown

Category: research