MetaSAEs: Joint Training with a Decomposability Penalty Produces More Atomic Sparse Autoencoder Latents

MetaSAE introduces a joint training objective with a decomposability penalty to produce more atomic sparse autoencoder latents, significantly improving model...

Level: advanced

By Matthew Levinson

Category: research