MetaSAEs: Joint Training with a Decomposability Penalty Produces More Atomic Sparse Autoencoder Latents
MetaSAE introduces a joint training objective with a decomposability penalty to produce more atomic sparse autoencoder latents, significantly improving model...
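The summary does not spell out the objective, so the following is only a rough illustration of what "joint training with a decomposability penalty" could look like. It is a minimal NumPy sketch built on several assumptions. The SAE uses a ReLU encoder. A small "meta-SAE" takes each SAE decoder direction as its input. Decomposability is measured as the cosine similarity between each unit-norm decoder direction and its meta-SAE reconstruction: a high value means the latent is easy to express as a combination of meta-latents, so it is less atomic. The function names (`sae_forward`, `meta_sae_reconstruct`, `decomposability_penalty`, `joint_loss`) and coefficients (`lam_sparse`, `lam_decomp`) are hypothetical and are not taken from the paper. The sketch only computes the loss and does no gradient updates.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations x (batch, d_model) into SAE latents and reconstruct x."""
    z = relu((x - b_dec) @ W_enc + b_enc)   # latents: (batch, n_latents)
    x_hat = z @ W_dec + b_dec               # reconstruction: (batch, d_model)
    return z, x_hat

def meta_sae_reconstruct(D, M_enc, m_b_enc, M_dec):
    """Hypothetical meta-SAE: each row of D (one SAE decoder direction) is an input."""
    m = relu(D @ M_enc + m_b_enc)           # meta-latents: (n_latents, n_meta)
    return m @ M_dec                        # reconstructed directions: (n_latents, d_model)

def decomposability_penalty(W_dec, meta_params, eps=1e-8):
    """Assumed penalty: mean cosine similarity between unit-norm decoder rows and
    their meta-SAE reconstructions (higher = more decomposable, less atomic)."""
    D = W_dec / (np.linalg.norm(W_dec, axis=1, keepdims=True) + eps)
    D_hat = meta_sae_reconstruct(D, *meta_params)
    cos = np.sum(D * D_hat, axis=1) / (np.linalg.norm(D_hat, axis=1) + eps)
    return float(np.mean(cos))

def joint_loss(x, sae_params, meta_params, lam_sparse=1e-3, lam_decomp=0.1):
    """Reconstruction + L1 sparsity + weighted decomposability penalty (assumed form)."""
    W_enc, b_enc, W_dec, b_dec = sae_params
    z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
    recon = float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))
    sparsity = float(np.mean(np.sum(np.abs(z), axis=1)))
    decomp = decomposability_penalty(W_dec, meta_params)
    total = recon + lam_sparse * sparsity + lam_decomp * decomp
    return {"recon": recon, "sparsity": sparsity, "decomp": decomp, "total": total}

# Usage on random toy parameters (dimensions are arbitrary).
rng = np.random.default_rng(0)
d_model, n_latents, n_meta, batch = 8, 32, 4, 16
sae_params = (
    rng.normal(size=(d_model, n_latents)) * 0.1,  # W_enc
    np.zeros(n_latents),                          # b_enc
    rng.normal(size=(n_latents, d_model)) * 0.1,  # W_dec
    np.zeros(d_model),                            # b_dec
)
meta_params = (
    rng.normal(size=(d_model, n_meta)),           # M_enc
    np.zeros(n_meta),                             # m_b_enc
    rng.normal(size=(n_meta, d_model)),           # M_dec
)
x = rng.normal(size=(batch, d_model))
losses = joint_loss(x, sae_params, meta_params)
```

In an actual training loop, these terms would be written in an autodiff framework. The meta-SAE would presumably be trained on its own reconstruction loss over the decoder directions, while the SAE minimizes the total above. Because the SAE is penalized whenever the meta-SAE can reconstruct its decoder directions well, it is pushed toward latents that resist decomposition. How the two updates are interleaved is again an assumption, not something the summary states.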