Beyond Surface-Level Similarity: Hierarchical Contamination Detection for Synthetic Training Data in Foundation Models
This research introduces a hierarchical framework for detecting deep conceptual contamination in synthetic training data, moving beyond simple token overlap ...