Demystifying Synthetic Data in LLM Pre-training

This research explores the nuanced role of synthetic data in LLM pre-training, revealing that mixing 30% synthetic data with natural data can accelerate trai...

Level: advanced

By Unknown

Category: research