VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Explore VITA-Audio, a novel architecture enabling fast, interleaved cross-modal token generation for efficient large speech-language models with real-time ca...