ParallelKittens: Simple and Fast Multi-GPU AI Kernels

Explore ParallelKittens, a high-performance framework that leverages custom CUDA kernels to saturate GPU bandwidth and minimize data transfer overhead in mul...

Level: advanced

By Stuart Sul, Simran Arora, Benjamin Spector, Chris Ré

Category: research