Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers

Explore Hydra Ensembles, a novel approach to structured attention head pruning that delivers calibrated uncertainty and near-single-model inference speeds wi...

Level: advanced

By Firas Gabetni, Giuseppe Curci, Andrea Pilzer, Subhankar Roy, Elisa Ricci, Gianni Franchi

Category: research