DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
2 weeks ago · NeurIPS