DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
6 months ago · arXiv