DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion