Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
1 week ago · NeurIPS