Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
1 week ago · NeurIPS