μP²: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling
4 weeks ago · NeurIPS