The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models