Better & Faster Large Language Models via Multi-token Prediction
3 weeks ago · arXiv