NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
NeurIPS