MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
