HiFC: High-efficiency Flash-based KV Cache Swapping for Scaling LLM Inference