Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency

3 weeks ago · arXiv