SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices
