KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
2 weeks ago · NeurIPS