FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration

3 weeks ago · arXiv