FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
3 weeks ago · arXiv