FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
3 weeks ago · arXiv