Polos: Multimodal Metric Learning from Human Feedback for Image Captioning