Cocktail: Mixing Multi-Modality Control for Text-Conditional Image Generation