Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis