Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling