Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
2019·Arxiv