Table 1: The subtask descriptions for the Ubuntu and Advising datasets of the DSTC7 noetic end-to-end response selection track.
Figure 1: Examples of the context, the candidate responses, and the correct response for subtask 1 of the Ubuntu and Advising datasets, respectively.
Figure 2: Two kinds of neural network based methods for sentence pair classification.
for the context and the response, respectively. F is a one-layer feed-forward
Table 2: Statistics of the pretrained word embeddings. Rows 1-3 are from GloVe; Rows 4-5 are from fastText; Rows 6-7 are from word2vec.
Table 3: The official submission results from our proposed ESIM system on the hidden test sets for the DSTC7 noetic end-to-end response selection challenge. NA - not applicable. The official metric used for ranking teams, denoted Metric, is the average of MRR and Recall@10, as presented in the table.
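For concreteness, the ranking metric described in this caption can be sketched as follows. This is a minimal illustration rather than the official scoring script: the function names are hypothetical, and it assumes that each example has exactly one correct response and that each system output is a list of candidates ordered from best to worst.

```python
from typing import Dict, Hashable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], correct: Hashable) -> float:
    """Return 1/rank of the correct response (ranks start at 1), or 0.0 if absent."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == correct:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[Hashable], correct: Hashable, k: int = 10) -> float:
    """Return 1.0 if the correct response is among the top-k candidates, else 0.0."""
    return 1.0 if correct in ranked[:k] else 0.0


def challenge_metric(
    rankings: Sequence[Sequence[Hashable]],
    gold: Sequence[Hashable],
    k: int = 10,
) -> Dict[str, float]:
    """Average MRR and Recall@k over all examples, then average the two scores."""
    n = len(gold)
    mrr = sum(reciprocal_rank(r, g) for r, g in zip(rankings, gold)) / n
    recall = sum(recall_at_k(r, g, k) for r, g in zip(rankings, gold)) / n
    return {"mrr": mrr, "recall_at_k": recall, "metric": (mrr + recall) / 2.0}


if __name__ == "__main__":
    # Two toy examples: the correct response is ranked 1st and 2nd, respectively.
    toy_rankings = [["a", "b", "c"], ["x", "y", "z"]]
    toy_gold = ["a", "y"]
    print(challenge_metric(toy_rankings, toy_gold))
```

Examples for which no candidate is correct would need separate handling, which this sketch does not attempt.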
Table 4: The official DSTC7 noetic end-to-end response selection track results cited from (Gunasekara et al., 2019a). Teams which submitted results for all subtasks are shown here. We are Team 3. The metric is the average of MRR and Recall@10.
Table 5: Comparisons of different models on two large-scale public benchmark datasets. All the results except ours are cited from the previous works (Zhang et al., 2018; Zhou et al., 2018).
Table 6: Ablation analysis of removing context composition (-CtxDec), removing the emphasis on the most recent context utterances (-Rev), incorporating external domain knowledge (+W2V), and model ensemble (Ensemble) on the development set for the DSTC7 Ubuntu dataset. For subtask 5, “+W2V” shows the results of concatenating the task-specific word embeddings into the embedding combination.
Table 7: The same ablation analysis as in Table 6, covering the removal of context composition (-CtxDec), the removal of the emphasis on the most recent context utterances (-Rev), and model ensemble, but conducted on the development set for the DSTC7 Advising dataset. For subtask 5, “+W2V” shows the results of adding the task-specific word embeddings into the embedding combination. Note that the official submission result in Table 3 for subtask 5 of Advising is “Ensemble1”, due to lack of time.
Table 8: Ablation analysis of using different word embeddings, compared with combining all five word embeddings followed by dimension reduction as in the submitted system, on the subtask 1 development set for the DSTC7 Ubuntu dataset.
Table 9: Ablation analysis of using versus not using the heuristic data augmentation, and of a positive:negative sample ratio of 1:4 versus 1:1, on the subtask 1 development set for the DSTC7 Ubuntu dataset.
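To illustrate what a positive:negative sample ratio of 1:4 or 1:1 means in practice, the sketch below builds labeled training pairs for a single context. It is an assumption-laden illustration only: the helper name `build_training_pairs` is hypothetical, the negatives are drawn uniformly at random from the candidate pool (the sampling scheme is not stated in this caption), and the heuristic data augmentation is not modeled.

```python
import random
from typing import List, Optional, Sequence, Tuple


def build_training_pairs(
    context: str,
    correct: str,
    candidates: Sequence[str],
    num_negatives: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str, int]]:
    """Return one positive pair plus up to `num_negatives` sampled negative pairs.

    Assumption: negatives are sampled uniformly without replacement from the
    candidates that differ from the correct response.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    negatives = [c for c in candidates if c != correct]
    sampled = rng.sample(negatives, min(num_negatives, len(negatives)))
    pairs = [(context, correct, 1)]  # label 1 marks the correct response
    pairs.extend((context, neg, 0) for neg in sampled)  # label 0 marks negatives
    return pairs


if __name__ == "__main__":
    pool = [f"response_{i}" for i in range(100)]
    # num_negatives=4 gives a 1:4 ratio; num_negatives=1 gives a 1:1 ratio.
    for pair in build_training_pairs("some context", "response_0", pool, num_negatives=4):
        print(pair)
```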
Table 10: Our post-DSTC7 BERT results (the third group) on the hidden test sets for the DSTC7 response selection challenge Ubuntu data, compared to our submitted ESIM ensemble results (as in Table 3). Unless noted otherwise, the pretrained BERT model was fine-tuned for 2 epochs on the same training data as the ESIM model for the Ubuntu dataset, i.e., with the heuristic data augmentation and a 1:4 ratio of positive:negative samples. Results are shown for initial learning rates (lr) of 1e-5, 2e-5, 3e-5, and 5e-5. The second group of results, including Multi-turn ESIM + ELMo (denoted MT-EE), OpenAI GPT, and BERT model results, is cited from (Vig and Ramea, 2019). The fourth group of results is from the pretrained BERT model fine-tuned for 2 epochs, but on data without the heuristic data augmentation and with a 1:1 ratio of positive:negative samples. The fifth group of results is from the BERT model without pre-training, trained on the same training data as the third group. For BERT without pre-training, we present only the best results, which were obtained with lr2e-5 among the learning rates 1e-5, 2e-5, 3e-5, and 5e-5. For lr2e-5, we ran the experiment five times and present the best results together with the mean and standard deviation over these five runs.
Table 11: Our post-DSTC7 BERT results (the third group) on the hidden test sets for the DSTC7 response selection challenge Advising data, compared to our submitted ESIM ensemble results (as in Table 3). The BERT model was trained for 1 epoch. Results are shown for initial learning rates of 1e-5, 2e-5, 3e-5, and 5e-5. The second group of results, including Multi-turn ESIM + ELMo (MT-EE), OpenAI GPT, and BERT model results, is cited from (Vig and Ramea, 2019). Note that (Vig and Ramea, 2019) only reported results on the Advising2 test set. The fourth group of results is from incorporating suggested course information into the BERT model.
Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R., 1993. Signature verification using a siamese time delay neural network, in: Advances in Neural Information Processing Systems 6, pp. 737–744.
Chen, Q., Ling, Z., Zhu, X., 2018a. Enhancing sentence embedding with generalized pooling, in: Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, pp. 1815–1826.
Dialog System Technology Challenges at AAAI 2019. URL: http://workshop.colips.org/dstc7/papers/07.pdf.
Chen, Q., Zhu, X., Ling, Z., Inkpen, D., Wei, S., 2018b. Neural natural language inference models enhanced with external knowledge, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, pp. 2406–2417.
Chen, Q., Zhu, X., Ling, Z., Wei, S., Jiang, H., Inkpen, D., 2017a. Enhanced LSTM for natural language inference, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pp. 1657–1668. doi:10.18653/v1/P17-1152.
Chen, Q., Zhu, X., Ling, Z., Wei, S., Jiang, H., Inkpen, D., 2017b. Recurrent neural network-based sentence encoder with gated attention for natural language inference, in: Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, RepEval@EMNLP 2017, pp. 36–40.
Devlin, J., Chang, M., Lee, K., Toutanova, K., 2019. BERT: pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. URL: https://doi.org/10.18653/v1/n19-1423, doi:10.18653/v1/n19-1423.
Ganhotra, J., Patel, S.S., Fadnis, K., 2019. Knowledge-incorporating ESIM models for response selection in retrieval-based dialog systems, in: Proceedings of the DSTC7 Workshop with AAAI 2019.
Gunasekara, C., Kummerfeld, J.K., Polymenakos, L., Lasecki, W., 2019a. DSTC7 task 1: Noetic end-to-end response selection, in: Proceedings of the First Workshop on NLP for Conversational AI, pp. 60–67.
Kadlec, R., Schmid, M., Kleindienst, J., 2015. Improved deep learning baselines for Ubuntu corpus dialogs. CoRR abs/1510.03753. arXiv:1510.03753.
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980. arXiv:1412.6980.
Kummerfeld, J.K., Gouravajhala, S.R., Peper, J., Athreya, V., Gunasekara, C., Ganhotra, J., Patel, S.S., Polymenakos, L., Lasecki, W.S., 2018. Analyzing assumptions in conversation disentanglement research through the lens of a new dataset and model. arXiv preprint arXiv:1810.11118.
Lin, Z., Feng, M., dos Santos, C.N., Yu, M., Xiang, B., Zhou, B., Bengio, Y.,
Lowe, R., Pow, N., Serban, I., Pineau, J., 2015. The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems, in: Proceedings of the SIGDIAL 2015 Conference, pp. 285–294.
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A., 2018. Advances in pre-training distributed word representations, in: Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J., 2013. Distributed representations of words and phrases and their compositionality, in: 27th Annual Conference on Neural Information Processing Systems 2013, pp. 3111–3119.
Mou, L., Men, R., Li, G., Xu, Y., Zhang, L., Yan, R., Jin, Z., 2016. Natural language inference by tree-based convolution and heuristic matching, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016.
Pennington, J., Socher, R., Manning, C.D., 2014. GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, pp. 1532–1543.
Serban, I.V., Sordoni, A., Bengio, Y., Courville, A.C., Pineau, J., 2016. Building end-to-end dialogue systems using generative hierarchical neural network models, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 3776–3784.
Tan, M., Xiang, B., Zhou, B., 2015. LSTM-based deep learning models for non-factoid answer selection. CoRR abs/1511.04108. arXiv:1511.04108.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention is all you need, in: Annual Conference on Neural Information Processing Systems 2017, pp. 6000–6010.
Vig, J., Ramea, K., 2019. Comparison of transfer-learning approaches for response selection in multi-turn conversations, in: Proceedings of the DSTC7 Workshop with AAAI 2019.
Wan, S., Lan, Y., Xu, J., Guo, J., Pang, L., Cheng, X., 2016. Match-SRNN: Modeling the recursive matching structure with spatial RNN, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 2922–2928.
Wang, S., Jiang, J., 2016. Learning natural language inference with LSTM, in: Proceedings of NAACL HLT 2016, pp. 1442–1451.
Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J., 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144. arXiv:1609.08144.
Wu, Y., Wu, W., Xing, C., Zhou, M., Li, Z., 2017. Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pp. 496–505. doi:10.18653/v1/P17-1046.
Yan, R., Song, Y., Wu, H., 2016. Learning to respond with deep neural networks for retrieval-based human-computer conversation system, in: Proceedings of SIGIR 2016, pp. 55–64. doi:10.1145/2911451.2911542.
Yoshino, K., Hori, C., Perez, J., D’Haro, L.F., Polymenakos, L., Gunasekara, C., Lasecki, W.S., Kummerfeld, J., Galley, M., Brockett, C., Gao, J., Dolan, B., Gao, S., Marks, T.K., Parikh, D., Batra, D., 2018. The 7th dialog system technology challenge. arXiv preprint.
Zhang, Z., Li, J., Zhu, P., Zhao, H., Liu, G., 2018. Modeling multi-turn conversation with deep utterance aggregation, in: Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, pp. 3740–3752.
Zhou, G., Luo, P., Cao, R., Lin, F., Chen, B., He, Q., 2017. Mechanism-aware neural machine for dialogue response generation, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3400–3407.
Zhou, X., Dong, D., Wu, H., Zhao, S., Yu, D., Tian, H., Liu, X., Yan, R., 2016. Multi-view response selection for human-computer conversation, in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, pp. 372–381.
Zhou, X., Li, L., Dong, D., Liu, Y., Chen, Y., Zhao, W.X., Yu, D., Wu, H., 2018. Multi-turn response selection for chatbots with deep attention matching network, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pp. 1118–1127.
Zhu, Y., Kiros, R., Zemel, R.S., Salakhutdinov, R., Urtasun, R., Torralba, A., Fidler, S., 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books, in: 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pp. 19–27. doi:10.1109/ICCV.2015.11.