Reading comprehension is a fundamental human skill, one that is taught systematically from elementary school onward. It gives people the ability to read a text, understand its meaning, and answer questions with the help of the given context. When machines are required to comprehend text, they must first understand the unstructured text and then reason over it (Chen et al., 2016; Wang et al., 2018b). Answering questions based on a passage requires a distinctive set of skills: performing basic mathematical operations and logical reasoning (e.g., to answer a question such as "How many times did Amit visit the sweet shop?"), looking up facts, making deductions, and gathering information spread across multiple sentences and passages. This diverse skill set makes question answering a challenging task.
There are several variants of this task. Given a passage and a question, the answer may (i) be generated from the passage, (ii) match some span in the passage, or (iii) be one of n given candidate answers. The last variant is widely used in middle school and high school examinations, quizzes, and competitive examinations, and is generally referred to as Reading Comprehension with Multiple Choice Questions (RC-MCQ).
Figure 1 shows a passage, a question, and four candidate answers. The task is to select the candidate that best answers the question given the passage. When answering such multiple-choice questions (MCQs), humans typically use a combination of option elimination and option selection. Alternatively, they first derive an answer from the passage, then compare it with the given options and choose the closest candidate as the correct answer.
Here we propose a model that mimics this human process of generating an answer and then matching it against the options. We first compute a question-aware representation of the passage, which essentially retains only the portions of the passage that are relevant to the question, and locate the span of the passage where the answer is likely to lie. We then generate an answer using the state-of-the-art S-Net model (Tan et al., 2017), which extracts evidence and generates an answer from it (Figure 2). Once the answer has been generated from the passage, we score every candidate option against it and select the best-matching option as the final answer (Figure 3).
Figure 1: An example multiple-choice reading comprehension question.
Figure 2: Overview of S-Net (Tan et al., 2017).
Datasets have played an important role in machine reading comprehension, and different datasets have been designed for different variants of the task. SQuAD (Rajpurkar et al., 2016) targets extractive question answering, in which a question is answered with an exact text span from a passage. MS MARCO (Nguyen et al., 2016) was later designed for multi-passage reading comprehension. CNN/Daily Mail (Chen et al., 2016) and Who Did What (Onishi et al., 2016) were designed for the cloze variant. MCTest (Richardson et al., 2013) and RACE (Lai et al., 2017) were released for the multiple-choice variant.
Related work on the multiple-choice variant includes the Hierarchical Attention Flow model (Zhu et al., 2018), which leverages the candidate options to model the interactions among the question, the options, and the passage. It is an option-selection model that picks the correct option from the given candidates. Also related to this paper is the option-elimination model ElimiNet (Parikh et al., 2019), which eliminates wrong answers from the candidate set. The Multi-Matching Network (Tang et al., 2019) models the interactions among the passage, the question, and the candidate answers, taking different matching paradigms into account. The Option Comparison Network (Ran et al., 2019) compares options at the word level and identifies correlations among them to support logical reasoning. The Co-Matching model (Wang et al., 2018a) matches the passage against the question and the candidate answer jointly. The bidirectional co-matching model (Zhang et al., 2019) matches the passage with the question and the answer bidirectionally. The Convolutional Spatial Attention (CSA) model (Chen et al., 2019) forms enriched representations by fully extracting the mutual information among the passage, the question, and the candidates.
Several models exist for answer generation. QANet (Yu et al., 2018) combines local convolution with global self-attention, and its encoder consists exclusively of convolution and self-attention. The Bidirectional Attention Flow (BiDAF) model (Seo et al., 2016) is used to attend over a large span of the passage. It is a multi-stage hierarchical process that uses bidirectional attention flow to obtain a query-aware context representation. We use S-Net for answer generation because it not only extracts the answer from the passage but can also synthesize an answer when required. Some questions are tricky, and their answers lie in different spans of the passage. S-Net is useful in such situations because its basic component is the GRU, which retains past context over a longer range.
Figure 3: Overview of option matching and selection.
The model performs two tasks: answer extraction and synthesis/generation, followed by option selection. Answer extraction and generation are done with the state-of-the-art S-Net model (Tan et al., 2017). S-Net first extracts evidence snippets by matching the question and the passage, and then generates the answer by synthesizing the question, the passage, and the evidence snippets. Consider a passage of word length P, a question of word length Q, and n options, where n > 1 and each option has word length k. We first convert the words to their word-level and character-level embeddings using GloVe (Pennington et al., 2014). The encoding and embedding layers take in a sequence of tokens and represent it as a sequence of vectors. The character-level embeddings are produced by taking the final hidden states of a bi-directional GRU applied to the embeddings of the characters in the token. A bi-directional Gated Recurrent Unit is then used to produce new representations for the question, the passage, and the options. The embedding matrix is initialized only once and is not updated during training.
As shown in Figure 4, S-Net uses a sequence-to-sequence model to synthesize the answer, with the extracted evidence as features. It first produces the representations of all words in the question and the passage, respectively. When producing the representation used for answer synthesis, it combines the basic word embedding with two additional features that indicate the start and end positions of the evidence snippet predicted by the evidence extraction model. Each feature takes the value 1 when position t is the start or the end of the evidence span, respectively.
On top of the encoder, S-Net uses a GRU with attention as the decoder to produce the answer. At each decoding time step t, the GRU reads the previous word embedding and the previous context vector, and the decoder finally produces the answer.
Figure 4: Answer Synthesis/Generation Model (Tan et al., 2017).
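The way the evidence span enters the answer-synthesis encoder can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the function name `add_span_features`, the toy dimensions, and the choice of binary indicator columns are ours, and the GRU encoder and decoder themselves are not shown.

```python
import numpy as np

def add_span_features(word_emb, start, end):
    """Append start/end indicator columns to passage word embeddings.

    word_emb: array of shape (P, d), one row per passage token.
    start, end: token indices of the evidence span (inclusive).
    Returns an array of shape (P, d + 2).
    """
    num_tokens = word_emb.shape[0]
    if not (0 <= start <= end < num_tokens):
        raise ValueError("span must lie inside the passage")
    # Assumption: each feature is a binary indicator that is 1 only at
    # the start (or end) position of the evidence span and 0 elsewhere.
    f_start = np.zeros((num_tokens, 1), dtype=word_emb.dtype)
    f_end = np.zeros((num_tokens, 1), dtype=word_emb.dtype)
    f_start[start, 0] = 1.0
    f_end[end, 0] = 1.0
    return np.concatenate([word_emb, f_start, f_end], axis=1)

rng = np.random.default_rng(0)
passage_emb = rng.normal(size=(6, 4))  # toy passage: 6 tokens, 4-dim embeddings
encoder_input = add_span_features(passage_emb, start=2, end=4)
print(encoder_input.shape)  # (6, 6)
```

Each row of `encoder_input` would then be fed to the bi-directional GRU in place of the plain word embedding.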
The generated answer is stored in an answer vector, where a is the length of the answer. Figure 3 shows an overview of the selection module. The selection module takes the refined answer representation and computes its bilinear similarity with each option representation. For the i-th option, the score is the bilinear product of the generated answer vector and the option vector, parameterized by a matrix that needs to be learned. We select the option with the highest score. To train the model, we first normalize the scores with a softmax to obtain a probability distribution over the options and then minimize the cross-entropy loss.
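The selection step can be sketched in a few lines of NumPy. This is a minimal illustration, not the trained model: the symbols `answer_vec`, `option_vecs`, and `W`, the toy dimensions, and the identity matrix used for `W` are all assumptions made for the example, and in practice `W` would be learned by backpropagating the loss.

```python
import numpy as np

def option_scores(answer_vec, option_vecs, W):
    """Bilinear score a^T W o_i for every option i.

    answer_vec: (d_a,), option_vecs: (n, d_o), W: (d_a, d_o).
    """
    return option_vecs @ (W.T @ answer_vec)

def softmax(scores):
    # Subtract the maximum for numerical stability.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def cross_entropy(scores, gold_index):
    """Negative log-probability of the gold option."""
    return -np.log(softmax(scores)[gold_index])

# Toy example with hypothetical 3-dimensional representations.
answer_vec = np.array([1.0, 0.0, 0.0])
option_vecs = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0],
                        [-1.0, 0.0, 0.0]])
W = np.eye(3)  # stand-in for the learned matrix

scores = option_scores(answer_vec, option_vecs, W)
probs = softmax(scores)
predicted = int(np.argmax(scores))
loss = cross_entropy(scores, gold_index=0)
```

Here `predicted` is the index of the option whose representation is most similar to the generated answer, and `loss` is the quantity minimized during training.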
In this section we describe the dataset used to evaluate our model, the training procedure, the comparison of results, and future work.
4.1 Dataset
We evaluate our model on the RACE dataset (Lai et al., 2017). RACE is a large-scale reading comprehension dataset with more than 28,000 passages and nearly 100,000 questions. The dataset is collected from English examinations in China that are designed for middle school and high school students. Each passage is stored as a JSON file with the following fields: (i) article: a string containing the passage; (ii) questions: a list of strings, each of which is a query. There are two types of questions: interrogative sentences, and sentences with a placeholder represented by _; (iii) options: a list of option lists, each containing 4 strings, which are the candidate options; (iv) answers: a list containing the gold label of each query; (v) id: a unique identifier for each passage. RACE contains a wide variety of question types, such as summarization, inference, deduction, and context matching.
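As an illustration of this record layout, the following sketch parses one passage record and flattens it into one example per question. The passage text, the id value, and the helper name `flatten` are invented for the example, and the gold labels are assumed to be the letters A to D.

```python
import json

# A hypothetical record following the field layout described above.
raw = """
{
  "id": "example0.txt",
  "article": "Amit visited the sweet shop on Monday and again on Friday.",
  "questions": ["How many times did Amit visit the sweet shop?",
                "Amit visited the sweet shop on _ ."],
  "options": [["Once", "Twice", "Three times", "Never"],
              ["Monday and Friday", "Sunday", "Tuesday", "Saturday"]],
  "answers": ["B", "A"]
}
"""

def flatten(record):
    """Turn one passage record into a list of per-question examples."""
    examples = []
    for question, options, answer in zip(
        record["questions"], record["options"], record["answers"]
    ):
        examples.append({
            "id": record["id"],
            "passage": record["article"],
            "question": question,
            "options": options,
            # Assumption: labels are letters, mapped here to 0-based indices.
            "label": ord(answer) - ord("A"),
        })
    return examples

examples = flatten(json.loads(raw))
for ex in examples:
    print(ex["question"], "->", ex["options"][ex["label"]])
```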
Figure 5: Statistics of reasoning types in the RACE dataset.
4.2 Training Procedure and Hyper-parameters
We integrate two different models into one. First, we train the S-Net component. For this stage we process the dataset differently: we use only the passage, the question, and the correct option. We then pass the result to the next stage of our model, which is trained on the generated answer and all candidate options. We optimize the model with Adam (Kingma and Ba, 2014), a stochastic gradient-based optimizer, with an initial learning rate of 0.005. Gradients are clipped so that their L2 norm is no larger than 10. Model parameters are updated once per mini-batch of 32 samples. We build a vocabulary from the 65k most frequent words in the passages and questions, and map any out-of-vocabulary (OOV) word to a special UNK token. The same vocabulary is used for the passage, question, and option embeddings. We tune all our models based on the accuracy achieved on the validation set. We use 300-dimensional GloVe embeddings (Pennington et al., 2014) for the word embeddings and for the word and character encoding. We experiment both with and without fine-tuning these word embeddings. We train all our models for up to 80 epochs, as we did not see any benefit from training beyond 80 epochs: the results started to repeat. The hidden state size of all GRU networks is 128. We apply dropout (Srivastava et al., 2014) to the word embeddings and the BiGRU outputs with a drop rate of 0.45.
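Two of the preprocessing and optimization details above, the capped vocabulary with an UNK token and gradient clipping by L2 norm, can be sketched as follows. This is a simplified illustration: the helper names are ours, the input is assumed to be already tokenized, and the clipping is assumed to apply to the global norm over all gradients, which the text does not specify.

```python
from collections import Counter

import numpy as np

UNK = "UNK"

def build_vocab(token_lists, max_size=65000):
    """Map the max_size most frequent tokens to ids; id 0 is reserved for UNK."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {UNK: 0}
    for tok, _ in counts.most_common(max_size):
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Replace every out-of-vocabulary token with the UNK id."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

def clip_by_global_norm(grads, max_norm=10.0):
    """Rescale all gradients so that their joint L2 norm is at most max_norm."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total <= max_norm:
        return grads, total
    scale = max_norm / total
    return [g * scale for g in grads], total

# Toy usage with a tiny vocabulary cap.
vocab = build_vocab([["the", "cat", "sat"], ["the", "dog"]], max_size=2)
ids = encode(["the", "zebra"], vocab)  # "zebra" is out of vocabulary

clipped, norm_before = clip_by_global_norm([np.array([30.0, 40.0])])
```

With the paper's settings, `max_size` would be 65,000 and `max_norm` would be 10.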
4.3 Results and Future Work
Table 1: Accuracy on the test sets of RACE-M, RACE-H, and RACE. * indicates results from Lai et al. (2017), which were obtained with 100-dimensional pre-trained GloVe word embeddings.
The human ceiling performance reported by CMU on the RACE dataset is 94.2%. Our model achieves an accuracy of 79.6% on RACE-M, 75.4% on RACE-H, and 77.3% on the full RACE dataset, outperforming several other models. Because the model first generates an answer and then selects an option, it can also be applied to multiple-choice questions whose correct answer is not among the options, or to questions that include a "none of the above" or "no answer" option.
In this paper, we present the GenNet model for multiple-choice reading comprehension. The model combines generation and selection to arrive at the correct option: it first generates an answer to the question from the passage and then matches the generated answer against the options. The proposed model achieves overall state-of-the-art accuracy on RACE and significantly outperforms neural network baselines on RACE-M, RACE-H, and the full RACE dataset. As future work, we would like to address unanswerable questions and questions for which no option matches.
Chen, D., Bolton, J., and Manning, C. D. (2016). A thorough examination of the CNN/Daily Mail reading comprehension task. arXiv preprint arXiv:1606.02858.
Chen, Z., Cui, Y., Ma, W., Wang, S., and Hu, G. (2019). Convolutional spatial attention model for reading comprehension with multiple-choice questions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6276–6283.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Lai, G., Xie, Q., Liu, H., Yang, Y., and Hovy, E. (2017). RACE: Large-scale reading comprehension dataset from examinations. arXiv preprint arXiv:1704.04683.
Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., and Deng, L. (2016). MS MARCO: A human-generated machine reading comprehension dataset.
Onishi, T., Wang, H., Bansal, M., Gimpel, K., and McAllester, D. (2016). Who did what: A large-scale person-centered cloze dataset. arXiv preprint arXiv:1608.05457.
Parikh, S., Sai, A. B., Nema, P., and Khapra, M. M. (2019). ElimiNet: A model for eliminating options for reading comprehension with multiple choice questions. arXiv preprint arXiv:1904.02651.
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
Ran, Q., Li, P., Hu, W., and Zhou, J. (2019). Option comparison network for multiple-choice reading comprehension. arXiv preprint arXiv:1903.03033.
Richardson, M., Burges, C. J., and Renshaw, E. (2013). MCTest: A challenge dataset for the open-domain machine comprehension of text. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 193–203.
Seo, M., Kembhavi, A., Farhadi, A., and Hajishirzi, H. (2016). Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958.
Tan, C., Wei, F., Yang, N., Du, B., Lv, W., and Zhou, M. (2017). S-Net: From answer extraction to answer generation for machine reading comprehension. arXiv preprint arXiv:1706.04815.
Tang, M., Cai, J., and Zhuo, H. H. (2019). Multi-matching network for multiple choice reading comprehension. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7088–7095.
Wang, S., Yu, M., Chang, S., and Jiang, J. (2018a). A co-matching model for multi-choice reading comprehension. arXiv preprint arXiv:1806.04068.
Wang, Y., Li, R., Zhang, H., Tan, H., and Chai, Q. (2018b). Using sentence-level neural network models for multiple-choice reading comprehension tasks. Wireless Communications and Mobile Computing, 2018.
Yu, A. W., Dohan, D., Luong, M.-T., Zhao, R., Chen, K., Norouzi, M., and Le, Q. V. (2018). QANet: Combining local convolution with global self-attention for reading comprehension. arXiv preprint arXiv:1804.09541.
Zhang, S., Zhao, H., Wu, Y., Zhang, Z., Zhou, X., and Zhou, X. (2019). Dual co-matching network for multi-choice reading comprehension. arXiv preprint arXiv:1901.09381.
Zhu, H., Wei, F., Qin, B., and Liu, T. (2018). Hierarchical attention flow for multiple-choice reading comprehension. In Thirty-Second AAAI Conference on Artificial Intelligence.