Exploring Neural Models for Parsing Natural Language into First-Order Logic
2020·arXiv
Abstract

Semantic parsing is the task of obtaining machine-interpretable representations from natural language text. We consider one such formal representation, First-Order Logic (FOL), and explore the capability of neural models in parsing English sentences to FOL. We model FOL parsing as a sequence-to-sequence mapping task: a natural language sentence is encoded into an intermediate representation using an LSTM, and a decoder then sequentially generates the predicates in the corresponding FOL formula. We improve the standard encoder-decoder model by introducing a variable alignment mechanism that enables it to align variables across predicates in the predicted FOL. We further show that predicting the category of each FOL entity (Unary, Binary, Variables and Scoped Entities) at each decoder step as an auxiliary task improves the consistency of the generated FOL. We perform rigorous evaluations and extensive ablations. We also aim to release our code as well as a large-scale FOL dataset along with models to aid further research in logic-based parsing and inference in NLP.

Semantic parsing aims at mapping natural language to structured meaning representations. This enables a machine to understand unstructured text better, which is central to many tasks requiring natural language understanding, such as question answering (Berant et al., 2013; Pasupat and Liang, 2015), robot navigation (MacMahon et al., 2006; Artzi and Zettlemoyer, 2013) and database querying (Zelle and Mooney, 1996). For question answering, a natural language question is converted to formal semantics, which facilitates interaction with a knowledge base (such as FreeBase (Bollacker et al., 2008)) for retrieving concise answers (Furbach et al., 2010). Such representations can be used to specify instructions to robots (Artzi and Zettlemoyer, 2013) or conversational agents (Artzi and Zettlemoyer, 2011) for executing desired action(s) in an environment. Similarly, natural language queries are transformed into executable database programming language instructions (such as SQL) to retrieve or generate correct results in a database (Sun et al., 2018; Zhong et al., 2017).

A variety of logical forms and meaning representations have been proposed for text. These include graph-based formalisms (Banarescu et al., 2013; Abend and Rappoport, 2013; Oepen et al., 2014; Kollar et al., 2018) where text is represented as a typed graph. The entities and action events are represented as nodes with labeled edges depicting relations between them. Semantic dependency tree (Oepen et al., 2014) is a directed graph depicting the syntactic structure of a sentence in the form of modifier relations between its words. AMR (Abstract Meaning Representation) graphs (Banarescu et al., 2013) use variables to annotate nodes following neo-Davidsonian style (Davidson, 1969). Lambda Dependency-based Compositional Semantics (λ-DCS) (Liang, 2013) was proposed as a formal language adapting Dependency-Based Compositional Semantics (Liang et al., 2013) borrowing the expressiveness of lambda calculus (Barendregt et al., 1984) but aiming to remove explicit use of variables.

In this work, we focus on first-order logic (FOL) (Smullyan, 2012) as the language formalism for text. FOL represents entities and actions in natural language through quantified variables and consists of functions (called predicates) which take variables as arguments. The predicates attach semantics to variables and express relations between objects (Blackburn, 2005). For instance, a simple sentence - “a man is eating” can be represented through FOL as

image

Advanced natural language concepts as in sentence “the man and woman are seated facing each other” can be expressed as

image

where “man” and “woman” are represented together through the shared variable C, and “facing each other” is represented by negating the existence of an entity E for which “C is not facing E” holds true.

The success of learning-based neural approaches in NLP tasks like machine translation (Cho et al., 2014; Sutskever et al., 2014; Vaswani et al., 2017), paraphrase generation (Prakash et al., 2016; Gupta et al., 2018), dialog modeling (Vinyals and Le, 2015; Kottur et al., 2017), machine comprehension (Wang et al., 2017) and logical inference (Kim et al., 2019) has motivated their use for semantic parsing (Kočiský et al., 2016; Buys and Blunsom, 2017; Cheng et al., 2017; Liu et al., 2018; Li et al., 2018) as well. Many such works use the encoder-decoder framework to model it as a sequence transduction task. Since they were designed for solving specific tasks like question answering, such methods (Jia and Liang, 2016; Dong and Lapata, 2016) have mainly focused on restricted logical formalisms for specific domains such as flight reservation, restaurant booking, etc. (Wang et al., 2015), capturing limited vocabulary and semantic concepts.

In this paper, we aim at developing a general-purpose open-domain neural first-order logic parser for natural language sentences to examine the capabilities of such models. We train our model on a large corpus of text-FOL pairs, obtained for sentences in the SNLI dataset (Bowman et al., 2015) through the C&C parser (Clark and Curran, 2007) and Boxer (Bos, 2008) (discussed later in detail). Apart from meaning depiction, parsing sentences to FOL would enable neural models to capture complex relationships between entities, resulting in richer embeddings which might be useful in several other NLP tasks. Such an examination would help understand the challenges in generating FOL through neural approaches owing to the complexities in its representation. Since this is one of the first such explorations for FOL, we treat the popular sequence-to-sequence model coupled with an attention mechanism (Bahdanau et al., 2014) as our baseline. We propose to disentangle the prediction of different types of FOL syntactic entities (unary and binary predicates, variables, etc.) while parsing sentences and show improvements by performing category type prediction as an auxiliary task. We further show major improvements by explicitly constraining the decoder to align variables across unary and binary predicates. This forces the model to maintain consistency while expressing standalone entity attributes and relations between them.

Our contributions can be enumerated as: 1) We explore and develop an open-domain neural semantic parser to parse natural language sentences to FOL using the Seq2Seq framework; 2) We propose disentangled FOL entity type prediction along with FOL parsing under multi-task learning, and FOL variable alignment through a decoder alignment mechanism. We perform extensive ablation studies to establish the improvements registered; 3) We also aim to release our code, models and the large-scale dataset of sentence-FOL mappings to aid further research in FOL-based NLP.

Text to FOL Conversion : In this section, we give a brief overview of the syntactic-semantic analysis pipeline used for obtaining the mappings data through Boxer (Bos, 2008), based on Combinatory Categorial Grammar (CCG) (Steedman and Baldridge, 2011) and Discourse Representation Theory (DRT) (Kamp et al., 2011). CCG is a phrase-level grammar which defines rules for generating constituency-based structures. CCG comprises syntactically typed lexical items such that each item is a lambda-expression, and uses combinatorial logic (lambda calculus) to combine them through the application of combinators. The CCG derivation guides semantic composition to obtain Discourse Representation Structures (DRS) from CCG parses. A DRS comprises discourse referents and conditions defined on them, which can be recursive. DRS is capable of representing varied linguistic phenomena such as anaphora, presupposition, tense and aspect. These DRSs can be converted to FOL through a set of syntactic transformations (Bunt et al., 2001).

Formally, predicates in FOL are atomic formulas that are combined through logical connectives, logical and (∧) and logical or (∨), and quantifiers. In general, a predicate P(v1, v2, ..., vn) is an n-ary function of variables. There are two types of quantifiers: universal (∀), which specifies that the sub-formula within its scope is true for all instances of the variable, and existential (∃), which asserts the existence of at least one instance represented by a variable under which the sub-formula holds true. For example, “All humans eat” can be represented as

image

Following generalized De Morgan’s law (Johnstone, 1979), universal quantifiers can be represented through existential quantification and negation (not), preserving the semantics, as

not(∃A(not(∃B(human(A) ∧ eat(B) ∧ agent(B, A)))))
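The rewriting above relies on the duality ∀x φ(x) ≡ ¬∃x ¬φ(x). As a minimal sketch (not part of the original pipeline), the duality can be checked by brute force over a small finite domain. The function names and the choice of domain are illustrative assumptions.

```python
from itertools import product

def forall(domain, pred):
    """Universal quantification over a finite domain."""
    return all(pred(x) for x in domain)

def exists(domain, pred):
    """Existential quantification over a finite domain."""
    return any(pred(x) for x in domain)

def forall_via_exists(domain, pred):
    """Express 'for all x: pred(x)' as 'not exists x: not pred(x)'."""
    return not exists(domain, lambda x: not pred(x))

def duality_holds(domain):
    """Check that both forms agree for every unary predicate on the domain."""
    for truth_values in product([False, True], repeat=len(domain)):
        table = dict(zip(domain, truth_values))  # one candidate predicate
        if forall(domain, table.get) != forall_via_exists(domain, table.get):
            return False
    return True
```

Calling `duality_holds(["a", "b", "c"])` enumerates all eight unary predicates on a three-element domain and compares the two formulations on each.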

Output and Mapping Format : Given a text sentence, we obtain the following FOL output.

Sentence : “three women are traveling by foot”

Output FOL : fol(1,some(A,some(B,some(C,and(r1by(B,A),and(n1foot(A),and(r1agent(B,C),and(v1travel(B),and(n1woman(C),some(D,and(card(C,D),and(c3number(D),n1numeral(D)))))))))))))

Here, the predicates are prefixed with POS-tags (Wilks and Stevenson, 1998) and relation types. Since the output FOL comprises existential quantifiers and conjunctions of atomic formulas only, we convert it into an equivalent mapping as a sequence of predicates, argument variables and scoping symbols (such as “fol(”, “)”, “not(”) and train our models to predict the sequence. We arrange scope symbols according to their nesting level (topmost appearing first in the sequence), with the further ordering that entities that are part of the same scope are arranged as a sequence of unary predicates, followed by binary predicates and other nested scoped entities.

Equivalent Mapping : fol( n1foot A v1travel B n1woman C c3number D n1numeral D r1by B A r1agent B C card C D )
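The conversion from the nested output to the flat mapping can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes the output contains only `some`, `and` and `not` around atomic predicates (as in the example above), and all function names are hypothetical.

```python
import re

def tokenize(text):
    """Split a nested FOL string into names and punctuation."""
    return re.findall(r"[A-Za-z0-9_]+|[(),]", text)

def parse_term(tokens, pos=0):
    """Parse `name(arg, ...)` or a bare atom; return (tree, next_pos)."""
    name = tokens[pos]
    pos += 1
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1
        args = []
        while tokens[pos] != ")":
            if tokens[pos] == ",":
                pos += 1
                continue
            arg, pos = parse_term(tokens, pos)
            args.append(arg)
        return (name, args), pos + 1
    return (name, []), pos

def collect(node, unary, binary, nested):
    """Gather the predicates of one scope, keeping order of appearance."""
    name, args = node
    if name == "some":            # quantifier: keep only the body
        collect(args[1], unary, binary, nested)
    elif name == "and":           # conjunction: flatten both sides
        for arg in args:
            collect(arg, unary, binary, nested)
    elif name == "not":           # negation opens a nested scope
        nested.append(args[0])
    elif len(args) == 1:
        unary.extend([name, args[0][0]])
    else:
        binary.extend([name] + [arg[0] for arg in args])

def flatten_scope(open_symbol, body):
    """Emit unary predicates, then binary ones, then nested scopes."""
    unary, binary, nested = [], [], []
    collect(body, unary, binary, nested)
    out = [open_symbol] + unary + binary
    for inner in nested:
        out += flatten_scope("not(", inner)
    return out + [")"]

def to_mapping(fol):
    """Convert `fol(index, formula)` into the flat token sequence."""
    tree, _ = parse_term(tokenize(fol))
    return " ".join(flatten_scope("fol(", tree[1][1]))

example = (
    "fol(1,some(A,some(B,some(C,"
    "and(r1by(B,A),and(n1foot(A),and(r1agent(B,C),"
    "and(v1travel(B),and(n1woman(C),some(D,"
    "and(card(C,D),and(c3number(D),n1numeral(D)"
    + ")" * 12   # closes the 12 scopes still open at this point
)
```

Under these assumptions, `to_mapping(example)` reproduces the equivalent mapping shown above for “three women are traveling by foot”.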

We model parsing a given sentence into FOL as a sequence-to-sequence transduction problem. Our parser P generates the output FOL representation one token at a time, greedily sampling each token from a probability distribution conditioned on the input sentence and the previously generated tokens. Our input X consists of a sequence of m tokens $\{x_0, x_1, ..., x_m\}$, which are encoded into hidden contextual representations by an Encoder. The Decoder then generates an output sequence of n tokens $\{y_0, y_1, ..., y_n\}$.

image

3.1 Encoder

Our Encoder E is a bidirectional LSTM (biLSTM) which encodes a sequence of input tokens $X : \{x_0, x_1, ..., x_m\}$ into a sequence of hidden states $H^e : \{h^e_0, h^e_1, ..., h^e_m\}$, $h^e_i \in \mathbb{R}^{d_h}$, to capture contextual information from the input that is eventually used by the decoder to produce the output FOL sequence. The biLSTM block takes word embeddings for the input tokens $E^e : \{e^e_0, e^e_1, ..., e^e_m\}$, $e^e_i \in \mathbb{R}^{D}$, as input and processes them to calculate the contextual representations

image

where ; denotes the concatenation operation and $h^{fe}_i$, $h^{be}_i$ refer to the forward and backward hidden states of the biLSTM.

3.2 Decoder

Decoder D consists of an LSTM which takes the outputs of encoder E along with the previously decoded outputs, provided as embeddings $E^d : \{e^d_0, e^d_1, ..., e^d_n\}$, as input and generates a sequence of hidden states $H^d : \{h^d_0, h^d_1, ..., h^d_n\}$.

image

Attention (Bahdanau et al., 2014) has now become ubiquitous in sequence-to-sequence models, and we consider it part of our baseline model. Following Bahdanau et al. (2014), we calculate the weights for encoder-decoder attention using $H^d$ as queries and $H^e$ as both keys and values (eq. 4).

image

The encoder-context vector is obtained by taking a weighted sum of the encoder’s hidden states $H^e$ (eq. 5).

image
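The query/key/value computation described above can be sketched for a single decoder step as follows. This is a minimal illustration that assumes a plain dot-product score; the scoring function actually used in eq. 4 is not reproduced here, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, context) for one query against key/value vectors.

    query  : decoder hidden state for the current step
    keys   : encoder hidden states used for scoring
    values : encoder hidden states that are averaged into the context
    """
    # Assumption: dot-product scoring between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Context vector: attention-weighted sum of the value vectors.
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context
```

Since the encoder states serve as both keys and values, a call such as `attend(h_d, H_e, H_e)` yields the attention weights and the encoder-context vector for that step.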

image

Figure 1: Overview of our architecture showing separate heads (red), category prediction (orange) and alignment mechanism (green and pink). Input to Decoder LSTM (blue) depicts the output of last step being fed at next step. Red arrow between Attention Layer and Decoder depicts standard encoder-decoder attention.

The hidden state of the decoder $h^d_i$, along with the encoder-context vector $c^e_i$, is used to predict the final output token at the $i$-th step.

image

where $W_o \in \mathbb{R}^{V \times (d_h + d_c)}$ is the output head, $d_h$ is the dimension of the hidden vector and $d_c = d_h$ is the dimension of the context vector.

We train the model on the standard cross-entropy objective while adopting teacher forcing methodology i.e. giving the inputs to the decoder from ground truth instead of previously decoded tokens while training

image

where $y^t_i$ is the target token from the ground truth at step i and $\theta$ refers to the trainable model parameters.
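As a rough illustration of the training setup described above, the sketch below computes a token-level cross-entropy and builds teacher-forced decoder inputs. It assumes the loss is summed over decoding steps (the normalization used in the original equation is not recoverable here), and the function names and the `start_id` parameter are hypothetical.

```python
import math

def sequence_cross_entropy(step_distributions, target_ids):
    """Sum of -log p(target token) over all decoding steps."""
    loss = 0.0
    for probs, target in zip(step_distributions, target_ids):
        loss -= math.log(probs[target])
    return loss

def teacher_forced_inputs(target_ids, start_id):
    """Decoder inputs under teacher forcing: the gold sequence shifted right.

    At step i the decoder sees the gold token from step i - 1 rather than
    its own previous prediction.
    """
    return [start_id] + list(target_ids[:-1])
```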

3.2.1 Separate Heads

The output tokens in an FOL sequence do not all belong to the same token category, unlike in most sequence-to-sequence translation problems, which process words. In particular, the output tokens in an FOL sequence can be divided into four major types: Unary Predicates U, Binary Predicates B, Variables V, and Scoped Entities S. We create separate vocabularies of sizes $V_u$, $V_b$, $V_v$, and $V_s$ for each category. Apart from variables V, which have one-hot embeddings, all other types of output tokens have dense embeddings. This is because a token of category V does not possess semantic meaning that is shared across all sequences from the output distribution; variables are defined only in the context of a single FOL sequence. We represent them through one-hot embeddings to ensure independence between them. Building on the above motivation, we use five different heads on top of the Decoder LSTM. While one head T decides what type of token is being generated at a given decoding step, the other heads decode the probabilities of different types of tokens.

image

where $W_x \in \mathbb{R}^{V_x \times d_h}$ and $x : \{u, b, v, s\}$. We also treat the different categories as words in a vocabulary of size $V_c$ and therefore $W_t \in \mathbb{R}^{V_c \times d_h}$. We thus train the model on an additional auxiliary task of predicting the type of the token being generated at each step. Hence, the overall cross-entropy objective to decode the correct type at all steps becomes

image

where $t^t_i$ is the target type (from ground truth) of the token to be predicted at this step and the probability of generating token $y_i$ is given by $o^{x[t_i]}_i$. $\phi$ refers to the additional decoder parameters introduced in the model. Thus, our overall objective is now a sum of both the cross-entropy and auxiliary objectives:

image
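The interplay between the type head and the category-specific heads can be sketched as below. This is a simplified illustration operating on already-normalized probabilities; the dictionary layout and function names are assumptions made for the example, not the authors' implementation.

```python
import math

def decode_step(type_probs, head_probs):
    """Pick the token category first, then the token from that category's head.

    type_probs : dict category -> probability (output of the type head T)
    head_probs : dict category -> dict token -> probability (heads U, B, V, S)
    """
    category = max(type_probs, key=type_probs.get)
    tokens = head_probs[category]
    token = max(tokens, key=tokens.get)
    return category, token

def joint_loss(type_probs, head_probs, gold_category, gold_token):
    """Token loss read from the gold category's head plus the auxiliary type loss."""
    token_loss = -math.log(head_probs[gold_category][gold_token])
    type_loss = -math.log(type_probs[gold_category])
    return token_loss + type_loss
```

Reading the token probability from the head selected by the gold type mirrors the statement that the probability of $y_i$ is taken from the head indexed by $t_i$, and the two terms are simply added, as in the overall objective.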

3.3 Decoder Self Attention

One of the key challenges for the model is to identify the relationship between the variables it generates. A variable A that is an argument in a binary predicate should be aligned with the same variable used as an argument in a unary predicate previously. One way to achieve such alignment is through decoder self-attention, which is an extension of the regular encoder-decoder attention. In this case, queries, keys and values are all decoder hidden states $H^d$. Therefore, we determine a decoder context vector $c^d$ along with the encoder-context vector. However, one key difference between the two is that while encoder-decoder attention can be applied to the whole input, decoder self-attention can only be applied to the hidden states which have been decoded so far. Just like the encoder context, the decoder context is calculated by taking a weighted sum of decoder hidden states

image

The linear head on the decoder now uses both encoder and decoder contexts along with decoder hidden state to generate the final output

image

where $W_o \in \mathbb{R}^{d_h + 2 \cdot d_c}$

3.3.1 Alignment Mechanism

With decoder self-attention, the model does not receive any explicit signal on alignment and relies only on the cross-entropy objective to identify such relations between different variables. In order to provide an explicit signal to the model, we introduce the Alignment Mechanism.

At each variable decoding step, along with decoding the type of the token i.e. variable, a linear classifier makes the decision whether this variable is aligned with any previously decoded token/variable or is an entirely new variable being generated at this step

image

where $W_{ali} \in \mathbb{R}^{1 \times d_h}$. Depending on this decision by the classifier, an alignment mechanism similar to decoder self-attention performs the relational mapping between a previously decoded variable and the variable which is currently being decoded. This mapping is performed only for variables and not for any other category. All the previously generated hidden states of the decoder are linearly projected into a different space before calculating the position of the token with which the variable is aligned. The projection is performed to reduce the interference of alignment mechanism training with encoder-decoder attention.

image

The probability that a particular step j aligns with the current decoding step i is calculated with an attention-like formulation.

image

For every other category of tokens but variables, the decoder heads remain the same. However, for variables, we first calculate aligned hidden state value as

image

The output is, then, calculated as

image
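The two decisions involved (whether to align, and where to align) can be illustrated with the simplified sketch below. Note the simplification: the mechanism described above computes an aligned hidden state and passes it through the variable head, whereas this sketch copies the token at the best-aligned position directly. The threshold, argument layout and function name are all assumptions.

```python
def decode_variable(align_prob, position_weights, decoded_tokens, new_variable,
                    threshold=0.5):
    """Return the variable to emit at the current decoding step.

    align_prob       : probability that this variable refers to an earlier one
    position_weights : dict position -> alignment weight over earlier variables
    decoded_tokens   : tokens decoded so far
    new_variable     : name to use if a fresh variable is introduced
    """
    # Decision 1: align with an earlier variable, or introduce a new one.
    if align_prob < threshold or not position_weights:
        return new_variable
    # Decision 2: pick the earlier position with the highest alignment weight.
    best = max(position_weights, key=position_weights.get)
    return decoded_tokens[best]
```

For example, while decoding the second argument of `r1by` after `fol( n1foot A v1travel B r1by`, a high `align_prob` together with the largest weight on the position of `B` makes the step reuse `B` instead of introducing a new name.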

In order to provide an explicit signal during training, we train $\gamma$ and $A^d_i$ on the target alignment positions and decisions with a cross-entropy objective

image

image

where $A^{td}_i$ and $\gamma^t_{ij}$ refer to the ground truth decision and alignment position values and $\zeta$ refers to the additional parameters introduced by the alignment mechanism. Therefore, our overall loss becomes

image

4.1 Dataset

We collated a subset of the SNLI (Bowman et al., 2015) corpus by extracting sentences from both premises and hypotheses for a limited number of examples. After eliminating duplicates, we prepared two versions of the dataset (refer to Section 2), Small and Large, to examine whether the proposed improvements remain consistent even on small data. The smaller version has 138,346 training instances and the larger one has 255,501. We used the development and test sets of SNLI as provided but eliminated duplicates, resulting in a development set of 10,691 instances and a test set of 10,633 instances.

4.2 Implementation

We used the PyTorch library for implementing an auto-differentiable graph of our computations. All the models were trained with an Adam optimizer (Kingma and Ba, 2014) initialized with a learning rate of 0.001 and a decay rate of $10^{-4}$. We use an embedding size D = 100 for encoder as well as decoder embeddings in the baseline model. In our separated heads model, D remained the same for encoder embeddings. However, on the decoder side, Unary and Binary predicates have an embedding size of 100 each, while variable and type embeddings are one-hot, with the number of dimensions equal to their respective vocabulary sizes. Scoped entities, being few in number, were encoded with an embedding size of 50. Our final input embedding is a concatenation of Unary, Binary, Variable, Scope and type embeddings. All dense embeddings are randomly initialized and trained from scratch. We used $d_h = d_c = 400$, $m = 100$ and $n = 30$.

4.3 Results and Discussion

4.3.1 Evaluation Framework

We evaluate different models by estimating the accuracy of complete match between the gold standard FOL and the predicted output. Due to the complex nature of the task, the model is unlikely to generate exactly the same FOL. To account for this, we also evaluate the degree of partial match between two FOLs, following the intuition behind Dmatch and Smatch (Cai and Knight, 2013), which are widely used to evaluate AMR graphs and DRGs. We align two FOLs in a bottom-up manner beginning with variables. For aligning two variables, it is required that the names of the corresponding predicates (in which they appear as arguments) and the argument positions match. Subsequently, while aligning two predicates, we check that their arguments are aligned and their names are the same. We follow the same process to align nested scope symbols (“not(” etc.). In particular, given an expected scoped entity, we determine the predicted scoped entity having maximum alignment with it, based on the count of other aligned predicates and scoped entities contained inside them. Given an FOL, we decompose it into related pairs of the form $(n_1, n_2)$ such that $n_2$ appears inside the scope of $n_1$, for instance, a variable that is an argument in a predicate, or a predicate appearing inside a scoped entity. We then estimate the number of pairs in the expected FOL that can be matched with pairs in the predicted FOL, under the constraint that corresponding entities in the pairs should be aligned. We select the alignment with maximum matches and report precision, recall and F1 over pair-matching as evaluation criteria, along with overall FOL accuracy.
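The final scoring step can be sketched as follows. This simplified illustration assumes that the variables of the predicted FOL have already been renamed according to the chosen alignment, so that pairs can be compared directly; the search over alignments described above is not implemented, and the function name is hypothetical.

```python
from collections import Counter

def pair_f1(gold_pairs, predicted_pairs):
    """Precision, recall and F1 over multisets of (outer, inner) pairs."""
    gold, pred = Counter(gold_pairs), Counter(predicted_pairs)
    # Multiset intersection counts each pair at most as often as it occurs in both.
    matched = sum((gold & pred).values())
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Each pair is a tuple such as `("n1man", "A")` for a variable inside a predicate, or `("fol(", "n1man")` for a predicate inside a scope.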

4.3.2 Comparison with Baseline and Ablation Studies

We conduct a range of experiments and evaluations on different models. We show our results on both development and test sets in Tables 1, 2, 3, and 4. Our Vanilla (Baseline) model consists of a biLSTM Encoder and a plain LSTM decoder as described in Section 3.2, coupled with an encoder-decoder attention mechanism. Performing disentanglement, our Separate Heads model uses different linear heads on top of the LSTM decoder for different categories of tokens, as discussed in Section 3.2.1. Our final proposed model, Separate Heads + Align, adds our alignment mechanism on top of Separate Heads and uses the disentangled variable prediction to effectively identify the relationships between variables in binary predicates and their unary counterparts. We also conduct ablations on the Vanilla and Separate Heads models by incrementally adding both decoder self-attention and alignment mechanisms.

Evidently, our final model Separate Heads + Align convincingly outperforms all other described models and improves over the baseline by ∼8 F-1 points. Decoder self-attention improves the Vanilla model but does not provide any improvement when used with Separate Heads. This can be attributed to its inability to incorporate decoder-level information, which probably becomes factorized automatically during training through the use of separate heads. Although it improves over Vanilla by a good margin, it still only matches or remains inferior to the standalone Separate Heads model. The alignment mechanism provides a large boost to the Separate Heads model, improving it by ∼5 F-1 points. However, performance deteriorates when it is used with the Vanilla model, since the ability to align only variables, which we find critical for its working, vanishes in this setup. We further note that by increasing the size of the training data, performance increases uniformly, with our final model achieving the best F-1 of ∼73 and an overall accuracy of ∼63%.

image

Table 1: Results showing overall accuracy and F1-Scores of different models (trained on Large dataset) on development dataset

4.3.3 Analysis

We perform additional experiments to analyse the results observed. We conduct two sets of analysis - Variation of F-1 score with input length and Perturbed training to establish the robustness of our proposed method.

image

Table 2: Results showing overall accuracy and F1-Scores of different models (trained on Large dataset) on Test dataset

image

Table 3: Results showing overall accuracy and F1-Scores of different models (trained on Small Dataset) on development dataset

image

Table 4: Results showing overall accuracy and F1-Scores of different models (trained on Small Dataset) on test dataset

image

Figure 2: Variation of output F-1 Score with input length on Test Dataset

Variation with Input Length: Evidently, our proposed models are considerably more robust to increases in the length of the input sentence, as shown in Fig. 2. This can be attributed to several factors: increased model capacity, the ability to process different categories of output tokens separately (giving better long-range dependencies), and less confusion in generating many variables over the FOL owing to better alignment across the sequence.

image

Table 5: Results on Test set showing both accuracy and F1-Scores with perturbed training

Perturbed Training: It has been noted in the literature (Jia and Liang, 2017; Niven and Kao, 2019) that neural models sometimes exploit trivial patterns in outputs/inputs to produce pseudo-improved results. One such pattern could be the presence of variables like A and B with some specific unary and binary predicates. In order to disturb such patterns, we randomly permute such variables in the ground truth during training. Our baseline model indeed shows a significant drop in results (Table 5). On the other hand, our other two main models do not show such a large drop, indicating their robustness to such disturbances.
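One possible reading of this perturbation is a consistent random renaming of the variables within each ground-truth sequence, sketched below. The exact procedure is not specified in the text, so the renaming scheme, the function name and the explicit random generator argument are assumptions.

```python
import random

def permute_variables(tokens, variables, rng):
    """Consistently rename the variables of a flat FOL token sequence.

    tokens    : flat FOL sequence, e.g. "fol( n1foot A ... )".split()
    variables : the variable vocabulary, e.g. ["A", "B", "C", "D"]
    rng       : a random.Random instance, passed in for reproducibility
    """
    present = [v for v in variables if v in tokens]
    shuffled = present[:]
    rng.shuffle(shuffled)
    renaming = dict(zip(present, shuffled))   # bijection on the variables used
    return [renaming.get(tok, tok) for tok in tokens]
```

Because the renaming is a bijection applied to the whole sequence, the perturbed target stays logically equivalent to the original while breaking any fixed association between a variable name and particular predicates.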

Early semantic parsers were mostly rule-based (Johnson, 1984; Woods, 1973; Thompson et al., 1969), using grammar systems (Waltz, 1978; Hendrix et al., 1978) and employing shallow pattern matching (Johnson, 1984) and parse trees to generate database query language (Woods, 1973). These were succeeded by data-driven learning techniques which use language data paired with meaning representations and can be broadly classified into statistical methods (Thompson, 2003; Zettlemoyer and Collins, 2012; Zelle and Mooney, 1996; Kwiatkowski et al., 2010) and neural approaches (Kočiský et al., 2016; Dong and Lapata, 2016; Jia and Liang, 2016; Buys and Blunsom, 2017; Cheng et al., 2017; Liu et al., 2018; Li et al., 2018). Zettlemoyer and Collins (2012) proposed to use sentences and their lambda calculus expressions to learn a log-linear model through probabilistic CCG to rank different parses for a given sentence using simple features such as the count of lexical entries. Additionally, captioned videos have been used to perform visually grounded semantic parsing (Ross et al., 2018). Feedback-based semantic parsing has been used to facilitate continuous improvement in the quality of parses through conversations (Artzi and Zettlemoyer, 2011) and user interaction (Iyer et al., 2017; Lawrence and Riezler, 2018).

Neural approaches alleviate the need for manually defining lexicons and can further be categorized, based on the structure of the parse, into sequential parse prediction (Jia and Liang, 2016; Kočiský et al., 2016) and graph structure decoding, which tailors the network architecture to utilize the syntactic structure of the meaning representation (Yin and Neubig, 2017; Alvarez-Melis and Jaakkola, 2016; Rabinovich et al., 2017). Dong and Lapata (2016) proposed SEQ2TREE to generate domain-specific hierarchical logical forms by introducing a parenthesis token and parent connections to recursively generate sub-trees. Rabinovich et al. (2017) introduced a dynamic decoder whose components are composed depending on the generated tree parse. Liu et al. (2018) parse DRSs using dedicated hierarchical decoders to generate partial structure first before the semantic content. We instead make the model disambiguate syntactic types (unary and binary predicates, variables, scope symbols) by performing category type prediction as an auxiliary task and using separate prediction heads. Constrained decoding using target language syntax and grammar rules has been explored (Yin and Neubig, 2017; Xiao et al., 2016). The copy mechanism (Gu et al., 2016) has been used to facilitate the generation of out-of-vocabulary entities through encoder attention (Jia and Liang, 2016). However, our variable alignment mechanism is different, since it constrains the model to align binary predicate arguments with previously generated unary structures (alignment happening at the decoder level) by specifying an explicit loss on whether to align and where to align.

In this work, we examined the capability of neural models on the task of parsing First-Order Logic from natural language sentences. We proposed to disentangle the representations of different token categories while generating FOL output and used category prediction as an auxiliary task. We utilized token factorization to build an alignment mechanism which effectively manages to capture the relationship between variables across different predicates in FOL. Our analysis showed the difficulties faced by neural networks in modeling FOL and ways to tackle them. We also experimented by introducing a perturbation in inputs in order to examine the robustness of different proposed models. In a bid to promote research further in the area, we aim to release our code as well as data publicly.

Omri Abend and Ari Rappoport. 2013. Universal conceptual cognitive annotation (UCCA). In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 228–238.

David Alvarez-Melis and Tommi S Jaakkola. 2016. Tree-structured decoding with doubly-recurrent neural networks.

Yoav Artzi and Luke Zettlemoyer. 2011. Bootstrapping semantic parsers from conversations. In Proceedings of the conference on empirical methods in natural language processing, pages 421–432. Association for Computational Linguistics.

Yoav Artzi and Luke Zettlemoyer. 2013. Weakly supervised learning of semantic parsers for mapping instructions to actions. Transactions of the Association for Computational Linguistics, 1:49–62.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186.

Hendrik P Barendregt et al. 1984. The lambda calculus, volume 3. North-Holland Amsterdam.

Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544.

Patrick Blackburn. 2005. Representation and inference for natural language: A first course in computational semantics.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250. ACM.

Johan Bos. 2008. Wide-coverage semantic analysis with boxer. In Proceedings of the 2008 Conference on Semantics in Text Processing, pages 277–286. Association for Computational Linguistics.

Samuel R Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. 2015. A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326.

H Bunt et al. 2001. Patrick blackburn, johan bos, michael kohlhase and hans de nivelle.

Jan Buys and Phil Blunsom. 2017. Robust incremental neural semantic graph parsing. arXiv preprint arXiv:1704.07092.

Shu Cai and Kevin Knight. 2013. Smatch: an evaluation metric for semantic feature structures. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 748–752.

Jianpeng Cheng, Siva Reddy, Vijay Saraswat, and Mirella Lapata. 2017. Learning structured natural language representations for semantic parsing. arXiv preprint arXiv:1704.08387.

Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Stephen Clark and James R Curran. 2007. Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4):493–552.

Donald Davidson. 1969. The individuation of events. In Essays in honor of Carl G. Hempel, pages 216–234. Springer.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. arXiv preprint arXiv:1601.01280.

Ulrich Furbach, Ingo Glöckner, Hermann Helbig, and Björn Pelzer. 2010. Logic-based question answering. KI-Künstliche Intelligenz, 24(1):51–55.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor OK Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393.

Ankush Gupta, Arvind Agarwal, Prawaan Singh, and Piyush Rai. 2018. A deep generative framework for paraphrase generation. In Thirty-Second AAAI Conference on Artificial Intelligence.

Gary G Hendrix, Earl D Sacerdoti, Daniel Sagalowicz, and Jonathan Slocum. 1978. Developing a natural language interface to complex data. ACM Transactions on Database Systems (TODS), 3(2):105–147.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. arXiv preprint arXiv:1704.08760.

Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. arXiv preprint arXiv:1606.03622.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328.

Tim Johnson. 1984. Natural language computing: the commercial applications. The Knowledge Engineering Review, 1(3):11–23.

Peter T Johnstone. 1979. Conditions related to de Morgan’s law. In Applications of sheaves, pages 479–491. Springer.

Hans Kamp, Josef Van Genabith, and Uwe Reyle. 2011. Discourse representation theory. In Handbook of philosophical logic, pages 125–394. Springer.

Seonhoon Kim, Inho Kang, and Nojun Kwak. 2019. Semantic sentence matching with densely-connected recurrent and co-attentive information. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6586–6593.

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, and Karl Moritz Hermann. 2016. Semantic parsing with semi-supervised sequential autoencoders. arXiv preprint arXiv:1609.09315.

Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, and Spyros Matsoukas. 2018. The alexa meaning representation language. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pages 177–184.

Satwik Kottur, Xiaoyu Wang, and Vítor Carvalho. 2017. Exploring personalized neural conversational models. In IJCAI, pages 3728–3734.

Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of the 2010 conference on empirical methods in natural language processing, pages 1223–1233. Association for Computational Linguistics.

Carolin Lawrence and Stefan Riezler. 2018. Improving a neural semantic parser by counterfactual learning from human bandit feedback. arXiv preprint arXiv:1805.01252.

Zuchao Li, Jiaxun Cai, Shexia He, and Hai Zhao. 2018. Seq2seq dependency parsing. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3203–3214.

Percy Liang. 2013. Lambda dependency-based compositional semantics. arXiv preprint arXiv:1309.4408.

Percy Liang, Michael I Jordan, and Dan Klein. 2013. Learning dependency-based compositional semantics. Computational Linguistics, 39(2):389–446.

Jiangming Liu, Shay B Cohen, and Mirella Lapata. 2018. Discourse representation structure parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 429–439.

Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. 2006. Walk the talk: Connecting language, knowledge, and action in route instructions. Def, 2(6):4.

Timothy Niven and Hung-Yu Kao. 2019. Probing neural network comprehension of natural language arguments. arXiv preprint arXiv:1907.07355.

Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Dan Flickinger, Jan Hajic, Angelina Ivanova, and Yi Zhang. 2014. Semeval 2014 task 8: Broad-coverage semantic dependency parsing. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 63–72.

Panupong Pasupat and Percy Liang. 2015. Compositional semantic parsing on semi-structured tables. arXiv preprint arXiv:1508.00305.

Aaditya Prakash, Sadid A Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, and Oladimeji Farri. 2016. Neural paraphrase generation with stacked residual lstm networks. arXiv preprint arXiv:1610.03098.

Ella Rabinovich, Noam Ordan, and Shuly Wintner. 2017. Found in translation: Reconstructing phylogenetic language trees from translations. arXiv preprint arXiv:1704.07146.

Candace Ross, Andrei Barbu, Yevgeni Berzak, Battushig Myanganbayar, and Boris Katz. 2018. Grounding language acquisition by training semantic parsers using captioned videos. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2647–2656.

Raymond R Smullyan. 2012. First-order logic, volume 43. Springer Science & Business Media.

Mark Steedman and Jason Baldridge. 2011. Combinatory categorial grammar. Non-Transformational Syntax: Formal and explicit models of grammar, pages 181–224.

Yibo Sun, Duyu Tang, Nan Duan, Jianshu Ji, Guihong Cao, Xiaocheng Feng, Bing Qin, Ting Liu, and Ming Zhou. 2018. Semantic parsing with syntax- and table-aware SQL generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 361–372, Melbourne, Australia. Association for Computational Linguistics.

I Sutskever, O Vinyals, and QV Le. 2014. Sequence to sequence learning with neural networks. Advances in NIPS.

Cynthia Thompson. 2003. Acquiring word-meaning mappings for natural language interfaces. Journal of Artificial Intelligence Research, 18:1–44.

Frederick B Thompson, PC Lockemann, B Dostert, and RS Deverill. 1969. Rel: A rapidly extensible language system. In Proceedings of the 1969 24th national conference, pages 399–417. ACM.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.

Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.

David L Waltz. 1978. An English language question answering system for a large relational database. Communications of the ACM, 21(7):526–539.

Bingning Wang, Kang Liu, and Jun Zhao. 2017. Conditional generative adversarial networks for commonsense machine comprehension. In IJCAI, pages 4123–4129.

Yushi Wang, Jonathan Berant, and Percy Liang. 2015. Building a semantic parser overnight. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1332–1342.

Yorick Wilks and Mark Stevenson. 1998. The grammar of sense: Using part-of-speech tags as a first step in semantic disambiguation. Natural Language Engineering, 4(2):135–143.

William A Woods. 1973. Progress in natural language understanding: an application to lunar geology. In Proceedings of the June 4-8, 1973, national computer conference and exposition, pages 441–450. ACM.

Chunyang Xiao, Marc Dymetman, and Claire Gardent. 2016. Sequence-based structured prediction for semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1341–1350.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. arXiv preprint arXiv:1704.01696.

John M Zelle and Raymond J Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the national conference on artificial intelligence, pages 1050–1055.

Luke S Zettlemoyer and Michael Collins. 2012. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. arXiv preprint arXiv:1207.1420.

Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2sql: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.

