of year 2011” mentions the database column “best actor 2011”. Clearly, we cannot rely on exact string matching to detect mentions.
star in? (Example (a) in Figure 1). Here, “Piotr Adamczyk” is an actor, but the question does not mention the database column “actor” explicitly. We need to infer the column from the question, the database schema, and the database statistics.
“2” and “3” could refer to either “rebounds” or “points.” The correct resolution depends on the syntax, or the context of the question.
Next, we describe meta knowledge of a database, which is the minimal knowledge about the underlying database we need in order to understand a natural language question against the database. Then we describe annotation as a two-stage process consisting of mention detection and mention resolution. The first stage detects a set of many possible mentions of column and values, in which some mentions may be inconsistent with others since certain possibilities are mutually exclusive. The second stage finds a maximum subset of these mentions that is consistent, which constitutes the output annotation.
Meta Knowledge of a Database
We rely on the meta knowledge of a database to understand questions posed against the database. The meta knowledge contains the following:
1. Database schema C. The database schema is the definition of the columns of a database table. For example (a) in Figure 1, C = {Nomination, Actor, Film Name, Director, Country}.
2. Database statistics for each column
. Intuitively, a human database operator knows which database columns need to be involved for answering a particular question even if the name of the column is not explicitly mentioned in the question. For example, in Figure 1 (a), we should consult the “Actor” column for “Piotr Adamczyk”. To approximate this mental process, we construct and leverage database statistics
that enables us to measure how likely a phrase is related to database column c for all
. For example, we may create a language model for a column. Specifically, we may use pre-trained wordembeddings to decide if a particular term belongs to a particular column (the idea is that if a term is related to a column, its embedding should be close to the word-embedding space of values in the column).
3. Natural language expressions specific to a database, a database column, and values of the column . We need to know how people talk about things embodied by the database. Ideally, if we have a general purpose ontology that tells us everything about how language is used to describe any entity and its features, we might not need this particular handcrafted component in the schema. But since such ontology does not exist, we consider this component requires minimal effort to enable us to support cross domain NLI. Specifically, for column c, we collect a set of phrases
that mention c. For example, for c = population in Figure 1 (b), we have
population of
city
, size of
city
, how many people live in
city
, ...}. Our approach provides a direct way to inject this minimal knowledge to our model.
Mention Detection
In Mention Detection, we derive candidate mentions (of database columns and values) for terms in a question. Formally, we define a term to be a continuous span of words. A term may mention a database column or value. For example, for the question Which film directed by Jerzy Antczak did Piotr Adamczyk star in? in Figure 1 (a), “directed” could be a mention of column “Director” and “Piotr Adamczyk” could be a mention of a value in either the “Director” column or the “Actor” column. Note that a term may have multiple candidate mentions, and the entire set of candidate mentions may not be consistent, and we rely on the next step, mention resolution, to find a consistent set of mentions.
We treat mentions of database values and mentions of database columns differently. For mentions of columns, we have a set of known columns, and each of them can be mentioned in many different ways. For example, the term “directed” may be used to mention the “Director” column, and “how many people live in ...” may be used to mention the “Population” column. Our method, described below, combines syntax and semantics to deal with this scenario. For mentions of database values, however, we do not have a set of ground-truth values. One may ask “When was Joe Biden elected U.S. president?” against a database of U.S. presidents, but clearly, “Joe Biden” is not in the database (at least not yet). Thus, for mentions of values v, we evaluate whether a term, in its exact literal form in the question (denoted as ), is likely to be a value of a column of the database.
We now describe mention detection for columns. For a term, defined as a span of words in a question, we want to know if it is a mention of a column c. We do the following:
• We consider w a possible mention of column c only if the w covers c as a string effectively and efficiently. Formally, w covers c if (1) There does not exist a larger span that contains w (e.g.
s.t.
) and covers more words of c than w; (2) There does not exist a smaller span
that is contained within w(e.g.
s.t.
) and covers the same number of words of c as w. The computation of number of words covered is done by counting the number of pairs (x, y) where
and
such that x and y are close. The measure of closeness could be either of the following:
– Edit Distance. We denote max (|x|, |y|) where D is character-level edit distance, and define x and y are close if
.
– Word-Embedding Space Distance. We denote , and define x and y are close if
.
• We use meta database knowledge directly for mention detection. For example, in Figure 1(b), the phrase “how many people live in” is detected as a mention for column “Population”.
Taking question q = “ Who is the best actress of year 2011?” and column name c = “best actor 2011” as an example. We detect w = “best actress of year 2011” as a mention of c, as “best”, “actress” and “2011” in w have words covered in c, namely “best”, “actress”, “2011”. Note that the closeness of “actress’ and “actor” is assessed by . Moreover, w is neither “the best actress of year 2011?” (violating (1) above) nor “best actress of year” (violating (2) above).
Figure 3: Use lowest common ancestor (LCA) in constituency tree to find correct pairing of column and value. Here “rebounds” is paired with “2” and “points” with “3”.
Mention Resolution
There could be many candidate mentions of columns and values for many terms. The goal of mention resolution is to figure out globally, what is the most likely subset of mentions that are consistent.
First, we use syntax or context to reduce the number of candidate mentions. For example, in question “For which player his rebounds is 2 and points is 3?” the values “2” and “3” could be mentions of either “rebounds” or “points”. Both are valid unless we take into consideration the syntax and context of the question. We observe that in the question’s constituency tree, paired columns and values are structurally close to each other. Therefore, we use the depth of Lowest Common Ancestor (LCA) in the question’s constituency tree as a measure of structural closeness. More specifically, for value v and column c, and their mentions and
in the question respectively, the structure closeness is measured by
where
are words in
respectively. We only consider matches (v, f) that have best structure closeness, e.g. when
.
Second, we use a global metric to further reduce inconsistency. Note that the first step does not solve all the problems. For example, it is possible that multiple column candidates have the same maximal LCA to a given value in the question. We formalize this task as a graph matching problem: Consider a bipartite graph G = (V, C, E) where vertices V represent all mentions of values, vertices C all mentions of columns, and edges E contain edges between two vertices if (v, c) is considered as a possible match as described above. We then find a Maximum Bipartite Matching (MBM) Kuhn (1955) in G as a proxy to the mention resolution, since finding MBM is equivalent to finding a maximum subset of matchings that are compatible, where each edge (v, c) in MBM constitutes a matching of value and column in the output annotation. After finding MBM, we add all values and columns whose corresponding vertex is without any edge in G to the annotation, since adding them would not disturb any existing matching. Finally, all columns and values are deterministically given an index (
) where (1) matching column and value share the same index, and (2) indices are ordered according to the earliest mention of indices’ corresponding column or value in the question.
We denote an input natural language question as q = , annotated question as
, and the corresponding annotated SQL query as
=
. The result of the aforementioned annotation process on a natural language question q is represented as its annotated form
, which is the input to a seq2seq model that translates
to an annotated SQL
. In this section we discuss the representation of the annotated input, followed by a Sequence Translation Model that produces
for a given
.
Representation of Annotated Input
There are many options in representation. For example, in Figure 1(a), “directed by” is annotated as the mention of . We can either substitute “directed by” by
or insert
after “directed by” in the question. We explore difference annotation encoding methods and propose our unique annotation encoding to separate schema from natural language questions without loss of schema information. We will show in the Experiment Section that the representation affects the schema separation performance dramatically.
Question What position did the player LeBron James play? Annotated Question: Symbol Substitution Annotated Question: Symbol Appending
Figure 4: An example with two format of annotated form.
Include annotations as additional symbols Intuitively, the annotation should enable schema separation that strips off schema-specific information from natural language questions or substitutes schema-specific information with symbols. However, replacing the mentions of columns with uni-fied symbols like or
discards the diverse semantic meaning of the column texts. Therefore we propose to append the symbols into stacks rather than substituting them to leverage the semantics of column texts. This proposed approach is referred to as “column stack” in Experiment Section. Figure 4 shows the difference of the two approaches and highlights that the proposed approach provides more information to the downstream sequence model.
Annotated Question: When [Piotr Adamczy] was nominated as
[Best Actor in a Leading Role] ? Annotated SQL SELECT Nomination Date WHERE
Annotated Question with table header encoding: When Adamczy] was nominated as
[Best Actor in a Leading Role] ? |
[Nomination]
[Film Name]
[Director]
[Nomination Date] Annotated SQL with table header encoding SELECT
Figure 5: An example showing approaches with and without table header encoding.
Table Header When a column in SQL is not mentioned in the question, we can only rely on the sequence model to infer the column. For the example in Figure 5 , “Nomination Date” is not explicitly mentioned and needs to be inferred by the model. However, most of the columns (e.g., “Nomination Date”) consist of multiple tokens, which are hard to be inferred correctly by a sequence model. To encourage the correct inference of multi-token columns, we append all headers in the end of the annotated question, so that even if a column is not mentioned in and cannot be inferred only from the question, it could still be referred to as
by the sequence model.
” is appended to the annotated question. and thus simplifies the annotated SQL to the form of “
”. Besides referring to unmentioned headers, our strategy also contributes to a much smaller output vocabulary and thus makes it easier to train a sequence model.
Sequence Translation Model We train a seq2seq model to estimate , which captures the conditional probability of
=
with encoder and decoder.
Encoder is implemented as stacked bi-directional multilayer recurrent neural network (RNN) with using Gated Recurrent Unit (GRU) Cho et al. (2014). To keep the dimension consistent, we add an affine transformation before each layer of RNN, which is defined as , where
is the input of the l-th layer in the i-th position.
and
are model parameters. The hidden state of the forward RNN and backward RNN are computed as:
We concatenate forward state vector and backward state vector as . The input of each layer is computed as
is the word embedding lookup function)
.
Decoder is an one-layer attentive RNN with copy mechanism. We use Bahdanau’s attention (Bahdanau, Cho, and Bengio, 2014) as follows: At each time step i in the decoder, the decoding step is defined as:
where are model parameters,
the hidden states of the decoder RNN and j the index enumerating all positions in the encoder. In NLIDB, annotations in the output SQL often correspond directly from the input natural language question, To encourage the model choose tokens that appears in the input. We introduce copy mechanism that samples output token
as
Note that this is different from the vanilla copy mechanism where the output is sampled through softmax over entire word vocabulary as .
Natural Language Interface to Database (NLIDB) Natural Language Interface to Database aims at providing an interactive bridge between humans and machines, where human users can issue a question in natural language text, which would then be translated to a structured query that can be executed by a database engine. Androutsopoulos, Ritchie, and Thanisch (1995) first explores this task with concrete examples defining this problem and highlights the separation between linguistic and database-derived information. Later Popescu, Etzioni, and Kautz (2003) proposes to identify questions whose answers are tractable solely from the database, and Giordani and Moschitti (2012) incorporates tree kernels in ranking candidate queries. Many recent advances can be categorized into two groups. The first group uses semantic parsing Wang, Berant, and Liang (2015); Pa- supat and Liang (2015); Jia and Liang (2016) as well as some extensions that support cross-domain semantic parsing Herzig and Berant (2017); Su and Yan (2017). However, due to the idiosyncrasy and complexity of natural languages, most works in this group are confined in narrow domains. The other group relies on neural based methods where sequence to sequence models are leveraged to translate input questions to SQL queries, optional combined with the help of user feedback Iyer et al. (2017) and reinforcement learning Zhong, Xiong, and Socher (2017).
Slot Filling in Dialogue System Dialogue system aims at communicating with a user in a session with multiple turns of dialogs, where state, or conceptually what the session is talking about, needs to be tracked for dialogue system to archive good performance Young et al. (2010). Commonly the dialogue system identifies and tracks entities that appear across turns as slots in a process called slot filling. These slots and entities that can fill in these slots are usually specific to the domain that the dialogue system is focusing on. For example, slots can be food, airport or city names, and therefore are from a pre-defined, externally crafted list of possible values. Recently, neural based approaches have been proposed for tracking state: Henderson, Thomson, and Young (2014) proposes a simple recurrent network that predicts the probability of each word in dialog being one of pre-defined slots, which is extended by Mrkši´c et al. (2015), a hierarchical model that can handle cases where entities can be from one of multiple domains. To specifically provide better tracking for ranking slot values in dialog, Belief Tracker Mrkši´c et al. (2017) sums up separating representations of system output, user feedback and candidate slot values. To further improve the performance, Wen et al. (2017) considers a policy network that arbitrates the outputs from multiple models, including the aforementioned belief tracker, a sequence model that encodes user input, and a generation network that produces system output.
Closest to our proposed work is Xu, Liu, and Song (2017), which employs a sketch-based approach that represents an SQL as a template with slots, and the model predicts values from a limited candidate set to be filled in each slot. This is different from our work that focuses on annotation and does not restrict SQL to a particular template-based form. Another close work is TypeSQL Yu et al. (2018) that enriches the inference of columns and values using domain-specific knowledge-based model that searches five types of entities on Freebase, an extra large database, which is contrast to our work which does not rely on extra database knowledge.
Table 1: Comparison of models. lf, qm, ex represent logical forms, exact query match, and query execution accuracy respectively. Performance on the first block are copied from the corresponding papers. “–” and “+” mean removing or adding one component from our best approach respectively for ablation.
* We report performance without using extra knowledge base for a fair comparison
Sequence-to-sequence Generation model Sequence to Sequence (referred to as seq2seq in the rest of the paper) learning Sutskever, Vinyals, and Le (2014) has led to many advances in neural semantic parsing models. Notable advances in sequence learning include attention mechanism Bahdanau, Cho, and Bengio (2014) and pointer network Vinyals, Fortunato, and Jaitly (2015) that boost performance for sequence learning and enable it to handle long sequence and rare words. They have seen successful applications on language model Merity et al. (2016), text summarization Gu et al. (2016), text understanding Wen et al. (2017), and neural computing Graves et al. (2016). Our model also benefits from these techniques since our model needs to see both information packed in long sequence and rare words that only appear in few tables.
We conduct experiments1 on two scenarios: (1) in-domain scenario of NLIDB trained and evaluated on WikiSQL dataset Zhong, Xiong, and Socher (2017), and (2) cross-domain scenario where we evaluate our model (trained on WikiSQL)’s transferring accuracy on OVERNIGHT dataset (Wang and Yang, 2015) without extra training in the target domain. We use three metrics for evaluating the query synthesis accuracy: (1) Logical-form accuracy. We compare the synthesized SQL query against the ground truth for whether they agree token-by-token in their logical form, as proposed in Zhong, Xiong, and Socher (2017). (2) Query-match accuracy. Like logical-form accuracy, except that we convert both synthesized SQL query and the ground truth into canonical representations before comparison. This metric can eliminate false negatives cases, such as two semantically identical SQL queries being different only in the literal ordering of condition clauses. (3) Execution accuracy. We execute both the synthesized query and the ground truth query and compare whether the results agree, as proposed in Zhong, Xiong, and Socher (2017).
WikiSQL
We train and evaluate our NLIDB model on WikiSQL, which contains 87673 records of natural language questions, SQL queries, and 26521 database tables. Since tables are not shared among the training / validating / testing splitting provided in the dataset, models evaluated on it are supposed to generalize to new questions and database schema.
Training Details The annotation process uses and
. For the sequence model, we use two layers of stacked GRU with hidden size 200 for encoder, and 400 for decoder. The input and output layers of both encoder and decoder share the tied embedding weights. We initialize the embedding weights with pre-trained Glove embedding with dimension D = 300, and embeddings for tokens not covered by GloVe with random vector. Symbols introduced by annotation (such as
) are also treated as tokens, each of them being represented by the concatenation of the embedding of annotation type (c or v) and index. Also, the embedding of annotation type and index are randomly initialized with
so the concatenations has the same dimension as D = 300. In training we use gradient clipping with threshold 5.0, and in inference we use beam search with width 5.
Evaluation We compare our method with previous methods through three aforementioned metrics: accuracies in terms of logical form exact match, exact query match, and the results of query execution. As shown in Table 1, our result outperforms these previous methods, including the state-of-the-art Coarse2Fin Dong and Lapata (2018) by 0.3% for exact query match, and 3.7% for query execution. This demonstrates that our method enables high transfer-ability to unseen tables, since in WikiSQL database tables are not shared among training and testing splittings.
We note that TypeSQL achieves high accuracy in the "content-sensitive" setting where it queries Freebase when handling natural language questions in training as well as in inferencing. TypeSQL is thus limited because it cannot easily generalize to content that is not covered by Freebase. Since our focus is automated process with generalization, we did not take this setting into consideration.
Table 2: OVERNIGHT transfer accuracy. “–” means removing one component from our best approach respectively for ablation.
Ablation In Table 1, we conduct ablation analysis to demonstrate the contribution to performance from different components of our model. Removing each component of our method leads to a decreases in performance: The removal of (1) a layer in encoder (e.g. using only one layer of GRU), (2) copy mechanism, (3) column stack (e.g. using column substitution ), and (4) encoding of table header, each respectively decrease performance on testing set.
Since in our framework the annotation and sequence modeling are separate processes, we also test our annotation method combined with the transformer model 2, an alternative and state-of-the-art architecture for sequence modeling such as machine translation. With the same annotation, the transformer model shows worse performance. We hypothesis that this is due to NLIDB task being different from translation tasks: NLIDB has huge difference between vocabulary sizes in source space and target space, and only outputs the symbols while learning original text implicitly.
Cross-domain transfer-ability
For cross-domain evaluation, we evaluate the transfer-ability of our model that is trained on one domain (WikiSQL in our case) and tested on other domains to assess its transfer-ability. This task is challenging since the model is required to model domains not seen before.
Dataset We use OVERNIGHT dataset (Wang and Yang, 2015), consisting of pairs of natural language query and dataset-specific logical form, as target domains. In particular, the query and logical form pairs are originally categorized into eight sub-domains, of which we use five of them (BASKETBALL, CALENDAR, HOUSING, RECIPES, and RESTAURANTS) where the logical form can be feasibly converted to SQL for evaluation. We then annotate these five sub-domains of OVERNIGHT dataset.
Evaluation We evaluate the transfer-ability of our model that is trained on WikiSQL on five sub-domains of OVERNIGHT dataset. Since OVERNIGHT SQL sketch is highly variant and different from WikiSQL sketch. We make a reasonable assumption that only SQL sketch compatible with WikiSQL are considered in the transfer-ability evaluation. Table 2 presents our transfer-ability performance,
2We use pre-trained GloVe embedding, transformer implementation from tensor2tensor https://github.com/
where Pruning Rate represents the percentage of sketch noncompatible records in each category, and number of records represents the number of sketch compatible records in each category, which is the size of our evaluation set. Transfer accuracy is calculated over sketch compatible records and based on recovered SQL. It is shown that our model performs high transfer-ability with zero-shot learning on OVERNIGHT dataset.
Ablation We conduct ablation analysis to demonstrate the transfer-ability contributed by our annotation strategies as well as different components of our model as shown in Table 2. When we remove column stack, the model not only performs worse for WikiSQL (see Table 1), but also transfers poorly to OVERNIGHT. The overall query match accuracy is reduced by 22%. In contrast to the results shown in Table 1, encoding table headers hurts transfer ability. The reason is that, in OVERNIGHT dataset, only five sub-domains are considered, so we use pre-collected paraphrase set to annotate columns. Most of the records are fully annotated, and encoding table headers does not enhance annotation coverage. On the contrary, it feeds the model redundant information which hurts the overall accuracy.
We also conduct ablation analysis to demonstrate the transfer-ability contributed by different component of our model. As shown in Table 2, the removal of 1 GRU layer results in 1% overall accuracy decrease. The model using only one layer of GRU is not capable of capturing all the conditional probabilities, thus causing lower transfer-ability.
Counter-intuitively, the model without copy mechanism has 1.5% higher transfer accuracy than the original model. Copy mechanism contributes to the test accuracy on WikiSQL but hurts the transfer-ability on OVERNIGHT. We hypothesis that transfer-ability is not fully correlated with the accuracy of the original model, it also depends on schema extraction and how the model generalizes to another domain.
In this work, we propose an NLIDB as a general paradigm to convert natural language queries to SQL queries for any database. The main contribution of our work is to separate meta data from the data itself and learn, transfer, and accumulate knowledge of natural language and domain-specific knowledge separately. Our extensive experimental analysis demonstrates the advantage of our approach over state-of-the-art approaches in standard datasets.
Androutsopoulos, I.; Ritchie, G. D.; and Thanisch, P. 1995. Nat- ural language interfaces to databases–an introduction. Natural language engineering 1(01):29–81.
Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Dong, L., and Lapata, M. 2018. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.
Esfandiari, N.; Babavalian, M. R.; Moghadam, A.-M. E.; and Tabar, V. K. 2014. Knowledge discovery in medicine: Current issue and future trend. Expert Systems with Applications 41(9):4434–4463.
Giordani, A., and Moschitti, A. 2012. Translating questions to sql queries with generative parsers discriminatively reranked. Proceedings of COLING 2012: Posters 401–410.
Graves, A.; Wayne, G.; Reynolds, M.; Harley, T.; Danihelka, I.; Grabska-Barwi´nska, A.; Colmenarejo, S. G.; Grefenstette, E.; Ramalho, T.; Agapiou, J.; et al. 2016. Hybrid computing using a neural network with dynamic external memory. Nature 538(7626):471.
Gu, J.; Lu, Z.; Li, H.; and Li, V. O. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, 1631–1640.
Henderson, M.; Thomson, B.; and Young, S. 2014. Word-based di- alog state tracking with recurrent neural networks. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 292–299.
Herzig, J., and Berant, J. 2017. Neural semantic parsing over mul- tiple knowledge-bases. arXiv preprint arXiv:1702.01569.
Huang, P.-S.; Wang, C.; Singh, R.; Yih, W.-t.; and He, X. 2018. Natural language to structured query generation via metalearning. In Proceedings of NAACL HLT 2018.
Iyer, S.; Konstas, I.; Cheung, A.; Krishnamurthy, J.; and Zettle- moyer, L. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, 963–973.
Jia, R., and Liang, P. 2016. Data recombination for neural semantic parsing.
Kuhn, H. W. 1955. The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2):83–97.
Merity, S.; Xiong, C.; Bradbury, J.; and Socher, R. 2016. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843.
Mrkši´c, N.; Séaghdha, D. Ó.; Thomson, B.; Gasic, M.; Su, P.-H.; Vandyke, D.; Wen, T.-H.; and Young, S. 2015. Multi-domain dialog state tracking using recurrent neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), volume 2, 794–799.
Mrkši´c, N.; Séaghdha, D. Ó.; Wen, T.-H.; Thomson, B.; and Young, S. 2017. Neural belief tracker: Data-driven dialogue state tracking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, 1777–1788.
Ngai, E.; Hu, Y.; Wong, Y.; Chen, Y.; and Sun, X. 2011. The appli- cation of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems 50(3):559–569.
Ngai, E. W.; Xiu, L.; and Chau, D. C. 2009. Application of data mining techniques in customer relationship management: A literature review and classification. Expert systems with applications 36(2):2592–2602.
Pasupat, P., and Liang, P. 2015. Compositional semantic parsing on semi-structured tables. In ACL.
Popescu, A.-M.; Etzioni, O.; and Kautz, H. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 8th international conference on Intelligent user interfaces, 149–157. ACM.
Su, Y., and Yan, X. 2017. Cross-domain semantic parsing via paraphrasing. arXiv preprint arXiv:1704.05974.
Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to se- quence learning with neural networks. In Advances in neural information processing systems, 3104–3112.
Vinyals, O.; Fortunato, M.; and Jaitly, N. 2015. Pointer networks. In Advances in Neural Information Processing Systems, 2692– 2700.
Wang, W. Y., and Yang, D. 2015. That’s so annoying!!!: A lexi- cal and frame-semantic embedding based data augmentation approach to automatic categorization of annoying behaviors using #petpeeve tweets. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2557– 2563. Lisbon, Portugal: Association for Computational Linguistics.
Wang, Y.; Berant, J.; and Liang, P. 2015. Building a semantic parser overnight. In ACL.
Wen, T.-H.; Vandyke, D.; Mrkši´c, N.; Gasic, M.; Barahona, L. M. R.; Su, P.-H.; Ultes, S.; and Young, S. 2017. A networkbased end-to-end trainable task-oriented dialogue system. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, volume 1, 438–449.
Xu, X.; Liu, C.; and Song, D. 2017. Sqlnet: Generating structured queries from natural language without reinforcement learning. arXiv preprint arXiv:1711.04436.
Young, S.; Gaši´c, M.; Keizer, S.; Mairesse, F.; Schatzmann, J.; Thomson, B.; and Yu, K. 2010. The hidden information state model: A practical framework for pomdp-based spoken dialogue management. Computer Speech & Language 24(2):150–174.
Yu, T.; Li, Z.; Zhang, Z.; Zhang, R.; and Radev, D. 2018. Type- sql: Knowledge-based type-aware neural text-to-sql generation. arXiv preprint arXiv:1804.09769.
Zhong, V.; Xiong, C.; and Socher, R. 2017. Seq2sql: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.