35:[["$","audio",null,{"id":"tts"}],["$","$L3a",null,{"paperID":"1908.06605","publisher":"arxiv","paperJSON":{"title":"Long and Diverse Text Generation with Planning-based Hierarchical Variational Model","paperID":"1908.06605","avgLineHeight":13.55,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"Existing neural methods for data-to-text generation still struggle to produce long and diverse texts: they fail to model input data dynamically during generation, to capture inter-sentence coherence, or to generate diversified expressions. To address these issues, we propose a Planning-based Hierarchical Variational Model (PHVM). Our model first plans a sequence of groups (each group is a subset of input items to be covered by a sentence) and then realizes each sentence conditioned on the planning result and the previously generated context, thereby decomposing long text generation into dependent sentence generation sub-tasks. To capture expression diversity, we devise a hierarchical latent structure where a global planning latent variable models the diversity of reasonable planning and a sequence of local latent variables controls sentence realization. ","element":"span"},{"text":"Experiments show that our model outperforms state-of-the-art baselines in long and diverse text generation.","element":"span"}]]},{"heading":"1 Introduction","paragraphs":[[{"text":"Data-to-text generation aims to generate natural language texts from structured data (","element":"span"},{"href":"#id-0","referenceIndex":12,"text":"Gatt and Krah","element":"a"},{"href":"#id-0","referenceIndex":12,"text":"mer","element":"a"},{"text":", ","element":"span"},{"href":"#id-0","referenceIndex":12,"text":"2018","element":"a"},{"text":"), and has a wide range of applications (e.g., weather forecasts, game reports, product descriptions, and advertising documents). 
Most neural methods focus on devising encoding schemes and attention mechanisms, namely, (1) exploiting input structure to learn better representations of input data (","element":"span"},{"href":"#id-1","referenceIndex":21,"text":"Lebret et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-1","referenceIndex":21,"text":"2016","element":"a"},{"text":"; ","element":"span"},{"href":"#id-2","referenceIndex":25,"text":"Liu et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-2","referenceIndex":25,"text":"2018","element":"a"},{"text":"), and (2) devising attention mechanisms to better employ input data (","element":"span"},{"href":"#id-3","referenceIndex":30,"text":"Mei et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-3","referenceIndex":30,"text":"2016","element":"a"},{"text":"; ","element":"span"},{"href":"#id-2","referenceIndex":25,"text":"Liu et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-2","referenceIndex":25,"text":"2018","element":"a"},{"text":"; ","element":"span"},{"href":"#id-4","referenceIndex":32,"text":"Nema et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-4","referenceIndex":32,"text":"2018","element":"a"},{"text":") or to dynamically track which parts of the input have been covered during generation (","element":"span"},{"href":"#id-5","referenceIndex":16,"text":"Kid","element":"a"},{"href":"#id-5","referenceIndex":16,"text":"don et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-5","referenceIndex":16,"text":"2016","element":"a"},{"text":"). These models are able to produce fluent and coherent short texts in some applications.","element":"span"}],[{"text":"However, for long and diverse texts such as product descriptions, existing methods are still unable to capture the complex semantic structures and diversified surface forms of long texts. 
","element":"span"},{"style":{"fontWeight":"bold"},"text":"First","element":"span"},{"text":", existing methods are not good at modeling input data dynamically during generation. Some neural methods (","element":"span"},{"href":"#id-5","referenceIndex":16,"text":"Kiddon et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-5","referenceIndex":16,"text":"2016","element":"a"},{"text":"; ","element":"span"},{"href":"#id-6","referenceIndex":10,"text":"Feng et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-6","referenceIndex":10,"text":"2018","element":"a"},{"text":") propose to record the accumulated attention devoted to each input item. However, these records may accumulate errors in representing the state of the already generated prefix, thus leading to incorrect new attention weights. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Second","element":"span"},{"text":", inter-sentence coherence in long text generation is not well captured (","element":"span"},{"href":"#id-7","referenceIndex":47,"text":"Wiseman et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-7","referenceIndex":47,"text":"2017","element":"a"},{"text":") due to the lack of high-level planning. Recent studies propose to model planning but still leave much room for improvement. For instance, in (","element":"span"},{"href":"#id-8","referenceIndex":37,"text":"Pudup","element":"a"},{"href":"#id-8","referenceIndex":37,"text":"pully et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-8","referenceIndex":37,"text":"2019","element":"a"},{"text":") and (","element":"span"},{"href":"#id-9","referenceIndex":40,"text":"Sha et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-9","referenceIndex":40,"text":"2018","element":"a"},{"text":"), planning is merely designed for ordering input items, which is limited to aligning input data with the text to be generated. 
","element":"span"},{"style":{"fontWeight":"bold"},"text":"Third","element":"span"},{"text":", most methods fail to generate diversified expressions. Existing data-to-text methods inject variations at the conditional output distribution, which has been shown to capture only low-level variations of expressions (","element":"span"},{"href":"#id-10","referenceIndex":39,"text":"Serban et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-10","referenceIndex":39,"text":"2017","element":"a"},{"text":").","element":"span"}],[{"text":"To address the above issues, we propose a novel Planning-based Hierarchical Variational Model (PHVM). To better model input data and alleviate the inter-sentence incoherence problem, we design a novel planning mechanism and adopt a compatible hierarchical generation process, which mimics the process of human writing. Generally speaking, to write a long text, a human writer first arranges the content and discourse structure (i.e., ","element":"span"},{"style":{"fontStyle":"italic"},"text":"high-level planning","element":"span"},{"text":") and then realizes the surface form of each individual part (","element":"span"},{"style":{"fontStyle":"italic"},"text":"low-level realization","element":"span"},{"text":"). Motivated by this, our proposed model first performs","element":"span"}],[{"id":"id-11","style":{"width":"94%"},"width":828,"height":747,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/1-0.png","element":"img"}],[{"text":"Figure 1: Generation process of PHVM. After encoding a list of input attribute-value pairs, PHVM first conducts planning by generating a sequence of groups, each of which is a subset of input items. 
Each sentence is then realized conditioned on the corresponding group and the previously generated sentences.","element":"figcaption","subtype":"caption"}],[{"text":"planning by segmenting input data into a sequence of groups, and then generates each sentence conditioned on the corresponding group and the preceding generated sentences. In this way, we decompose long text generation into a sequence of dependent sentence generation sub-tasks where each sub-task depends specifically on an individual group and the previous context. As a result, the input data can be better modeled and inter-sentence coherence can be captured. Figure ","element":"span"},{"href":"#id-11","text":"1 ","element":"a"},{"text":"depicts the process.","element":"span"}],[{"text":"To deal with expression diversity, this model also enables us to inject variations at both high-level planning and low-level realization with a hierarchical latent structure. At the high level, we introduce a global planning latent variable to model the diversity of reasonable planning. At the low level, we introduce local latent variables for sentence realization. ","element":"span"},{"text":"Since our model is based on the Conditional Variational Auto-Encoder (CVAE) (","element":"span"},{"href":"#id-12","referenceIndex":43,"text":"Sohn ","element":"a"},{"href":"#id-12","referenceIndex":43,"text":"et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-12","referenceIndex":43,"text":"2015","element":"a"},{"text":"), expression diversity can be captured by the global and local latent variables.","element":"span"}],[{"text":"We evaluate the model on a new advertising text","element":"span"},{"text":"1 ","element":"span"},{"text":"generation task which requires the system to generate a long and diverse advertising text that covers a given set of attribute-value pairs describing a product (see Figure ","element":"span"},{"href":"#id-11","text":"1","element":"a"},{"text":"). 
","element":"span"},{"text":"We also evaluate our model on the recipe text generation task from (","element":"span"},{"href":"#id-5","referenceIndex":16,"text":"Kiddon et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-5","referenceIndex":16,"text":"2016","element":"a"},{"text":") which requires the system to correctly use the given ingredients and maintain coherence among cooking steps. Experiments on advertising text generation show that our model outperforms state-of-the-art baselines in automatic and manual evaluation. Our model also generalizes well to long recipe text generation and outperforms the baselines. Our contributions are twofold:","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"We design a novel Planning-based Hierarchical Variational Model (PHVM) which integrates planning into a hierarchical latent structure. Experiments show its effectiveness in coverage, coherence, and diversity.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"We propose a novel planning mechanism which segments the input data into a sequence of groups, thereby decomposing long text generation into dependent sentence generation sub-tasks. ","element":"span"},{"text":"Thus, input data can be better modeled and inter-sentence coherence can be better captured. 
To capture expression diversity, we devise a hierarchical latent structure which injects variations at both high-level planning and low-level realization.","element":"span"}]]},{"heading":"2 Related Work","paragraphs":[[{"text":"Traditional methods (","element":"span"},{"href":"#id-13","referenceIndex":38,"text":"Reiter and Dale","element":"a"},{"text":", ","element":"span"},{"href":"#id-13","referenceIndex":38,"text":"1997","element":"a"},{"text":"; ","element":"span"},{"href":"#id-14","referenceIndex":44,"text":"Stent ","element":"a"},{"href":"#id-14","referenceIndex":44,"text":"et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-14","referenceIndex":44,"text":"2004","element":"a"},{"text":") for data-to-text generation consist of three components: content planning, sentence planning, and surface realization. Content planning and sentence planning are responsible for what to say and how to say it, respectively; they are typically based on hand-crafted ","element":"span"},{"text":"(","element":"span"},{"href":"#id-15","referenceIndex":20,"text":"Kukich","element":"a"},{"text":", ","element":"span"},{"href":"#id-15","referenceIndex":20,"text":"1983","element":"a"},{"text":"; ","element":"span"},{"href":"#id-16","referenceIndex":6,"text":"Dalianis and Hovy","element":"a"},{"text":", ","element":"span"},{"href":"#id-16","referenceIndex":6,"text":"1993","element":"a"},{"text":"; ","element":"span"},{"href":"#id-17","referenceIndex":15,"text":"Hovy","element":"a"},{"text":", ","element":"span"},{"href":"#id-17","referenceIndex":15,"text":"1993","element":"a"},{"text":") or automatically learned rules ","element":"span"},{"text":"(","element":"span"},{"href":"#id-18","referenceIndex":8,"text":"Duboue and McKe","element":"a"},{"href":"#id-18","referenceIndex":8,"text":"own","element":"a"},{"text":", ","element":"span"},{"href":"#id-18","referenceIndex":8,"text":"2003","element":"a"},{"text":"). 
Surface realization generates natural language by carrying out the plan and is either template-based (","element":"span"},{"href":"#id-19","referenceIndex":29,"text":"McRoy et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-19","referenceIndex":29,"text":"2003","element":"a"},{"text":"; ","element":"span"},{"href":"#id-20","referenceIndex":7,"text":"van Deemter ","element":"a"},{"href":"#id-20","referenceIndex":7,"text":"et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-20","referenceIndex":7,"text":"2005","element":"a"},{"text":") or grammar-based (","element":"span"},{"href":"#id-21","referenceIndex":2,"text":"Bateman","element":"a"},{"text":", ","element":"span"},{"href":"#id-21","referenceIndex":2,"text":"1997","element":"a"},{"text":"; ","element":"span"},{"href":"#id-22","referenceIndex":9,"text":"Espinosa et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-22","referenceIndex":9,"text":"2008","element":"a"},{"text":"). As these models are shallow and the two stages (planning and realization) often function separately, traditional methods are unable to capture rich variations of texts.","element":"span"}],[{"text":"Recently, neural methods have become the mainstream models for data-to-text generation due to their strong representation learning ability and scalability. 
These methods perform well in generating weather forecasts (","element":"span"},{"href":"#id-3","referenceIndex":30,"text":"Mei et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-3","referenceIndex":30,"text":"2016","element":"a"},{"text":") or very short biographies (","element":"span"},{"href":"#id-1","referenceIndex":21,"text":"Lebret et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-1","referenceIndex":21,"text":"2016","element":"a"},{"text":"; ","element":"span"},{"href":"#id-2","referenceIndex":25,"text":"Liu","element":"a"}],[{"id":"id-36","style":{"width":"91%"},"width":1658,"height":439,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-0.png","element":"img"}],[{"text":"Figure 2: Architecture of PHVM. The model controls planning with a global latent variable ","element":"figcaption","subtype":"caption"},{"style":{"height":10.58},"width":37.28,"height":26.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-1.png","element":"img","alt":" zp","inline":true},{"text":". The plan decoder conducts planning by generating a sequence of groups ","element":"figcaption","subtype":"caption"},{"style":{"height":14.4},"width":398.35,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-2.png","element":"img","alt":" g = g1g2...gT where gt","inline":true,"padRight":true},{"text":"is a subset of input items and specifies the content of sentence ","element":"figcaption","subtype":"caption"},{"style":{"height":9.19},"width":30.68,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-3.png","element":"img","alt":" st","inline":true,"padRight":true},{"text":"to be generated. 
The sentence decoder controls the realization of ","element":"figcaption","subtype":"caption"},{"style":{"height":9.19},"width":30.68,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-4.png","element":"img","alt":" st","inline":true,"padRight":true},{"text":"with a local latent variable ","element":"figcaption","subtype":"caption"},{"style":{"height":14.52},"width":35.28,"height":36.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-5.png","element":"img","alt":" zst ","inline":true,"padRight":true},{"text":"; dependencies among ","element":"figcaption","subtype":"caption"},{"style":{"height":14.52},"width":35.29,"height":36.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-6.png","element":"img","alt":" zst ","inline":true,"padRight":true},{"text":"are explicitly modeled to better capture inter-sentence coherence.","element":"figcaption","subtype":"caption"}],[{"href":"#id-2","referenceIndex":25,"text":"et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-2","referenceIndex":25,"text":"2018","element":"a"},{"text":"; ","element":"span"},{"href":"#id-9","referenceIndex":40,"text":"Sha et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-9","referenceIndex":40,"text":"2018","element":"a"},{"text":"; ","element":"span"},{"href":"#id-4","referenceIndex":32,"text":"Nema et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-4","referenceIndex":32,"text":"2018","element":"a"},{"text":") using well-designed data encoders and attention mechanisms. However, as demonstrated in ","element":"span"},{"href":"#id-7","referenceIndex":47,"text":"Wise","element":"a"},{"href":"#id-7","referenceIndex":47,"text":"man et al. 
","element":"a"},{"text":"(","element":"span"},{"href":"#id-7","referenceIndex":47,"text":"2017","element":"a"},{"text":") (a game report generation task), existing neural methods still struggle with long text generation: they often generate incoherent texts. Moreover, these methods lack the ability to model the diversity of expressions.","element":"span"}],[{"text":"As for long text generation, recent studies tackle the incoherence problem from different perspectives. To keep the decoder aware of the crucial information in the already generated prefix, ","element":"span"},{"href":"#id-23","referenceIndex":41,"text":"Shao ","element":"a"},{"href":"#id-23","referenceIndex":41,"text":"et al. ","element":"a"},{"text":"(","element":"span"},{"href":"#id-23","referenceIndex":41,"text":"2017","element":"a"},{"text":") appended the generated prefix to the encoder, and ","element":"span"},{"href":"#id-24","referenceIndex":14,"text":"Guo et al. ","element":"a"},{"text":"(","element":"span"},{"href":"#id-24","referenceIndex":14,"text":"2018","element":"a"},{"text":") leaked the extracted features of the generated prefix from the discriminator to the generator in a Generative Adversarial Network (","element":"span"},{"href":"#id-25","referenceIndex":13,"text":"Goodfellow et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-25","referenceIndex":13,"text":"2014","element":"a"},{"text":"). ","element":"span"},{"text":"To model dependencies among sentences, ","element":"span"},{"href":"#id-26","referenceIndex":23,"text":"Li et al. ","element":"a"},{"text":"(","element":"span"},{"href":"#id-26","referenceIndex":23,"text":"2015","element":"a"},{"text":") utilized a hierarchical recurrent neural network (RNN) decoder. 
","element":"span"},{"href":"#id-27","referenceIndex":19,"text":"Konstas and Lapata ","element":"a"},{"text":"(","element":"span"},{"href":"#id-27","referenceIndex":19,"text":"2013","element":"a"},{"text":") proposed to plan content organization with grammar rules, while ","element":"span"},{"href":"#id-8","referenceIndex":37,"text":"Puduppully et al. ","element":"a"},{"text":"(","element":"span"},{"href":"#id-8","referenceIndex":37,"text":"2019","element":"a"},{"text":") planned by reordering input data. Most recently, ","element":"span"},{"href":"#id-28","referenceIndex":31,"text":"Moryossef ","element":"a"},{"href":"#id-28","referenceIndex":31,"text":"et al. ","element":"a"},{"text":"(","element":"span"},{"href":"#id-28","referenceIndex":31,"text":"2019","element":"a"},{"text":") proposed to select plans from all possible ones, which is infeasible for large inputs.","element":"span"}],[{"text":"As for diverse text generation, existing methods can be divided into three categories: enriching conditions (","element":"span"},{"href":"#id-29","referenceIndex":48,"text":"Xing et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-29","referenceIndex":48,"text":"2017","element":"a"},{"text":"), post-processing with beam search and reranking (","element":"span"},{"href":"#id-30","referenceIndex":22,"text":"Li et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-30","referenceIndex":22,"text":"2016","element":"a"},{"text":"), and designing effective models (","element":"span"},{"href":"#id-31","referenceIndex":49,"text":"Xu et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-31","referenceIndex":49,"text":"2018","element":"a"},{"text":"). 
Some text-to-text generation models (","element":"span"},{"href":"#id-10","referenceIndex":39,"text":"Serban et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-10","referenceIndex":39,"text":"2017","element":"a"},{"text":"; ","element":"span"},{"href":"#id-32","referenceIndex":50,"text":"Zhao et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-32","referenceIndex":50,"text":"2017","element":"a"},{"text":") inject high-level variations with latent variables. The Variational Hierarchical Conversation RNN (VHCR) (","element":"span"},{"href":"#id-33","referenceIndex":36,"text":"Park et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-33","referenceIndex":36,"text":"2018","element":"a"},{"text":") is the model most similar to ours and also adopts a hierarchical latent structure. Our method differs from VHCR in two aspects: (1) VHCR has no planning mechanism, and its global latent variable is mainly designed to address the KL collapse problem, while our global latent variable captures the diversity of reasonable planning; (2) VHCR injects distinct local latent variables without direct dependencies, while our method explicitly models the dependencies among local latent variables to better capture inter-sentence connections. ","element":"span"},{"href":"#id-34","referenceIndex":42,"text":"Shen ","element":"a"},{"href":"#id-34","referenceIndex":42,"text":"et al. ","element":"a"},{"text":"(","element":"span"},{"href":"#id-34","referenceIndex":42,"text":"2019","element":"a"},{"text":") proposed ml-VAE-D with multi-level latent variables. However, the latent structure of ml-VAE-D consists of two global latent variables: the top-level latent variable is introduced to learn a more flexible prior for the bottom-level latent variable, which is then used to decode a whole paragraph. 
By contrast, our hierarchical latent structure is tailored to our planning mechanism: the top-level latent variable controls planning results, and a sequence of local latent variables is introduced to obtain fine-grained control over sentence generation sub-tasks.","element":"span"}],[{"text":"We evaluated our model on a new advertising text generation task, which is to generate a long and diverse text that covers all given specifications of a product. ","element":"span"},{"text":"Unlike our task, the advertising text generation task in (","element":"span"},{"href":"#id-35","referenceIndex":4,"text":"Chen et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-35","referenceIndex":4,"text":"2019","element":"a"},{"text":") is to generate personalized product descriptions based on the product title, product aspect (e.g., “appearance”), and user category.","element":"span"}]]},{"heading":"3 Task Definition","paragraphs":[[{"text":"Given input data ","element":"span"},{"style":{"height":17.6},"width":361.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-7.png","element":"img","alt":" x = {d1, d2, ..., dN}","inline":true,"padRight":true},{"text":"where each ","element":"span"},{"style":{"height":15.02},"width":34.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-8.png","element":"img","alt":"di","inline":true,"padRight":true},{"text":"can be an attribute-value pair or a keyword, our task is to generate a long and diverse text ","element":"span"},{"style":{"height":19.13},"width":513.42,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/2-9.png","element":"img","alt":"y = s1s2...sT (st is the tth ","inline":true,"padRight":true},{"text":"sentence) that covers ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"as much as possible. 
","element":"span"},{"text":"For the advertising text generation task, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"consists of specifications of a product, where each ","element":"span"},{"style":{"height":15.02},"width":34.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-0.png","element":"img","alt":" di","inline":true,"padRight":true},{"text":"is an attribute-value pair ","element":"span"},{"style":{"height":12.8},"width":206.45,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-1.png","element":"img","alt":" < ai, vi >","inline":true},{"text":". For the recipe text generation task, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"is an ingredient list where each ","element":"span"},{"style":{"height":15.02},"width":189.46,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-2.png","element":"img","alt":" di is an in-","inline":true,"padRight":true},{"text":"gredient. Since the recipe title ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"is also used for generation, we overload the symbol ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"to denote ","element":"span"},{"style":{"height":17.6},"width":408.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-3.png","element":"img","alt":"< {d1, d2, ..., dN}, r >","inline":true,"padRight":true},{"text":"for simplicity.","element":"span"}]]},{"heading":"4 Approach","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"4.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Overview","element":"span"}],[{"text":"Figure ","element":"span"},{"href":"#id-36","text":"2 ","element":"a"},{"text":"shows the architecture of PHVM. 
PHVM first samples a global planning latent variable ","element":"span"},{"style":{"height":12.33},"width":40.21,"height":30.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-4.png","element":"img","alt":" zp","inline":true,"padRight":true},{"text":"based on the encoded input data; ","element":"span"},{"style":{"height":12.33},"width":40.21,"height":30.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-5.png","element":"img","alt":" zp ","inline":true,"padRight":true},{"text":"serves as a conditioning variable in both the planning and the hierarchical generation processes. ","element":"span"},{"text":"The plan decoder takes ","element":"span"},{"style":{"height":12.33},"width":40.21,"height":30.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-6.png","element":"img","alt":"zp ","inline":true,"padRight":true},{"text":"as its initial input. ","element":"span"},{"text":"At time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", it decodes a group ","element":"span"},{"style":{"height":12},"width":32.82,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-7.png","element":"img","alt":" gt","inline":true,"padRight":true},{"text":"which is a subset of input items (","element":"span"},{"style":{"height":15.6},"width":51.24,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-8.png","element":"img","alt":"di)","inline":true,"padRight":true},{"text":"and specifies the content of the ","element":"span"},{"style":{"height":17.75},"width":281.16,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-9.png","element":"img","alt":" tth sentence st.","inline":true,"padRight":true},{"text":"When the plan decoder finishes planning, the hierarchical generation process starts, which involves the high-level sentence decoder and the low-level word decoder. 
The sentence decoder models inter-sentence coherence in semantic space by computing a sentence representation ","element":"span"},{"style":{"height":16.72},"width":41.14,"height":41.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-10.png","element":"img","alt":" hst ","inline":true,"padRight":true},{"text":"and sampling a ","element":"span"},{"text":"local latent variable ","element":"span"},{"style":{"height":16.25},"width":38.21,"height":40.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-11.png","element":"img","alt":" zst ","inline":true,"padRight":true},{"text":"for each group. ","element":"span"},{"style":{"height":16.72},"width":180.5,"height":41.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-12.png","element":"img","alt":" hst and zst ,","inline":true,"padRight":true},{"text":"along with ","element":"span"},{"style":{"height":12},"width":32.81,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-13.png","element":"img","alt":" gt","inline":true},{"text":", guide the word decoder to realize the corresponding sentence ","element":"span"},{"style":{"height":10.62},"width":45.68,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-14.png","element":"img","alt":" st.","inline":true}],[{"text":"The planning process decomposes the long text generation task into a sequence of dependent sentence generation sub-tasks, thus facilitating the hierarchical generation process. 
","element":"span"},{"text":"With the hierarchical latent structure, PHVM is able to capture multi-level variations of texts.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Input Encoding","element":"span"}],[{"text":"We first embed each input item ","element":"span"},{"style":{"height":15.02},"width":34.71,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-15.png","element":"img","alt":" di","inline":true,"padRight":true},{"text":"into vector ","element":"span"},{"style":{"height":17.6},"width":94.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-16.png","element":"img","alt":"e(di)","inline":true},{"text":". The recipe title ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"is also embedded as ","element":"span"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"e","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"r","element":"span"},{"text":")","element":"span"},{"text":". We then encode ","element":"span"},{"style":{"height":15.13},"width":40.94,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-17.png","element":"img","alt":" x2 ","inline":true,"padRight":true},{"text":"with a bidirectional Gated Recurrent Unit (GRU) (","element":"span"},{"href":"#id-37","referenceIndex":5,"text":"Cho et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-37","referenceIndex":5,"text":"2014","element":"a"},{"text":"). 
For advertising text generation, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"is represented as the concatenation of the last hidden states of the forward and backward GRU ","element":"span"},{"style":{"height":25.72},"width":417.83,"height":64.3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-18.png","element":"img","alt":" enc(x) = [−→hN; ←−h1]; for","inline":true,"padRight":true},{"text":"recipe text generation, ","element":"span"},{"style":{"height":25.72},"width":462.4,"height":64.3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-19.png","element":"img","alt":" enc(x) = [−→hN; ←−h1; e(r)].","inline":true},{"style":{"height":25.72},"width":240.45,"height":64.3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-20.png","element":"img","alt":"hi = [−→hi; ←−hi]","inline":true,"padRight":true},{"text":"is the context-aware representation of ","element":"span"},{"style":{"height":15.02},"width":34.72,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-21.png","element":"img","alt":" di","inline":true},{"text":". Note that the input encoder need not be an RNN; other neural encoders or even other encoding schemes, such as a multi-layer perceptron (MLP) or bag of words, are also feasible.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Planning Process","element":"span"}],[{"text":"The planning process generates a subset of input items to be covered for each sentence, thus decomposing long text generation into easier dependent sentence generation sub-tasks. Due to the flexibility of language, more than one reasonable text may cover the same input in different orders. 
To capture such variety, we model the diversity of reasonable planning with a global planning latent variable ","element":"span"},{"style":{"height":12.33},"width":40.21,"height":30.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-22.png","element":"img","alt":" zp","inline":true},{"text":". Different samples of ","element":"span"},{"style":{"height":12.33},"width":40.21,"height":30.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-23.png","element":"img","alt":"zp ","inline":true,"padRight":true},{"text":"may lead to different planning results, which control the order of content. This process can be formulated as follows:","element":"span"}],[{"id":"id-38","style":{"width":"74%"},"width":654,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-24.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12},"width":244.27,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-25.png","element":"img","alt":" g = g1g2...gT","inline":true,"padRight":true},{"text":"is a sequence of groups, and each group ","element":"span"},{"style":{"height":12},"width":32.82,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-26.png","element":"img","alt":" gt","inline":true,"padRight":true},{"text":"is a subset of input items that serves as the main condition when realizing the sentence ","element":"span"},{"style":{"height":10.62},"width":45.68,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-27.png","element":"img","alt":" st.","inline":true}],[{"text":"The global latent variable ","element":"span"},{"style":{"height":12.33},"width":40.21,"height":30.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-28.png","element":"img","alt":" zp ","inline":true,"padRight":true},{"text":"is assumed to follow an isotropic Gaussian distribution and is sampled from its prior distribution 
","element":"span"},{"style":{"height":17.6},"width":216.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-29.png","element":"img","alt":" pθ(zp|x) =","inline":true},{"style":{"height":19.13},"width":220.39,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-30.png","element":"img","alt":"N(µp, σp2I)","inline":true,"padRight":true},{"text":"during inference and from its approximate posterior distribution ","element":"span"},{"style":{"height":18.88},"width":273.16,"height":47.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-31.png","element":"img","alt":" qθ′(zp|x, y) =","inline":true},{"style":{"height":19.13},"width":241.86,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-32.png","element":"img","alt":"N(µp′, σp′2I)","inline":true,"padRight":true},{"text":"during training:","element":"span"}],[{"style":{"width":"80%"},"width":703,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-33.png","element":"img"}],[{"text":"We solve Eq. 
","element":"span"},{"href":"#id-38","text":"1 ","element":"a"},{"text":"greedily by computing ","element":"span"},{"style":{"height":12},"width":92.47,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-34.png","element":"img","alt":" gt =","inline":true},{"style":{"height":18.22},"width":465.74,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-35.png","element":"img","alt":"argmaxgtP(gt|g 0.5}","inline":true,"padRight":true},{"text":"(If this is empty, we set ","element":"span"},{"style":{"height":17.99},"width":550.63,"height":44.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-42.png","element":"img","alt":" gt as {argmaxdiP(di ∈ gt)}.).","inline":true}],[{"text":"We feed ","element":"span"},{"style":{"height":17.6},"width":141.3,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-43.png","element":"img","alt":" bow(gt)","inline":true,"padRight":true},{"text":"(the average pooling of ","element":"span"},{"style":{"height":17.6},"width":223.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-44.png","element":"img","alt":"{hi|di ∈ gt}","inline":true},{"text":") to the plan decoder at the next time step, so that ","element":"span"},{"style":{"height":20.29},"width":80.7,"height":50.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-45.png","element":"img","alt":" hpt+1 ","inline":true,"padRight":true},{"text":"is aware of what data has been ","element":"span"},{"text":"selected and what has not. 
The planning process proceeds until the probability of stopping at the next time step exceeds 0.5:","element":"span"}],[{"style":{"width":"73%"},"width":639,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-46.png","element":"img"}],[{"text":"The hidden state is initialized with ","element":"span"},{"style":{"fontStyle":"italic","fontWeight":"bold"},"text":"enc","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":12.33},"width":40.21,"height":30.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/3-47.png","element":"img","alt":"zp","inline":true},{"text":". The plan decoder is trained with full supervision, which is applicable to tasks where reference plans are available or can be approximated. For both tasks evaluated in this paper, we approximate the reference plans by identifying the subset of input items covered by each sentence with string-matching heuristics. The loss function at time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"is given by:","element":"span"}],[{"style":{"width":"91%"},"width":803,"height":302,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/4-0.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12},"width":32.81,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/4-1.png","element":"img","alt":" �gt","inline":true,"padRight":true},{"text":"is the reference group. 
As a result, ","element":"span"},{"style":{"height":12.4},"width":82.62,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/4-2.png","element":"img","alt":" zp is","inline":true,"padRight":true},{"text":"forced to capture features of reasonable planning.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Hierarchical Generation Process","element":"span"}],[{"text":"The generation process produces a long text ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"= ","element":"span"},{"style":{"height":10.7},"width":159.58,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/4-3.png","element":"img","alt":"s1s2...sT","inline":true,"padRight":true},{"text":"in alignment with the planning result ","element":"span"},{"style":{"height":12},"width":241.22,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/4-4.png","element":"img","alt":"g = g1g2...gT","inline":true,"padRight":true},{"text":", which is formulated as follows:","element":"span"}],[{"id":"id-39","style":{"width":"73%"},"width":642,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/4-5.png","element":"img"}],[{"text":"We perform sentence-by-sentence generation and solve Eq. 
","element":"span"},{"href":"#id-39","text":"8 ","element":"a"},{"text":"greedily by computing ","element":"span"},{"style":{"height":10.62},"width":98.61,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/4-6.png","element":"img","alt":" st =","inline":true},{"style":{"height":17.6},"width":510,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/4-7.png","element":"img","alt":"argmaxstP(st|s ","element":"span"},{"text":"in ROTOWIRE, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Dior ","element":"span"},{"text":"was replaced with the tag ","element":"span"},{"style":{"fontStyle":"italic"},"text":"<","element":"span"},{"style":{"fontStyle":"italic"},"text":"BRAND","element":"span"},{"style":{"fontStyle":"italic"},"text":"> ","element":"span"},{"text":"in our dataset. We then computed distinct-4 on 3,000 randomly sampled texts from our dataset and on samples with a comparable number of words from E2E, WebNLG, WIKIBIO, and ROTOWIRE, respectively. ","element":"span"},{"text":"Our dataset exhibits much higher diversity, with a substantially higher distinct-4 score than the other corpora. As the texts in our dataset are long (about 110 words on average) and diverse, our dataset is well suited for evaluating long and diverse text generation in this paper.","element":"span"}],[{"text":"Some other datasets pair structured data with user-generated reviews, ","element":"span"},{"text":"such as Amazon reviews (","element":"span"},{"href":"#id-62","referenceIndex":27,"text":"McAuley et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-62","referenceIndex":27,"text":"2015","element":"a"},{"text":"), the Yelp dataset","element":"span"},{"text":"5","element":"span"},{"text":", and the IMDb dataset (","element":"span"},{"href":"#id-63","referenceIndex":26,"text":"Maas et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-63","referenceIndex":26,"text":"2011","element":"a"},{"text":"). 
","element":"span"},{"text":"We did not use such corpora because the contents of user-generated reviews are not derived mainly from the data but often depend on many other factors, such as the reviewers’ experience and preferences.","element":"span"}]]},{"heading":"B Case Study","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"B.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Advertising Text Generation","element":"span"}],[{"text":"Figure ","element":"span"},{"href":"#id-64","text":"5 ","element":"a"},{"text":"shows generated texts from different models given the same input.","element":"span"}],[{"text":"Most baselines fail to cover all the provided data and repeatedly describe some of the input items. For example, the text from Link-S2S ignores the attribute value ","element":"span"},{"style":{"fontStyle":"italic"},"text":"three-quarter sleeve ","element":"span"},{"text":"and describes the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"round collar ","element":"span"},{"text":"twice. Checklist and CVAE also have similar problems. As Link-S2S and Checklist inject variations only at the conditional output distribution, they suffer from the redundancy problem. Though Pointer-S2S covers all attribute values without redundancy, it introduces logical incoherence (the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"round collar ","element":"span"},{"text":"cannot ","element":"span"},{"style":{"fontStyle":"italic"},"text":"reveal slender arms","element":"span"},{"text":") in the first sentence. By contrast, both texts generated by our model cover all the input data without redundancy.","element":"span"}],[{"style":{"width":"64%"},"width":1166,"height":249,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/13-0.png","element":"img"}],[{"id":"id-58","text":"Table 6: Statistics of E2E, WebNLG, WEATHERGOV (WG), WIKIBIO (WB), ROTOWIRE (RW), and our dataset. 
","element":"figcaption","subtype":"caption"},{"text":"We computed distinct-4 (see section 5.3) on 3,000 randomly sampled advertising texts for our dataset and on samples with a comparable number of words for E2E, WebNLG, WB, and RW, respectively.","element":"figcaption","subtype":"caption"}],[{"text":"Due to diverse yet reasonable planning, the two texts of our model exhibit different discourse structures. ","element":"span"},{"text":"The first text adopts a general-to-specific discourse structure in which the statement at the beginning (i.e., ","element":"span"},{"style":{"fontStyle":"italic"},"text":"the elegance of the dress","element":"span"},{"text":") is supported by the subsequent descriptions of local features. It groups the global features (i.e., ","element":"span"},{"style":{"fontStyle":"italic"},"text":"color","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"material ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"length","element":"span"},{"text":") from the input in the first sentence and realizes each remaining sentence with one local feature. The second text adopts a parallel structure that splits the global features and places some of them in the middle. Despite this difference, the two texts share a global pattern in the data. 
They both describe the dress from top to bottom (i.e., ","element":"span"},{"style":{"fontStyle":"italic"},"text":"collar ","element":"span"},{"style":{"height":16.4},"width":486.56,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/13-1.png","element":"img","alt":" − > sleeve − > shape of","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the lower part","element":"span"},{"text":"), which supports the effectiveness of our content organization. Notably, the two texts use different wording, which shows that our model captures the diversity of expressions.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"B.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Recipe Text Generation","element":"span"}],[{"text":"Figure ","element":"span"},{"href":"#id-65","text":"6 ","element":"a"},{"text":"shows the generated examples. Although the three models fail to cover all given ingredients, our model gives the most complete procedure for making a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"pumpkin pie","element":"span"},{"text":", which includes five steps: ","element":"span"},{"style":{"fontStyle":"italic"},"text":"1.beat eggs ","element":"span"},{"style":{"height":10.4},"width":81.19,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/13-2.png","element":"img","alt":" − >","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"2.blend with some other ingredients ","element":"span"},{"style":{"height":10.4},"width":85.8,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/13-3.png","element":"img","alt":" − >","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"3.pour into pie shell ","element":"span"},{"style":{"height":13.2},"width":314.24,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/13-4.png","element":"img","alt":" − > 4.bake − 
>","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"5.cool","element":"span"},{"text":". Our model also gives the most specific and precise instructions for steps 4 and 5. By contrast, all baselines miss step 3 or step 5. Checklist produces the generic phrase “combine all of the ingredients”. CVAE suffers from the redundancy and incoherence problems. Pointer-S2S mentions the most ingredients but misses the most important one, “pumpkin”. Link-S2S misses “pumpkin” and generates incoherent expressions.","element":"span"}],[{"id":"id-64","style":{"width":"94%"},"width":1711,"height":2666,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/14-0.png","element":"img"}],[{"text":"Figure 5: Generated advertising texts from different models. Attribute values are colored in red. Repeated expressions are underlined.","element":"figcaption","subtype":"caption"}],[{"id":"id-65","style":{"width":"95%"},"width":1733,"height":1340,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1908.06605/images/15-0.png","element":"img"}],[{"text":"Figure 6: Generated recipes from different models.","element":"figcaption","subtype":"caption"}]]}],"_version":"3.3.4"},"paperNode":"$28:props:children:props:children:0:props:product"}]]