
Unsupervised Syntax-based Probabilistic Sentence Generation

  • Author / Creator
    Sayehban, Shahrzad
  • Sentence reconstruction and generation are essential applications in Natural Language Processing (NLP). Early studies were based on classical methods such as production rules and statistical models; more recently, the prevailing models typically use deep neural networks. In this study, we use deep neural networks to develop a model capable of generating new, unseen sentences or reconstructing a given input with minor changes. To achieve this goal, we develop an unsupervised tree-based model built on the Variational Autoencoder (VAE) framework.

    Our approach uses the grammar rules of natural language and generates sentences phrase by phrase, which helps the generated sentences remain semantically and syntactically correct. Previous models typically processed tokens sequentially, so syntax was learned only implicitly. By contrast, our model learns both the sequence of tokens and the syntax of the sentences explicitly in order to generate better samples. The variational modelling enables us to sample from the continuous latent space to generate new sentences or reconstruct the input (a simplified VAE sketch follows this record).

    We demonstrate the effectiveness of this model through experiments. The tree-based VAE model is trained on the Stanford Natural Language Inference (SNLI) dataset. First, we compute the BLEU score of the reconstructions to evaluate the model's reconstruction capability and how well it preserves information from the input (an illustrative BLEU example also follows this record). This score shows that our proposed model reconstructs the input sentences better than the baseline. Second, random sampling from the latent space is used to evaluate the fluency of generation. We report perplexity, UniKL, and entropy to evaluate the quality of the generated sentences. The results show that the randomly sampled sentences are less semantically meaningful; however, they are correct in terms of syntax and the order of phrases, because the grammar rules are applied so that only valid parse trees are generated.

  • Subjects / Keywords
  • Graduation date
    Fall 2022
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-9xqd-4q59
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
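
A simplified VAE sketch (illustrative only). The thesis's model is a tree-based VAE that decodes phrases via grammar rules; that structure is not reproduced here. The PyTorch sketch below only shows the generic sentence-VAE mechanics the abstract relies on: encoding a token sequence into a Gaussian latent variable, sampling with the reparameterization trick, and decoding back to tokens. All module names and hyperparameters are assumptions.

import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        self.z_to_hidden = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        emb = self.embed(tokens)                        # (batch, seq, embed_dim)
        _, h = self.encoder(emb)                        # h: (1, batch, hidden_dim)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        h0 = torch.tanh(self.z_to_hidden(z)).unsqueeze(0)
        dec_out, _ = self.decoder(emb, h0)              # teacher forcing on the input tokens
        logits = self.out(dec_out)                      # (batch, seq, vocab_size)
        # KL(q(z|x) || N(0, I)): the regularizer that keeps the latent space continuous
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return logits, kl

Training would minimize token-level cross-entropy on the logits plus the KL term; new sentences are then produced by sampling z from the standard normal prior and decoding, which is the "random sampling from the latent space" mentioned in the abstract.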
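
An illustrative reconstruction-scoring example. The abstract reports BLEU between inputs and their reconstructions; the exact BLEU configuration used in the thesis is not stated, so the NLTK call, the smoothing choice, and the example sentences below are assumptions.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical SNLI-style input sentence and its reconstruction (tokenized)
reference = ["a", "man", "is", "playing", "a", "guitar", "in", "the", "park"]
reconstruction = ["a", "man", "is", "playing", "guitar", "in", "a", "park"]

smooth = SmoothingFunction().method1
score = sentence_bleu([reference], reconstruction, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")

A higher average of such scores over the test set indicates that more information from the input survives the encode-sample-decode round trip.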