Relational Databases for Querying Natural Language Text

  • Technical report TR07-08. With the vast amount of information stored in natural language text, sophisticated query engines are needed to extract data and effectively relate the pieces. While there has been a great deal of activity around semistructured data, and XML in particular, there has not been much work on querying natural language text, despite the regularities that exist in it. This paper explores a more conservative approach in which natural language text is stored in a relational database. We present a framework for querying and integrating natural language text with relational data and investigate different strategies for optimizing queries. Our results show that the size of the plan space depends on the number of query terms and the overlap between query rewritings. Moreover, we show that the problem of finding an optimal plan in the presence of rewritings is NP-hard. We develop a cost model and pruning techniques to reduce the size of the search space, and a polynomial-time greedy algorithm that finds a (possibly sub-optimal) plan over a set of rewritings. Our experimental results indicate great savings in the evaluation costs of the optimized queries and show that our greedy algorithm finds either an optimal plan or a plan that is very close to optimal in terms of cost.
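The idea of storing text in a relational database and joining it with ordinary relational data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the report: the `token` and `company` tables, their columns, the sample sentences, and the helper names `build_db` and `acquired_companies` are all hypothetical, and the report's actual schema and query framework may differ.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical word-level schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE token (sent_id INTEGER, pos INTEGER, word TEXT);
        CREATE TABLE company (name TEXT PRIMARY KEY, sector TEXT);
    """)
    # Made-up sample sentences, one row per word with its position.
    sentences = [
        "Acme acquired Globex in 2006",
        "Initech acquired Hooli last year",
    ]
    for sent_id, sentence in enumerate(sentences):
        rows = [(sent_id, pos, word) for pos, word in enumerate(sentence.split())]
        conn.executemany("INSERT INTO token VALUES (?, ?, ?)", rows)
    # Made-up relational data to integrate with the text.
    conn.executemany(
        "INSERT INTO company VALUES (?, ?)",
        [("Globex", "energy"), ("Hooli", "software")],
    )
    return conn


def acquired_companies(conn):
    """Match the text pattern 'acquired <word>' and join it with company data."""
    query = """
        SELECT t2.word, c.sector
        FROM token AS t1
        JOIN token AS t2
          ON t2.sent_id = t1.sent_id AND t2.pos = t1.pos + 1
        JOIN company AS c ON c.name = t2.word
        WHERE t1.word = 'acquired'
        ORDER BY t2.word
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(acquired_companies(build_db()))
```

The self-join on adjacent positions stands in for a simple text pattern, and the second join attaches structured attributes to the matched words. Each additional query term adds another join, which hints at why the number of query terms affects the size of the plan space.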

  • License
    Attribution 3.0 International