Staged Grid NewSQL Database System for OLTP and Big Data Applications

  • Author / Creator
    Wu, Lengdong
  • Big data applications demand, and consequently drive the development of, diverse scalable data management systems, ranging from NoSQL systems to the emerging NewSQL systems. To serve thousands of applications and their huge amounts of data, data management systems must be able to scale out to clusters of commodity servers. The overarching goal of this dissertation is to propose principles, paradigms and protocols for architecting efficient, scalable and practical NewSQL database systems that address the unique set of challenges posed by the big data trend. This dissertation shows that, with a careful choice of design and features, it is possible to implement scalable NewSQL database systems that efficiently support transactional semantics to ease application design. In this dissertation, we first investigate, analyze and characterize current scalable data management systems in depth and develop comprehensive taxonomies for various critical aspects covering the data model, the system architecture and the consistency model. Based on an analysis of the scalability limitations of current systems, we then highlight the key principles for designing and implementing scalable NewSQL database systems. This dissertation advances the state of the art by providing satisfactory solutions to critical facets of NewSQL database systems. In particular, we first specify a staged grid architecture to support scalable and efficient transaction processing using clusters of commodity servers. The key insight is to decompose system components and reassemble them into encapsulated staged modules. Effective behavior rules for communication are then defined to orchestrate independent staged modules deployed on networked computing nodes into one integrated system. Second, we propose a new formula-based protocol for distributed concurrency control to support thousands of concurrent users accessing data distributed over commodity servers.
The formula protocol for concurrency is a variation of the multi-version timestamp concurrency control protocol, which guarantees serializability. We reduce the overhead of conventional implementations with techniques including logical formula caching and dynamic timestamp ordering. Third, we identify a new consistency model, BASIC (Basic Availability, Scalability, Instant Consistency), that meets application requirements without the extra effort needed to handle the inconsistent soft states of weak consistency models. BASIC extends the current understanding of the CAP theorem by characterizing precisely the degree to which each dimension can be achieved, rather than simply what cannot be done. We introduce all these novel ideas and features based on the implementation of Rubato DB, a highly scalable NewSQL database system. We have conducted extensive experiments that clearly show that Rubato DB is highly scalable with efficient performance under both the TPC-C and YCSB benchmarks. These results verify that the staged grid architecture and the formula protocol provide a satisfactory solution to one of the important challenges in NewSQL database systems: developing a highly scalable database management system that supports various consistency levels from ACID to BASE.
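
For readers unfamiliar with the baseline that the formula protocol is said to vary, the following is a minimal sketch of textbook multi-version timestamp ordering, not the dissertation's formula protocol. The names `MVStore`, `Version` and `Abort` are hypothetical, timestamps are assumed to be unique integers assigned to transactions elsewhere, and commit handling, garbage collection and distribution are all omitted.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: object
    wts: int  # timestamp of the transaction that wrote this version
    rts: int  # largest timestamp of any transaction that has read it


class Abort(Exception):
    """Raised when a write arrives too late and must be rejected."""


class MVStore:
    """Single-node, in-memory illustration of multi-version timestamp ordering."""

    def __init__(self):
        self.versions = {}  # key -> list of Version objects

    def _visible(self, key, ts):
        # The version a transaction with timestamp ts sees is the one
        # with the largest write timestamp not exceeding ts.
        candidates = [v for v in self.versions.get(key, []) if v.wts <= ts]
        return max(candidates, key=lambda v: v.wts) if candidates else None

    def read(self, key, ts):
        # Reads are never rejected; they only record how far the version
        # has been read. (Reads of missing keys are not tracked here.)
        v = self._visible(key, ts)
        if v is None:
            return None
        v.rts = max(v.rts, ts)
        return v.value

    def write(self, key, value, ts):
        v = self._visible(key, ts)
        # If a younger transaction already read the version this write
        # would supersede, accepting the write would invalidate that read.
        if v is not None and v.rts > ts:
            raise Abort(f"write to {key!r} at ts={ts} conflicts with read at ts={v.rts}")
        if v is not None and v.wts == ts:
            v.value = value  # same transaction overwrites its own version
        else:
            self.versions.setdefault(key, []).append(Version(value, ts, ts))
```

In this sketch, a write at timestamp 1 followed by a read at timestamp 3 causes a later write at timestamp 2 to raise `Abort`, while a write at timestamp 4 succeeds and leaves the older version visible to the reader at timestamp 3.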

  • Degree
    Doctor of Philosophy
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.