Usage
  • 454 views
  • 352 downloads

Comparing XML Documents as Reference-aware Labeled Ordered Trees

  • Author / Creator
    Mikhaiel, Rimon A. E.
  • XML, the Extensible Markup Language, is the standard exchange format for
    modern Information Systems, Service Oriented Architecture (SOA) and the
    Semantic Web. Hence, comparing XML documents has become a necessary task
    for tracking and merging changes between versions of the same document, or for
    translating between documents referring to the same information but complying
    with different schemata or originating from different parties. In this scenario,
    given two documents, XML differencing is the process of finding an edit
    sequence, namely a sequence of exact and approximate matching, deletion, and
    insertion operations, which, if applied to the first document will result in the
    second. In practice, domain-specific differencing solutions are expensive to
    develop, and hard to reuse. Therefore, a generic differencing approach, able to
    serve various domains, would be both useful and cost-effective. This thesis
    presents VTracker, a generic XML differencing approach, which is capable of
    capturing domain knowledge and semantics through a configurable domainspecific
    cost function. VTracker views an XML document as an ordered labeled
    tree. Given two XML-document trees and a cost function VTracker calculates the
    tree-edit distance needed to transform one tree to the other. The first contribution
    of VTracker is an automatic method used to synthesize such a cost function based
    on the domain’s XML Schema Definition (XSD). Second, VTracker considers the
    XML reference structure in addition to the natural XML containment structure.
    Third, VTracker implements an affine-cost policy that prefers edit operations
    applied to neighbors over dispersed elements. Finally, VTracker uses a set of
    simplicity heuristics to nominate the best edit script in case of multiple ones found
    with the same minimum cost. VTracker was applied to a variety of domains,
    namely OWL/RDF, WSDL, BPEL, UML/XMI, XHTML, and RNA secondary
    structure, where it performed competitively with, or even better than, state-of-theart
    methods in each of these domains.

  • Subjects / Keywords
  • Graduation date
    Fall 2011
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/R3SM4Q
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
    English
  • Institution
    University of Alberta
  • Degree level
    Doctoral
  • Department
  • Supervisor / co-supervisor and their department(s)
  • Examining committee members and their departments
    • Deursen, Arie van (Delft University)
    • Rafiei, Davood (Computing Science)
    • Kurgan, Lukasz (Electrical and Computer Engineering)
    • Hoover, Jim (Computing Science)