ERA Banner
Download Add to Cart Share
More Like This
  • Comparing XML Documents as Reference-aware Labeled Ordered Trees
  • Mikhaiel, Rimon A. E.
  • en_US
  • XML Differencing
    Tree Edit Distance
    XML Edit Distance
    Ontology Matching
  • Sep 30, 2011 6:11 PM
  • Thesis
  • en_US
  • Adobe PDF
  • 6106683 bytes
  • XML, the Extensible Markup Language, is the standard exchange format for modern Information Systems, Service Oriented Architecture (SOA) and the Semantic Web. Hence, comparing XML documents has become a necessary task for tracking and merging changes between versions of the same document, or for translating between documents referring to the same information but complying with different schemata or originating from different parties. In this scenario, given two documents, XML differencing is the process of finding an edit sequence, namely a sequence of exact and approximate matching, deletion, and insertion operations, which, if applied to the first document will result in the second. In practice, domain-specific differencing solutions are expensive to develop, and hard to reuse. Therefore, a generic differencing approach, able to serve various domains, would be both useful and cost-effective. This thesis presents VTracker, a generic XML differencing approach, which is capable of capturing domain knowledge and semantics through a configurable domainspecific cost function. VTracker views an XML document as an ordered labeled tree. Given two XML-document trees and a cost function VTracker calculates the tree-edit distance needed to transform one tree to the other. The first contribution of VTracker is an automatic method used to synthesize such a cost function based on the domain’s XML Schema Definition (XSD). Second, VTracker considers the XML reference structure in addition to the natural XML containment structure. Third, VTracker implements an affine-cost policy that prefers edit operations applied to neighbors over dispersed elements. Finally, VTracker uses a set of simplicity heuristics to nominate the best edit script in case of multiple ones found with the same minimum cost. VTracker was applied to a variety of domains, namely OWL/RDF, WSDL, BPEL, UML/XMI, XHTML, and RNA secondary structure, where it performed competitively with, or even better than, state-of-theart methods in each of these domains.
  • Doctoral
  • Doctor of Philosophy
  • Department of Computing Science
  • Fall 2011
  • Stroulia, Eleni (Computing Science)
  • Hoover, Jim (Computing Science)
    Rafiei, Davood (Computing Science)
    Kurgan, Lukasz (Electrical and Computer Engineering)
    Deursen, Arie van (Delft University)


Download license