- Comparing XML Documents as Reference-aware Labeled Ordered Trees
- Mikhaiel, Rimon A. E.
Tree Edit Distance
XML Edit Distance
- Sep 30, 2011 6:11 PM
- Adobe PDF
- 6106683 bytes
- XML, the Extensible Markup Language, is the standard exchange format for modern Information Systems, Service Oriented Architecture (SOA) and the Semantic Web. Hence, comparing XML documents has become a necessary task for tracking and merging changes between versions of the same document, or for translating between documents that refer to the same information but comply with different schemata or originate from different parties. In this scenario, given two documents, XML differencing is the process of finding an edit sequence, namely a sequence of exact and approximate matching, deletion, and insertion operations, which, if applied to the first document, will result in the second. In practice, domain-specific differencing solutions are expensive to develop and hard to reuse. Therefore, a generic differencing approach, able to serve various domains, would be both useful and cost-effective. This thesis presents VTracker, a generic XML differencing approach, which is capable of capturing domain knowledge and semantics through a configurable domain-specific cost function. VTracker views an XML document as an ordered labeled tree. Given two XML-document trees and a cost function, VTracker calculates the tree-edit distance needed to transform one tree into the other. The first contribution of VTracker is an automatic method used to synthesize such a cost function based on the domain's XML Schema Definition (XSD). Second, VTracker considers the XML reference structure in addition to the natural XML containment structure. Third, VTracker implements an affine-cost policy that prefers edit operations applied to neighboring elements over edit operations applied to dispersed elements. Finally, VTracker uses a set of simplicity heuristics to select the best edit script when multiple scripts share the same minimum cost.
VTracker was applied to a variety of domains, namely OWL/RDF, WSDL, BPEL, UML/XMI, XHTML, and RNA secondary structure, where it performed competitively with, or even better than, state-of-the-art methods in each of these domains.
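The idea of a tree-edit distance driven by a configurable cost function can be illustrated with a minimal sketch. This is not VTracker's algorithm: it is the textbook recursive formulation of ordered-forest edit distance with memoization, and it ignores reference structure, affine costs, and XSD-derived costs. The tree encoding `(label, children)` and the names `tree_edit_distance`, `unit_cost`, and `unit_rename` are assumptions made for illustration only.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples are hashable, so results can be cached.

def unit_cost(label):
    """Hypothetical cost of deleting or inserting one node."""
    return 1

def unit_rename(a, b):
    """Hypothetical cost of matching label a to label b."""
    return 0 if a == b else 1

def tree_edit_distance(t1, t2, delete_cost=unit_cost,
                       insert_cost=unit_cost, rename_cost=unit_rename):
    """Minimum total cost of turning ordered labeled tree t1 into t2."""

    @lru_cache(maxsize=None)
    def dist(f, g):
        if not f and not g:
            return 0
        if not g:
            # Delete the rightmost root of f; its children take its place.
            label, kids = f[-1]
            return dist(f[:-1] + kids, g) + delete_cost(label)
        if not f:
            # Insert the rightmost root of g.
            label, kids = g[-1]
            return dist(f, g[:-1] + kids) + insert_cost(label)
        (lv, kv), (lw, kw) = f[-1], g[-1]
        return min(
            dist(f[:-1] + kv, g) + delete_cost(lv),
            dist(f, g[:-1] + kw) + insert_cost(lw),
            # Match the two rightmost roots, then solve their children
            # and the remaining forests independently.
            dist(kv, kw) + dist(f[:-1], g[:-1]) + rename_cost(lv, lw),
        )

    return dist((t1,), (t2,))
```

With unit costs, changing one leaf label costs a single rename. Passing a different `rename_cost` changes which edit script is cheapest, which is the role a domain-specific cost function plays. The sketch is only practical for small trees.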
- Doctor of Philosophy
- Department of Computing Science
- Fall 2011
- Stroulia, Eleni (Computing Science)
Hoover, Jim (Computing Science)
Rafiei, Davood (Computing Science)
Kurgan, Lukasz (Electrical and Computer Engineering)
Deursen, Arie van (Delft University)
Theses and Dissertations Spring 2009 to present
Department of Computing Science