Similarity Assessment of Data in Semantic Web

  • Author / Creator
    Dehleh Hossein Zadeh, Parisa
  • Abstract
    The web is a constantly growing repository of information. The enormous amount of information available on the web creates a demand for automatic ways of processing and analyzing data. One of the most common activities these processes perform is the comparison of data, done either to find something new or to confirm what we already know. In each case, there is a need to determine the similarity between different objects and pieces of information. Determining similarity is relatively easy for numerical data, but not for symbolic data. To make the data stored on the Internet more accessible, a new model of data representation has been introduced: the Resource Description Framework (RDF). Linked data provides an open platform for representing and storing structured data as well as ontologies. This aspect of data representation has been fully utilized to provide the fundamentals for new forms of the Internet: Linked Data and the Semantic Web. In this thesis, we investigate the problem of determining semantic similarity between entities, in which not just the lexical and syntactic information of entities is used, but the whole existing knowledge structure, including the instantiated ontology, is exploited. The idea is based on the fact that entities are interconnected and their semantics are defined via their connections to other entities as well as the metadata expressed as an ontology. We propose feature-based methods for similarity assessment of concepts represented in an ontology as well as in the less constrained Resource Description Framework. Membership functions are used to capture the importance of connections between entities at different hierarchy levels in the ontology. We leverage an importance-weighted quantifier-guided operator to aggregate the similarity values related to different groups of properties.
    In another proposed approach, we use concepts of possibility theory to determine lower and upper bounds of similarity intervals. In addition, we address contextual similarity assessment, in which only a specific context is taken into consideration. The idea of ranking an entity's features according to their importance in describing the entity is introduced. We propose an approach that calculates similarity measures for these categories of features and then aggregates them using fuzzy-expressed weights that represent the rankings of these categories. The promising results of our similarity method encouraged us to extend it to a more comprehensive approach. As a result, we propose a technique for automatically identifying the importance of features and ranking them accordingly. Finally, we tackle the problem of heterogeneous feature types used to define entities. We describe a method that uses fuzzy set theory and linguistic aggregation to compare features of different types. We deploy this technique in a practical pharmaceutical application, where the proposed similarity assessment is shown to be capable of finding relevant entities (drugs, in this case) in spite of the heterogeneous features used to define them.
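
The feature-based idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the Jaccard overlap per property group, the power-function quantifier Q(r) = r^alpha, and every name below (`group_features`, `jaccard`, `quantifier_guided_aggregate`, `similarity`, the sample triples) are assumptions chosen only to make the idea concrete. Entities are compared by the objects they link to under each property, and the per-property scores are combined with an importance-weighted, quantifier-guided ordered weighted average.

```python
from collections import defaultdict


def group_features(triples, entity):
    """Collect the objects linked to `entity`, grouped by property."""
    groups = defaultdict(set)
    for subject, prop, obj in triples:
        if subject == entity:
            groups[prop].add(obj)
    return groups


def jaccard(a, b):
    """Set overlap in [0, 1]; an assumed stand-in for a per-group similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def quantifier_guided_aggregate(scores, importances, alpha=1.0):
    """Importance-weighted OWA guided by the quantifier Q(r) = r ** alpha.

    Scores are sorted in descending order; the weight of each score is the
    increase of Q over the cumulative share of importance it contributes.
    """
    total = sum(importances)
    if total == 0:
        return 0.0
    ordered = sorted(zip(scores, importances), key=lambda pair: pair[0], reverse=True)
    result, cumulative = 0.0, 0.0
    for score, importance in ordered:
        previous = (cumulative / total) ** alpha
        cumulative += importance
        result += score * ((cumulative / total) ** alpha - previous)
    return result


def similarity(triples, e1, e2, importance=None, alpha=1.0):
    """Aggregate per-property similarities of two entities into one value."""
    importance = importance or {}
    g1, g2 = group_features(triples, e1), group_features(triples, e2)
    props = sorted(set(g1) | set(g2))
    scores = [jaccard(g1.get(p, set()), g2.get(p, set())) for p in props]
    weights = [importance.get(p, 1.0) for p in props]  # default importance: 1.0
    return quantifier_guided_aggregate(scores, weights, alpha)


# Hypothetical sample data, for illustration only.
TRIPLES = [
    ("drugA", "treats", "pain"),
    ("drugA", "treats", "fever"),
    ("drugA", "hasClass", "classX"),
    ("drugB", "treats", "pain"),
    ("drugB", "hasClass", "classX"),
]

if __name__ == "__main__":
    print(similarity(TRIPLES, "drugA", "drugB", importance={"treats": 2.0}))
```

With `alpha=1.0` the operator reduces to an importance-weighted average of the per-property scores. Values of `alpha` above 1 shift weight toward the lower scores, so the aggregate behaves more like "most properties must match".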

  • Degree
    Doctor of Philosophy
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Institution
    University of Alberta
  • Department
    • Department of Electrical and Computer Engineering
  • Specialization
    • Software Engineering and Intelligent Systems
  • Supervisor / co-supervisor and their department(s)
    • Marek Z. Reformat (Electrical and Computer Engineering)
  • Examining committee members and their departments
    • Chang-Shing Lee (National University of Tainan, Taiwan)
    • Ken Wong (Computing Science)
    • Petr Musilek (Electrical and Computer Engineering)
    • Marek Z. Reformat (Electrical and Computer Engineering)
    • Witold Pedrycz (Electrical and Computer Engineering)
    • Di Niu (Electrical and Computer Engineering)