Inferring Semantic Information from Websites: A View into Contextual Advertising and User Behavior Profiling

  • Author / Creator
    Panwar, Abhimanyu
  • The World Wide Web has become an important platform for diverse types of human endeavor. It holds billions of webpages covering different subjects and serves users with varied backgrounds. Every day, colossal volumes of data are collected about the usage of websites. Moreover, the web is ever-changing. These conditions present unique opportunities and problems for commercial organizations and researchers alike. In this thesis, we explore two prominent research problems concerning the web. The first problem is “delivering relevant ads to webpages based upon their content”. This practice is known as contextual advertising. Worldwide online advertisement revenues have reached US$117 billion, and contextual advertising contributes to these revenues. The second problem is “deducing user behavior patterns of a website”. Understanding user behavior on a website offers several advantages to web service providers, business managers and security experts. In this work, we present a novel two-stage architecture for the ad-network to implement contextual advertising. An ad-network has to deliver relevant ads to the requesting webpage in real time. It classifies the webpage, based on its content, into one of the nodes of a taxonomy and selects matching ads from the ad-repository. We present novel schemes for representing webpages that exploit the semi-structured nature of a webpage and its neighboring pages in the web graph, for the purpose of subject-based classification. Initial experiments established the importance of a well-built taxonomy for this purpose. We construct a taxonomy, suitable for subject-based webpage classification, from the Open Directory Project. Subsequently, we conduct comparative experiments on the contextual advertising systems implemented using the approaches described.
    We also address the problem of mining user behavior patterns of a website. A user behavior profile (UBP) represents a sequence of webpages requested by a user to fulfill a purpose while browsing the website. To perform user behavior profiling of a website, we present an automated methodology to mine UBPs from the server log files of a website. We introduce an alphabet of 35 labels to represent functionality features implemented by sets of webpages. We also introduce the 9 most common UBPs. We present an approach to prepare user traces, in the alphabet of labels, from the log files. We model a user trace as a Hidden Markov Model. Experiments reveal that the proposed technique performs better than alternative algorithms. We present an industrial case study to demonstrate the efficacy of the approach.

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Electrical and Computer Engineering
  • Specialization
    • Software Engineering and Intelligent Systems
  • Supervisor / co-supervisor and their department(s)
    • Miller, James (Electrical and Computer Engineering)
  • Examining committee members and their departments
    • Joseph, Dileepan (Electrical and Computer Engineering)
    • Stroulia, Eleni (Computing Science)
    • Miller, James (Electrical and Computer Engineering)