Harnessing Tweets to get the Pulse of a City

  • Author / Creator
    Esha,.
  • Twitter is one of the most popular social media applications and is used for many purposes. Every day, users share a vast amount of information through tweets: they provide location-relevant updates, report events happening in real time, and inform other users of upcoming events in a given geographical location. The information in tweets can be used not only to learn about what is happening in a city, but also to understand users’ emotions (e.g., love, fear) and sentiments (e.g., positive, negative) on topics and events as they unfold over time. Such information is relevant and useful only when the right location is identified for a given set of tweets. Further, given the volume of data generated on Twitter, both categorization and visualization of tweets can help users manage information overload. Categorizing tweets into topic labels can help identify the broad categories of topics discussed in a city and filter out unwanted tweets, allowing users to focus on the categories that interest them. Visualization can play a critical role in presenting large and complex data in more easily discernible formats that facilitate comparison across different facets. This research addressed three areas: identification of locations relevant to tweets, visualization of location-related sentiments and emotions, and categorization of tweets into topic labels.
    The identification of tweet-relevant locations is a challenging problem because most tweets do not explicitly include location names. However, location-related information is implicitly included through the user-ids and hashtags inserted in tweets. This research therefore aims to improve the identification of tweet-relevant locations by harnessing information embedded in user-ids (e.g., @EPLdotCA is the user-id of the public libraries in the city of Edmonton) and hashtags (e.g., #yeg is the hashtag for the city of Edmonton). This novel approach, termed DigiCities, uses this implicit information to identify tweet-relevant locations.
    DigiCities are digital equivalents of cities as represented in digital spaces. In the physical environment, cities are primarily represented by People, Organizations and Places (POP), which also have a digital presence on Twitter through user-ids and hashtags. Digital profiles of cities are created from the user-ids and hashtags of the people, organizations and places associated with each city, and are then used to identify and reinforce city names in tweets. Digital profiles of eight cities in the Province of Alberta, Canada, were developed, and a number of classification experiments using several algorithms, including k-Nearest Neighbour (kNN), Naïve Bayes (NB) and Sequential Minimal Optimization (SMO), were conducted to evaluate the effectiveness of the proposed approach. The classification accuracy of each algorithm improved after the city profiles were applied to the Twitter data. Tweets from these eight locations were also analyzed to identify users’ sentiments and emotions and the associated topics. Multiple visualizations of the results were developed to compare and contrast sentiments and emotions across different temporal periods at the city level.
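The profile-matching idea behind DigiCities can be illustrated with a minimal Python sketch. It is not the thesis implementation. The `CITY_PROFILES` dictionary, the function names, and the Calgary entry `#yyc` are hypothetical placeholders; only `@EPLdotCA` and `#yeg` come from the abstract. The sketch extracts user-ids and hashtags with a simple regular expression and counts how many fall into each city's profile. It then appends the matched city names to the tweet text, as one possible reading of "reinforcing" city names before classification.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical city profiles (assumption): each city maps to lower-cased
# user-ids and hashtags of associated People, Organizations and Places (POP).
# Only @EPLdotCA and #yeg come from the abstract; #yyc for Calgary is an
# illustrative placeholder.
CITY_PROFILES: dict[str, set[str]] = {
    "edmonton": {"@epldotca", "#yeg"},
    "calgary": {"#yyc"},
}

# Matches "@name" or "#tag" tokens made of word characters.
ENTITY_RE = re.compile(r"[@#]\w+")


def extract_entities(tweet: str) -> list[str]:
    """Return the lower-cased @user-ids and #hashtags found in a tweet."""
    return [token.lower() for token in ENTITY_RE.findall(tweet)]


def match_cities(tweet: str, profiles: dict[str, set[str]] = CITY_PROFILES) -> Counter:
    """Count, per city, how many of the tweet's entities appear in its profile."""
    counts: Counter = Counter()
    for entity in extract_entities(tweet):
        for city, profile in profiles.items():
            if entity in profile:
                counts[city] += 1
    return counts


def predict_city(tweet: str, profiles: dict[str, set[str]] = CITY_PROFILES) -> Optional[str]:
    """Return the city with the most profile matches (ties broken alphabetically),
    or None when no user-id or hashtag matches any profile."""
    counts = match_cities(tweet, profiles)
    if not counts:
        return None
    return max(sorted(counts), key=counts.__getitem__)


def reinforce(tweet: str, profiles: dict[str, set[str]] = CITY_PROFILES) -> str:
    """Append each matched city name once, so a downstream classifier
    sees an explicit city token even when the tweet never names the city."""
    matched = sorted(match_cities(tweet, profiles))
    return " ".join([tweet, *matched]) if matched else tweet


if __name__ == "__main__":
    for text in ["Visit @EPLdotCA today! #yeg", "Traffic is bad #yyc", "Nice weather"]:
        print(predict_city(text), "|", reinforce(text))
```

Under this sketch, a tweet such as `Visit @EPLdotCA today! #yeg` resolves to `edmonton`, and its reinforced text ends with the token `edmonton`. The reinforced text could then be passed to a text classifier such as the kNN, NB or SMO classifiers named above. A tweet with no profile matches is left unchanged.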

  • Graduation date
    Spring 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-qv9w-g469
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.