-
Advantage of Integration in Big Data: Feature Generation in Multi-Relational Databases for Imbalanced Learning
Fall 2016
Most data mining and machine learning techniques rely on a single flat table and assume balanced training data. However, most real-world applications comprise databases having multiple tables and imbalanced data. It becomes further complicated in the realm of Big Data where related information is...
-
Big Data in the Global Realm: An Assessment of International Relations' Ability to Study 21st Century Developments
Fall 2022
This thesis examines Big Data as the latest and perhaps most potent iteration of a number of transformative technologies that have had and continue to have an impact on global politics and international power hierarchies. The thesis seeks to examine if the discipline of IR, with its current...
-
Spring 2014
Metabolomics involves the high throughput characterization of small molecules or metabolites in cells, tissues and organisms. To interpret, store and exchange metabolomic data it is necessary to have comprehensive, electronically accessible databases that can be used to handle both the...
-
Developing and Evaluating Algorithms for Fixing Omission and Commission Errors in Structured Data
Fall 2020
The use of machine learning is rapidly rising to deliver a variety of benefits in various domains. However, developing predictive systems often faces many challenges that can drastically delay model deployment. For instance, obtaining labeled training data is one of the most expensive bottlenecks...
-
Fall 2011
Metabolomics aims to study all small-molecule compounds (i.e. metabolites) in cells, tissues, or biofluids. These compounds provide a functional readout of the physiological, developmental, and pathological state of a biological system. The field of metabolomics has expanded rapidly over the last...
-
eHealth and mHealth Pipelines for Clinical Decision Support to Improve Medication Selection and Safety
Fall 2015
Although much work has been done over the past decade on developing personalized and evidence-based medicine, such as diagnostic tests based on genetics to better predict patients' responses to therapy, stumbling blocks remain that have prevented knowledge, tests, and other pertinent...
-
Estimating the Overlap of Top Instances in Lists Ranked by Correlation to Label
Spring 2012
Recent advances in high-throughput technologies, such as genome-wide SNP analysis and microarray gene expression profiling, have led to a multitude of ranked lists, where the features (SNPs, genes) are sorted based on their individual correlation with a phenotype. Multiple reviews have shown...
-
Fall 2017
This thesis examines the predictability of Canadian recessions with special emphasis on variable selection in a big data environment. The first paper in this thesis addresses the problem of variable selection from a traditional point of view by employing a prescreened set of selected individual...
-
Inference of epigenetic subnetworks and expression-based analysis using a breast cancer dataset
Fall 2018
Changes in gene expression have been thought to play a crucial role in various types of cancer. With the advance of high-throughput experimental techniques, many genome-wide studies are underway to analyze underlying mechanisms that may drive the changes in gene expression. It has been observed...