Directly Learning Predictors on Missing Data with Neural Networks

Fall 2023

Awwal, Alvina

The problem of missing data is omnipresent in a wide range of real-world datasets. When learning and predicting on this data with neural networks, the typical strategy, called impute-then-regress, is to fill in or complete the missing values in the dataset. Much less common is to attempt to...
Distributional Losses for Regression

Spring 2019

Imani, Ehsan

In this thesis we introduce a new loss for regression, the Histogram Loss. There is some evidence that, in the problem of sequential decision making, estimating the full distribution of return offers a considerable gain in performance, even though only the mean of that distribution is used in...
Greedy Pruning for Continually Adapting Networks

Spring 2023

Shah, Haseeb

Gradient descent algorithms suffer from many problems when learning representations using fixed neural network architectures, such as reduced plasticity on non-stationary continual tasks and difficulty training sparse architectures from scratch. A common workaround is continuously adapting the neural...
Strange springs in many dimensions: how parametric resonance can explain divergence under covariate shift.

Fall 2021

Banman, Kirby

Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on independently and identically distributed (iid) data sampling. Yet, SGDm is often used outside this regime, in settings with temporally correlated inputs such as continual learning and reinforcement learning....
Vector Step-size Adaptation for Continual, Online Prediction

Fall 2019

Jacobsen, Andrew

In this thesis, we investigate different vector step-size adaptation approaches for continual, online prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad,...