Fall 2023
Missing data is pervasive across a wide range of real-world datasets. When learning and predicting on such data with neural networks, the typical strategy, called impute-then-regress, is to fill in or complete the missing values in the dataset. Much less common is to attempt to...
-
Spring 2019
In this thesis, we introduce a new loss for regression, the Histogram Loss. There is some evidence that, in sequential decision making, estimating the full distribution of the return offers a considerable gain in performance, even though only the mean of that distribution is used in...
-
Spring 2023
Gradient descent algorithms suffer from many problems when learning representations with fixed neural network architectures, such as reduced plasticity on non-stationary continual tasks and difficulty training sparse architectures from scratch. A common workaround is continuously adapting the neural...
-
Strange springs in many dimensions: how parametric resonance can explain divergence under covariate shift.
Fall 2021
Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on independently and identically distributed (iid) data sampling. Yet SGDm is often used outside this regime, in settings with temporally correlated inputs such as continual learning and reinforcement learning....
-
Fall 2019
In this thesis, we investigate different vector step-size adaptation approaches for continual, online prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad,...