580.691/491 Learning Theory
Course Instructor: Reza Shadmehr
Overview: This course introduces the
probabilistic foundations of learning theory. We will discuss topics in
regression, estimation, optimal control, system identification, Bayesian
learning, and classification. Our aim is to first derive some of the important
mathematical results in these topics, and then apply the framework to problems
in biology, particularly animal learning and control of action.
Lecture times: Spring semester, 2008. Mondays and Wednesdays, 4:30 – 5:45 PM
Teaching Assistant:
Exams: Midterm on March 12, Final on April 30.
Useful mathematical identities
Course Outline:
• Introduction
• Lecture 1: (intro.ppt) Introduction: adaptation vs. learning; linear
classifiers; types of adaptation: supervised, unsupervised, reinforcement.
Homework: digit classification and cross validation
• Lecture 2: Review of probability theory. Bayes' rule, expected value and
variance of random variables and of sums of random variables, expected value
of random variables raised to a power, binomial distribution, Poisson
distribution, normal distribution.
Homework: probability theory
• Regression, generalization, and maximum likelihood
• Lecture 3: (LMS_1.ppt) Loss function as mean squared error; batch learning
and the normal equation; cross validation, batch vs. online learning, steepest
descent algorithm, least mean squares (LMS), convergence of LMS.
Homework: (simulation) classify using regression. Data set.
• Lecture 4: (LMS_2.ppt) Newton-Raphson, LMS and steepest descent with
Newton-Raphson, weighted least-squares, regression with basis functions.
Homework: moving centers of Gaussian bases.
• Lecture 5: (generalization.ppt) Generalization function, examples from
psychophysics, estimation of the generalization function from a sequence of
errors (linear technique).
Paper to discuss: Poggio T, Fahle M, Edelman S (1992) Fast perceptual learning
in visual hyperacuity. Science 256:1018-1021.
Homework: (simulation) estimate the generalization function from a record of
errors. Data set.
• Lecture 6: (ML_1.ppt) Maximum likelihood estimation; likelihood of data
given a distribution; ML estimate of model weights and model noise.
Homework: derive online estimates of model weights and model noise.
• Lecture 7: (ML_2.ppt) Distribution of the ML estimate of model weights and
model noise; multivariate normal distribution; variance of model weights as a
measure of model uncertainty.
Homework: ML estimate of coin-toss probability, bias of the ML estimate of an
exponential distribution.
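
For orientation, the online LMS rule listed under Lecture 3 can be sketched as below. This is a minimal illustration and not course material: the function name lms_fit, the learning rate eta, the number of passes, the synthetic data, and the "true" weights are all assumptions made for the example.

```python
import random

def lms_fit(samples, n_weights, eta=0.05, epochs=50):
    """Online LMS: after each sample, step the weights down the gradient
    of that sample's squared prediction error."""
    w = [0.0] * n_weights
    for _ in range(epochs):
        for x, y in samples:
            y_hat = sum(wi * xi for wi, xi in zip(w, x))
            err = y - y_hat
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w

# Synthetic data (assumed): y = 2 - x + Gaussian noise, with a bias input of 1.
rng = random.Random(0)
true_w = [2.0, -1.0]
samples = []
for _ in range(200):
    x = [1.0, rng.uniform(-1.0, 1.0)]
    y = sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0.0, 0.1)
    samples.append((x, y))

w_hat = lms_fit(samples, n_weights=2)
print(w_hat)
```

With a sufficiently small learning rate the estimate should settle near the weights that generated the data; the batch normal equation would instead give the least-squares weights in closed form.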
• State estimation of linear stochastic systems
• Lecture 8: (Kalmanfilter.ppt) Optimal parameter estimation, parameter
uncertainty, state noise and measurement noise, adjusting learning rates to
minimize model uncertainty. Derivation of the Kalman filter algorithm.
Homework: Convergence of the Kalman gain and uncertainty.
• Lectures 9 and 10: (same file as for lecture 8) Application of the optimal
estimation algorithm to biological data: classical conditioning in animals,
data fusion and combining data from multiple sensors, fast and slow memory
systems, massed vs. spaced learning. Forward models and integration of
predicted sensory outcomes with measured sensory outcomes.
Homework: Simulation of a learner with fast, medium, and slow learning systems.
Homework: Simulation of a forward model with sensory feedback.
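
The convergence of the Kalman gain mentioned in the Lecture 8 homework can be illustrated with a scalar filter. The sketch below is a hedged example under assumed values: the model (a random-walk state observed directly through noise), the variances q and r, the initial uncertainty p0, and the name kalman_scalar are all hypothetical choices, not the course's own setup.

```python
import math
import random

def kalman_scalar(measurements, a=1.0, q=0.01, r=1.0, x0=0.0, p0=10.0):
    """Scalar Kalman filter for x[n+1] = a*x[n] + state noise (variance q),
    y[n] = x[n] + measurement noise (variance r)."""
    x, p = x0, p0
    estimates, gains = [], []
    for y in measurements:
        # Predict: propagate the estimate and its uncertainty one step.
        x = a * x
        p = a * a * p + q
        # Update: weight the prediction error by the Kalman gain.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        estimates.append(x)
        gains.append(k)
    return estimates, gains

# Synthetic data (assumed): a slowly drifting state seen through noise.
rng = random.Random(1)
state, measurements = 1.0, []
for _ in range(200):
    state += rng.gauss(0.0, 0.1)                      # variance q = 0.01
    measurements.append(state + rng.gauss(0.0, 1.0))  # variance r = 1

estimates, gains = kalman_scalar(measurements)

# Steady-state gain for a = 1, from the fixed point of the variance recursion.
q, r = 0.01, 1.0
p_prior = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
k_steady = p_prior / (p_prior + r)
print(gains[0], gains[-1], k_steady)
```

The gain starts high because the initial uncertainty is large, then falls toward a steady value that depends only on the noise variances and not on the measurements themselves.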
• Optimal feedback control of linear stochastic systems
• Lecture 11: (OptimalControl.ppt) Introduction to optimal control; method of
Lagrange multipliers, open-loop optimal control, relating continuous and
discrete linear systems.
Homework: non-uniform weighting of cost of motor commands and simulation of a
saccade.
• Lecture 12: (same file as for lecture 11) Optimal feedback control, optimal
stochastic feedback control with Gaussian noise.
Homework: reaching to targets that might move.
• Lecture 13: (same file as for lecture 12) Duality of optimal feedback
control and the Kalman filter, signal-dependent noise, optimal feedback control
with signal-dependent noise.
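
As a rough companion to the feedback control lectures above, the sketch below computes time-varying feedback gains for a scalar linear system with quadratic costs by a backward Riccati recursion. It is a noise-free illustration only; the system parameters, cost weights, horizon, and the names lqr_gains and simulate are assumptions made for the example.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for x[k+1] = a*x[k] + b*u[k] with cost
    sum_k (q*x[k]**2 + r*u[k]**2) + q_final*x[N]**2.
    Returns the gains K[0..N-1] of the policy u[k] = -K[k]*x[k]."""
    p = q_final
    gains = [0.0] * horizon
    for k in reversed(range(horizon)):
        gain = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
        gains[k] = gain
    return gains

def simulate(a, b, gains, x0):
    """Run the closed-loop system forward under the feedback policy."""
    xs = [x0]
    for gain in gains:
        u = -gain * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs

gains = lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
trajectory = simulate(1.0, 1.0, gains, x0=1.0)
print(gains[0], trajectory[-1])
```

Far from the end of the horizon the gain is nearly constant; raising r (the cost of motor commands) in this sketch lowers the gain and slows the return of the state to zero.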
• Identification of linear stochastic systems
• Lecture 14: (SubSpace.ppt) Introduction to subspace analysis; projection of
row vectors of matrices, singular value decomposition, system identification
of deterministic systems using subspace methods.
Homework: system identification of a deterministic system. Data set.
Van Overschee and De Moor (1996) Subspace identification for linear systems:
theory, implementation, applications. Kluwer Academic, The
• Bayesian integration
• Lecture 15: (Bayes_1.ppt) “Single stage” maximum a posteriori (MAP)
estimators with examples from selected distributions: coin toss with Beta
distributed prior, classification with Gaussian distributed prior.
Homework: naïve Bayes classifier.
• Lecture 16: (Bayes_2.ppt) Gaussian distribution and linear regression.
Matrix inversion lemma; Bayesian integration of jointly distributed random
variables; linear regression with a prior.
Homework: Posterior distribution with general variance-covariance matrices;
posterior distribution with two observed data points; maximizing the posterior
directly.
• Lecture 17: (Bayes_3.ppt) Bayesian learning in the central nervous system.
Derivation of LMS in the Bayesian case.
Papers to discuss: KP Koerding, DM Wolpert (2004) Bayesian integration in
sensorimotor learning. Nature 427:244-247. JM Hillis, MO Ernst, MS Banks, MS
Landy (2002) Combining sensory information: mandatory fusion within, but not
between, senses. Science 298:1627-1630.
Homework: simulation of bimodal priors; optimal learning rates.
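
The coin-toss example named under Lecture 15 can be written out in a few lines. The sketch below compares the ML estimate with the MAP estimate under a Beta prior; the function names and the default prior parameters alpha = beta = 2 are assumptions for illustration, and the MAP formula used is the mode of the Beta posterior, which holds only when both posterior parameters exceed 1.

```python
def coin_ml(heads, n):
    """Maximum likelihood estimate of the probability of heads."""
    return heads / n

def coin_map(heads, n, alpha=2.0, beta=2.0):
    """MAP estimate with a Beta(alpha, beta) prior: the mode of the
    Beta(alpha + heads, beta + n - heads) posterior."""
    return (heads + alpha - 1.0) / (n + alpha + beta - 2.0)

# Three heads in three tosses: ML commits to 1.0, MAP is pulled toward 0.5.
print(coin_ml(3, 3), coin_map(3, 3))

# With many tosses the prior matters less and the two estimates approach.
print(coin_ml(300, 400), coin_map(300, 400))
```

With a uniform prior (alpha = beta = 1) the MAP estimate in this sketch reduces to the ML estimate.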
• Classification via Bayesian estimation
• Lecture 18: Introduction to classification; Fisher linear discriminant,
classification using posterior probabilities with explicit models of
densities, confidence and error bounds of the Bayes classifier, Chernoff error
bounds.
Homework: Bayesian classification of a binary decision
• Lecture 19: Linear and quadratic decision boundaries. Equal-variance
Gaussian densities (linear discriminant analysis), unequal-variance Gaussian
densities (quadratic discriminant analysis), kernel estimates of density.
Homework: Classification using assumptions of equal and unequal Gaussian
distributions; classification using kernel density estimates.
• Lecture 20: Logistic regression as a method to model the posterior
probability of class membership as a function of state variables; batch
algorithm: iteratively reweighted least squares (IRLS); online algorithm.
Homework: logistic regression with multiple classes of unequal variance.
• Lecture 21: Neural mechanisms of classification learning; generalization in
classification learning. Basal ganglia damage, but not cerebellar or
medial-temporal lobe damage, disrupts classification learning. Cerebellar
damage, but not basal ganglia damage, disrupts a form of regression learning.
Papers to discuss: Knowlton et al. (1996) A neo-striatal habit learning system
in humans. Science 273:1399. Poldrack et al. (2001) Interactive memory systems
in the human brain. Nature 414:546.
• Expectation Maximization
• Lecture 22: Unsupervised classification. Mixture models, K-means algorithm,
and Expectation-Maximization (EM).
Homework: image segmentation. Image data.
• Lecture 23: EM and conditional mixtures. EM as maximizing the expected
complete log-likelihood; method of Lagrange multipliers; selecting the number
of mixture components; mixture of experts.
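
The K-means algorithm listed under Lecture 22 alternates between assigning points to the nearest center and moving each center to the mean of its points. The sketch below shows this on one-dimensional data; the function name kmeans_1d, the toy data, the starting centers, and the fixed iteration count are assumptions for illustration, and no claim is made that the course homework is set up this way.

```python
def kmeans_1d(data, centers, n_iter=20):
    """K-means on scalar data: alternate hard assignment and mean update."""
    centers = list(centers)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            j = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            clusters[j].append(x)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers = kmeans_1d(data, centers=[0.0, 2.0])
print(centers)
```

EM for a mixture model follows the same two-step pattern, with the hard assignment replaced by posterior probabilities of cluster membership.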
• Reinforcement learning
• Lecture 24: Introduction to reinforcement learning; value functions and
Bellman equations; generalized policy iteration.
Homework: rat maze problem. Maze data.
• Lecture 25: Temporal difference learning; policy improvement theorem;
addiction and reinforcement learning.
Homework: Random walk data. Schultz paper.
• Lecture 26: TD(λ) and eligibility traces.
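
Temporal difference learning (Lecture 25) can be illustrated on a small random walk. The sketch below is a hedged example, not the course's data set: the five-state chain, the reward of 1 for exiting on the right and 0 on the left, the step size alpha, and the name td0_random_walk are all assumptions made for the illustration.

```python
import random

def td0_random_walk(n_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """TD(0) value estimation for a symmetric random walk on a chain.
    Episodes start in the middle; stepping off the right end pays 1,
    stepping off the left end pays 0."""
    rng = random.Random(seed)
    v = [0.5] * n_states  # initial value estimates
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, v[s_next], False
            # TD(0) update: move V(s) toward the one-step bootstrapped target.
            v[s] += alpha * (reward + gamma * v_next - v[s])
            if done:
                break
            s = s_next
    return v

values = td0_random_walk()
true_values = [(i + 1) / 6 for i in range(5)]  # exact values for this chain
print(values)
print(true_values)
```

The learned values should rise roughly linearly from left to right, tracking the probability of exiting on the rewarded side.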