580.691/491 Learning Theory
Course Instructor: Reza Shadmehr
Spring semester 2017
Overview: This course introduces the probabilistic foundations of learning theory. We will discuss topics in regression, estimation, optimal control, system identification, Bayesian learning, and classification. Our aim is to first derive some of the important mathematical results in these topics, and then apply the framework to problems in biology, particularly animal learning and control of action. The lectures are recorded and are freely available at the links below.
Lecture times: Mondays and Wednesdays, 3:00 – 4:15 PM, Shaffer 303
Teaching Assistant: Tehrim Yoon
Exams: Midterm on March 15. Final project due 5PM, May 13.
Textbook: Biological Learning and Control, MIT Press, 2012
Lecture 2: Review of probability theory, rule, expected value and variance of random variables and of sums of random variables, expected value of random variables raised to a power, binomial distribution, Poisson distribution, normal distribution.
Homework: probability theory
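The closed-form moments from this lecture can be checked numerically. The following is a minimal sketch in plain Python that sums the probability mass functions directly. The helper names and the parameter values (n = 10, p = 0.3, λ = 4) are illustrative choices, not taken from the course.

```python
import math

def moments(pmf, support):
    """Mean and variance of a discrete distribution, summed over a finite support."""
    mean = sum(k * pmf(k) for k in support)
    second_moment = sum(k**2 * pmf(k) for k in support)  # E[X^2]
    return mean, second_moment - mean**2                  # Var[X] = E[X^2] - (E[X])^2

def binomial_pmf(n, p):
    return lambda k: math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam):
    return lambda k: math.exp(-lam) * lam**k / math.factorial(k)

n, p = 10, 0.3
b_mean, b_var = moments(binomial_pmf(n, p), range(n + 1))
# A binomial count is a sum of n independent Bernoulli(p) variables, so the means
# add (n * p) and, by independence, so do the variances (n * p * (1 - p)).
print(b_mean, b_var)

lam = 4.0
# The Poisson support is infinite; terms beyond k = 100 are negligible for lam = 4.
p_mean, p_var = moments(poisson_pmf(lam), range(101))
print(p_mean, p_var)  # both approximately lam
```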
Regression, generalization, and maximum likelihood
Lecture 3: (LMS_1.ppt, Lecture) Loss function as mean squared error; batch learning and the normal equation; cross-validation, batch vs. online learning, steepest descent algorithm, LMS, convergence of LMS.
Homework: (simulation) classify using regression. Data set.
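The contrast between batch and online learning can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data (the data, learning rate, and epoch count are assumptions). It solves the normal equation once and then runs LMS one sample at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = X w_true + noise,
# with a leading column of ones for the bias term.
n, d = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Batch learning: solve the normal equation (X^T X) w = X^T y.
w_batch = np.linalg.solve(X.T @ X, X.T @ y)

# Online learning with LMS: after each sample, take a small step down the gradient
# of that sample's squared error, w <- w + eta * (y_i - x_i^T w) * x_i.
# Convergence in the mean requires 0 < eta < 2 / (largest eigenvalue of E[x x^T]).
eta = 0.01
w_lms = np.zeros(d)
for epoch in range(50):
    for i in rng.permutation(n):
        error = y[i] - X[i] @ w_lms
        w_lms += eta * error * X[i]

print(w_batch)
print(w_lms)  # hovers near w_batch; a fixed step size leaves a small residual jitter
```

With a constant step size, LMS does not settle exactly on the batch solution. It fluctuates around it by an amount that shrinks with eta.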
Lecture 4: (LMS_2.ppt, Lecture) Newton-Raphson, LMS and steepest descent with Newton-Raphson, weighted least-squares, regression with basis functions, estimating the loss function for learning
Homework: moving centers of Gaussian bases.
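Regression with Gaussian basis functions, weighted least squares, and Newton-Raphson fit together in one small example. The sketch below assumes fixed, evenly spaced centers, a hand-picked width, and a hypothetical weighting that down-weights half of the data. Because the weighted squared-error loss is quadratic, a single Newton-Raphson step reaches the same weights as the closed-form solution.

```python
import numpy as np

def gaussian_design(x, centers, width):
    """Design matrix with columns exp(-(x - c)^2 / (2 width^2)), one per basis center c."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width**2))

def weighted_least_squares(Phi, y, weights):
    """Minimize sum_i weights_i (y_i - Phi_i w)^2, i.e. w = (Phi^T W Phi)^(-1) Phi^T W y."""
    PhiW = Phi.T * weights          # Phi^T W without forming the diagonal matrix W
    return np.linalg.solve(PhiW @ Phi, PhiW @ y)

rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

centers = np.linspace(0, 2 * np.pi, 8)   # fixed, evenly spaced centers
Phi = gaussian_design(x, centers, width=0.6)

# Hypothetical weighting: samples in the second half are treated as noisier.
weights = np.where(x < np.pi, 1.0, 0.2)
w = weighted_least_squares(Phi, y, weights)

# The weighted squared-error loss is quadratic in w, so one Newton-Raphson step
# from any starting point lands exactly on its minimum.
w0 = np.zeros(centers.size)
grad = -2 * Phi.T @ (weights * (y - Phi @ w0))
hess = 2 * (Phi.T * weights) @ Phi
w_newton = w0 - np.linalg.solve(hess, grad)

print(np.max(np.abs(Phi @ w - np.sin(x))))
```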
Lecture 5: (generalization_2.ppt, Lecture) Sensitivity to error, modulation of error sensitivity, generalization function, estimation of the generalization function from a sequence of errors.
Herzfeld et al. (2014) A memory of errors in sensorimotor learning. Science 345:1349-1353.
Homework: (simulation) estimate generalization function from record of errors. Data set.
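The idea of recovering learning parameters from a record of errors can be sketched with a single-state model, x[n+1] = a·x[n] + η·e[n], where η is the sensitivity to error. This is a simplified illustration under stated assumptions: one target (so no generalization across conditions), a known perturbation, and hypothetical parameter values. It is not the model of Herzfeld et al. (2014). Because e[n] = p[n] − x[n], the state can be reconstructed from the errors, and a least-squares fit recovers a and η.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical single-state model: x[n+1] = a x[n] + eta e[n] + state noise,
# where e[n] = p[n] - x[n] is the error experienced on trial n.
a_true, eta_true = 0.98, 0.15
N = 300
p = np.where(np.arange(N) < 150, 1.0, -1.0)  # perturbation switches sign halfway
x = np.zeros(N + 1)
e = np.zeros(N)
for n in range(N):
    e[n] = p[n] - x[n]
    x[n + 1] = a_true * x[n] + eta_true * e[n] + 0.01 * rng.normal()

# From the record of errors and the known perturbation, reconstruct x[n] = p[n] - e[n].
x_obs = p - e
# Least-squares fit of x[n+1] on [x[n], e[n]] estimates retention a and error sensitivity eta.
A = np.column_stack([x_obs[:-1], e[:-1]])
(a_hat, eta_hat), *_ = np.linalg.lstsq(A, x_obs[1:], rcond=None)
print(a_hat, eta_hat)
```

The sign switch in the perturbation matters here. Under a constant perturbation, x[n] and e[n] are nearly collinear, and the two parameters become hard to separate.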
Lecture 6: (ML_1.ppt, Lecture) Maximum
likelihood estimation; likelihood of data given a distribution; ML estimate of
model weights and model noise, integration of multiple sensory data.
Reading: chapters 4.1-4.5.
Homework: derive online estimates of model weights and model noise.
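For the linear-Gaussian model, the maximum likelihood estimate of the weights coincides with least squares. The ML estimate of the noise variance is the mean squared residual, with division by n rather than n − 2. Integrating two sensory measurements by maximum likelihood weights each cue by its inverse variance. The following is a minimal sketch on synthetic data; the helper `combine_cues` and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Linear-Gaussian model y = X w + eps, eps ~ N(0, sigma^2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w_true, sigma_true = np.array([1.0, 3.0]), 0.5
y = X @ w_true + sigma_true * rng.normal(size=n)

# ML estimate of the weights: identical to the least-squares solution
w_ml = np.linalg.solve(X.T @ X, X.T @ y)
# ML estimate of the noise variance: mean squared residual (divides by n)
sigma2_ml = np.mean((y - X @ w_ml) ** 2)

def combine_cues(mu1, var1, mu2, var2):
    """ML fusion of two independent Gaussian measurements of the same quantity."""
    w1 = (1 / var1) / (1 / var1 + 1 / var2)  # weight proportional to reliability (1 / variance)
    mu = w1 * mu1 + (1 - w1) * mu2
    var = 1 / (1 / var1 + 1 / var2)          # fused estimate is less variable than either cue
    return mu, var

print(w_ml, sigma2_ml)
print(combine_cues(10.0, 1.0, 14.0, 3.0))    # approximately (11.0, 0.75)
```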
State estimation and the Kalman filter
Lecture 7: (state_estimation1.ppt,
Lecture) Optimal parameter
estimation, parameter uncertainty, state noise and measurement noise, adjusting
learning rates to minimize model uncertainty. Derivation of the Kalman
Reading: chapters 4.6 and 4.7 of Shadmehr and Mussa-Ivaldi.
Homework: Convergence of the Kalman gain and uncertainty.
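The convergence of the Kalman gain and uncertainty can be seen in a scalar filter. The following is a minimal sketch, assuming a random-walk state (a = 1) and hypothetical noise variances q and r. The gain and variances do not depend on the data, so the recursion can be run on its own. For a = 1, the prior variance settles at the positive root of p² − q·p − q·r = 0.

```python
import math

def kalman_gain_sequence(q, r, p0, n_steps, a=1.0):
    """Kalman gains and posterior variances for the scalar model
    x[n+1] = a x[n] + w, w ~ N(0, q);  y[n] = x[n] + v, v ~ N(0, r).
    These quantities depend only on q, r, a, and p0, not on the observed data."""
    p_prior, gains, posteriors = p0, [], []
    for _ in range(n_steps):
        k = p_prior / (p_prior + r)      # Kalman gain: sensitivity to the prediction error
        p_post = (1 - k) * p_prior       # uncertainty after observing y[n]
        gains.append(k)
        posteriors.append(p_post)
        p_prior = a * a * p_post + q     # state noise inflates uncertainty before the next trial
    return gains, posteriors

q, r = 0.1, 1.0
gains, posteriors = kalman_gain_sequence(q, r, p0=10.0, n_steps=50)

# Steady-state prior variance for a = 1: positive root of p^2 - q p - q r = 0
p_ss = (q + math.sqrt(q * q + 4 * q * r)) / 2
k_ss = p_ss / (p_ss + r)
print(gains[0], gains[-1], k_ss)
```

Starting from a large initial uncertainty, the gain falls monotonically toward k_ss. A larger ratio q/r gives a larger steady-state gain.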
Lecture 9: (Bayes_2.ppt, Lecture) Kalman filter
and Bayesian estimation; factorization of joint distribution of Gaussian
Reading: chapter 5.1.
Homework: posterior distribution with two observed data points; maximizing the posterior directly.
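With a Gaussian prior and Gaussian observation noise, the posterior over an unknown quantity has a closed form. Maximizing the log posterior directly reaches the same point. The following is a minimal sketch with hypothetical numbers (prior N(0, 1), two observations with noise variance 0.5). It computes the posterior both ways and uses plain gradient ascent for the direct maximization.

```python
def gaussian_posterior(mu0, var0, ys, var_y):
    """Posterior over w given prior N(mu0, var0) and observations y_i ~ N(w, var_y)."""
    precision = 1 / var0 + len(ys) / var_y       # precisions add
    mean = (mu0 / var0 + sum(ys) / var_y) / precision
    return mean, 1 / precision

def map_by_gradient_ascent(mu0, var0, ys, var_y, lr=0.1, n_iter=500):
    """Maximize log p(w | ys) = -(w - mu0)^2 / (2 var0) - sum (y - w)^2 / (2 var_y) + const."""
    w = mu0
    for _ in range(n_iter):
        grad = -(w - mu0) / var0 + sum(y - w for y in ys) / var_y
        w += lr * grad
    return w

mean, var = gaussian_posterior(mu0=0.0, var0=1.0, ys=[1.2, 0.8], var_y=0.5)
w_map = map_by_gradient_ascent(0.0, 1.0, [1.2, 0.8], 0.5)
print(mean, var, w_map)  # posterior mean 0.8, variance 0.2; the MAP estimate matches the mean
```

Because the Gaussian posterior is symmetric, its maximum coincides with its mean.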
Lecture 10: (Lecture) Causal inference and the problem of deciding between two generative models; the influence of priors on how we make movements and perceive motion; the influence of priors on cognitive decision making.
Reading: chapters 5.2-5.3.
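Deciding between two generative models can be made concrete with two noisy cues that either share one source or come from two independent sources. The following is a minimal sketch under assumptions: a zero-mean Gaussian prior on sources, Gaussian cue noise, and a prior probability of 0.5 for a common cause. Integrating out the sources leaves a bivariate Gaussian under each model. Bayes rule then compares the two marginal likelihoods.

```python
import numpy as np

def log_gauss2(x, cov):
    """Log density of a zero-mean bivariate Gaussian evaluated at the 2-vector x."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x @ np.linalg.solve(cov, x) + logdet + 2 * np.log(2 * np.pi))

def prob_common_cause(x1, x2, var1, var2, var_prior, p_common=0.5):
    """Posterior probability that cues x1 and x2 were generated by a single source.

    Assumed generative models (source prior centered at zero):
      C = 1: one source s ~ N(0, var_prior);                 x_i = s + noise_i
      C = 2: independent sources s_1, s_2 ~ N(0, var_prior); x_i = s_i + noise_i
    with noise_i ~ N(0, var_i). Integrating out the sources leaves (x1, x2) jointly
    Gaussian under both models; only the off-diagonal covariance differs.
    """
    x = np.array([x1, x2], dtype=float)
    cov_common = np.array([[var1 + var_prior, var_prior],
                           [var_prior, var2 + var_prior]])
    cov_indep = np.diag([var1 + var_prior, var2 + var_prior])
    log_c1 = np.log(p_common) + log_gauss2(x, cov_common)
    log_c2 = np.log(1 - p_common) + log_gauss2(x, cov_indep)
    return 1 / (1 + np.exp(log_c2 - log_c1))   # Bayes rule over the two models

print(prob_common_cause(1.0, 1.5, 1.0, 1.0, 100.0))   # nearby cues: a common cause is likely
print(prob_common_cause(1.0, 20.0, 1.0, 1.0, 100.0))  # far-apart cues: separate causes are likely
```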
Lecture 11: (Lecture) Use of the Kalman
gain to account for learning in animals, classical conditioning, Kamin blocking, and backward blocking, with examples of
adaptation in people.
Reading: chapters 5.5, 6.1-6.4.
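The Kalman gain's account of Kamin blocking and backward blocking can be reproduced with a two-cue filter. The following is a minimal sketch, assuming reward = xᵀw + noise, a unit prior covariance, unit measurement noise, and no state noise (q = 0), which keeps the arithmetic exact. The function name and trial counts are hypothetical. In blocking, pretraining on cue A leaves almost no prediction error for cue B to absorb. In backward blocking, compound training creates a negative covariance between the two weights, so training A alone drives B's weight down even though B is absent.

```python
import numpy as np

def kalman_conditioning(trials, n_cues=2, p0=1.0, r=1.0, q=0.0):
    """Track associative weights w (one per cue) with a Kalman filter.

    Assumed model: reward y = x^T w + noise (var r); w drifts with noise var q per trial.
    trials: sequence of (x, y) pairs, x a 0/1 vector of the cues present on that trial.
    """
    w = np.zeros(n_cues)
    P = p0 * np.eye(n_cues)
    history = []
    for x, y in trials:
        x = np.asarray(x, dtype=float)
        P = P + q * np.eye(n_cues)           # prior uncertainty before the trial
        k = P @ x / (x @ P @ x + r)          # Kalman gain: one learning rate per cue
        w = w + k * (y - x @ w)              # every weight learns from the shared prediction error
        P = P - np.outer(k, x @ P)           # posterior covariance (off-diagonals can go negative)
        history.append(w.copy())
    return np.array(history)

A, AB = [1, 0], [1, 1]
blocking = kalman_conditioning([(A, 1.0)] * 20 + [(AB, 1.0)] * 20)
control = kalman_conditioning([(AB, 1.0)] * 20)
backward = kalman_conditioning([(AB, 1.0)] * 20 + [(A, 1.0)] * 20)

print(blocking[-1])                      # w_B stays small: A already predicts the reward
print(control[-1])                       # without pretraining, A and B share the credit
print(backward[19, 1], backward[-1, 1])  # w_B falls during A-alone trials
```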
Sensorimotor adaptation and state-space models
Lecture 12: (Lecture) A generative model of sensorimotor adaptation experiments; accounting for sensory illusions during adaptation; effect of the statistics of prior actions on patterns of learning.
Reading: chapters 6.5-6.7.
Lecture 13: (Lecture) Modulating
sensitivity to error through manipulation of state and measurement noises;
modulating forgetting rates. Modulating sensitivity to error through memory of errors.
Reading: chapter 7.
Reading: Herzfeld et al. (2014) A memory of errors in sensorimotor learning. Science 345:1349-1353.
Homework: adaptive error-sensitivity (pdf) Data set
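The interplay of forgetting and error sensitivity discussed in Lectures 12 and 13 can be illustrated with the simplest state-space model of adaptation, x[n+1] = a·x[n] + b·(p − x[n]). The following is a minimal sketch with hypothetical parameter values. Here a is the retention factor (1 − a is the forgetting rate) and b is the sensitivity to error. Under a constant perturbation p, the state approaches b·p / (1 − a + b), so adaptation is incomplete whenever a < 1.

```python
def simulate_adaptation(a, b, perturbation, n_trials):
    """Single-state model: x[n+1] = a x[n] + b (p - x[n]); a = retention, b = error sensitivity."""
    x, xs = 0.0, []
    for _ in range(n_trials):
        error = perturbation - x
        x = a * x + b * error
        xs.append(x)
    return xs

def asymptote(a, b, perturbation):
    # Fixed point of the update: x = a x + b (p - x)  =>  x = b p / (1 - a + b)
    return b * perturbation / (1 - a + b)

for a, b in [(0.99, 0.1), (0.99, 0.2), (0.95, 0.1)]:
    xs = simulate_adaptation(a, b, perturbation=1.0, n_trials=300)
    print(a, b, round(xs[-1], 4), round(asymptote(a, b, 1.0), 4))
```

Raising the error sensitivity raises the asymptote and speeds learning. Faster forgetting (smaller a) lowers the asymptote.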
Expectation Maximization and system identification
Lecture 16: (Lecture) Identification of the learner; expectation maximization as an algorithm for system identification.
Reading: chapters 9.8-9.9
Lecture 17: Generalized Expectation Maximization.
Optimal control and the Bellman equation
Lecture 18: (Lecture) Motor costs and rewards.
Movement vigor and encoding of reward. Muscle tuning
functions as a signature of motor costs. Minimizing costs while
meeting a constraint (Lagrange multipliers).
Reading: Chapter 10.
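Minimizing a motor cost while meeting a constraint can be sketched with a Lagrange multiplier. The following toy example rests on assumptions: eight hypothetical muscles with evenly spaced pulling directions, a cost equal to the sum of squared forces, and negative forces allowed, which ignores that real muscles can only pull. The resulting forces vary as the cosine of the angle between each muscle's pulling direction and the required force. This is one way a cost can leave a signature in muscle tuning functions.

```python
import numpy as np

def min_effort_forces(D, F):
    """Minimize 0.5 * sum(f**2) subject to D @ f = F using a Lagrange multiplier.

    Setting the gradient of L = 0.5 f^T f - lam^T (D f - F) to zero gives f = D^T lam,
    and substituting into the constraint gives lam = (D D^T)^(-1) F.
    """
    lam = np.linalg.solve(D @ D.T, F)
    return D.T @ lam, lam

# Hypothetical planar muscles with evenly spaced pulling directions (unit vectors)
angles = np.deg2rad(np.arange(0, 360, 45))
D = np.vstack([np.cos(angles), np.sin(angles)])   # 2 x 8: column i is muscle i's direction

target_angle = np.deg2rad(30)
F = 2.0 * np.array([np.cos(target_angle), np.sin(target_angle)])  # required net force
f, lam = min_effort_forces(D, F)
print(np.round(f, 3))   # equals 0.5 * cos(angle_i - 30 deg): cosine tuning
```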
Lecture 19: Lecture notes. Open loop optimal
control with signal-dependent noise to minimize endpoint variance with the
constraint that the endpoint state should be at the goal location.
Reading: Harris and Wolpert (1998) Signal-dependent noise determines motor planning. Nature 394:780-784.
Lecture 20: (Lecture) Open loop optimal control with
cost of time. Temporal discounting of reward. Optimizing movement
duration with motor and accuracy costs. Control of saccades as an example
of a movement in which cost of time appears to be hyperbolic.
Reading: Chapter 11.
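How a hyperbolic discount on reward shapes movement duration can be illustrated with a deliberately simple cost. The following sketch makes several assumptions that are not taken from the lecture: an effort term c/T that falls with duration T, minus a reward α discounted hyperbolically as α/(1 + βT), with hypothetical constants. Setting the derivative to zero gives a closed-form optimum, and a brute-force grid search confirms it. A larger reward yields a shorter optimal duration.

```python
import math

def total_cost(T, c, alpha, beta):
    """Hypothetical cost of a movement of duration T: effort c / T minus a
    hyperbolically discounted reward alpha / (1 + beta * T)."""
    return c / T - alpha / (1 + beta * T)

def optimal_duration(c, alpha, beta):
    # dJ/dT = -c / T^2 + alpha * beta / (1 + beta T)^2 = 0 gives
    # T* = sqrt(c) / (sqrt(alpha * beta) - beta * sqrt(c)), valid when alpha > beta * c
    return math.sqrt(c) / (math.sqrt(alpha * beta) - beta * math.sqrt(c))

c, beta = 0.01, 1.0
for alpha in (1.0, 2.0):
    T_star = optimal_duration(c, alpha, beta)
    grid = [0.001 * k for k in range(1, 2001)]   # brute-force check over durations
    T_grid = min(grid, key=lambda T: total_cost(T, c, alpha, beta))
    print(alpha, round(T_star, 4), T_grid)
```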
Lecture 23: (Lecture) Optimal feedback control with signal-dependent noise.
Reading: Chapter 12.4-12.6.
Classification via Bayesian estimation
Lecture 24: (Lecture) Introduction to classification;
Fisher linear discriminant, classification using posterior probabilities with
explicit models of densities, confidence and error bounds of the Bayes
classifier, Chernoff error bounds.
Homework: Bayesian classification of a binary decision
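The Fisher linear discriminant projects the data onto the direction S_W⁻¹(m₂ − m₁), where S_W is the within-class scatter. For two Gaussian classes with equal covariance and equal priors, the Bayes decision boundary is also linear along this direction, with its threshold at the midpoint of the projected means. The following is a minimal sketch on synthetic two-class data (the means and covariance are assumptions).

```python
import numpy as np

rng = np.random.default_rng(4)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])                 # shared class covariance
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)  # class 1 samples
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=200)  # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter: sum of the two classes' scatter matrices
S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(S_w, m2 - m1)   # Fisher direction: maximizes between/within variance ratio
threshold = w @ (m1 + m2) / 2       # midpoint of the projected class means

def classify(X):
    return (X @ w > threshold).astype(int)   # 0 -> class 1, 1 -> class 2

accuracy = (np.sum(classify(X1) == 0) + np.sum(classify(X2) == 1)) / 400
print(w / np.linalg.norm(w), accuracy)
```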
Lecture 25: Linear and quadratic decision boundaries. Equal-variance Gaussian densities (linear discriminant analysis), unequal-variance Gaussian densities (quadratic discriminant analysis), kernel estimates of density.
Homework: Classification using assumptions of equal and unequal Gaussian distributions; classification using kernel density estimates.
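With unequal class variances, the log-likelihood ratio is quadratic in x, so in one dimension the decision boundary can have two crossing points. The following is a minimal sketch, assuming 1-D synthetic data, equal priors, and a hand-picked kernel bandwidth of 0.4. It compares the Gaussian (quadratic) classifier with one built on Gaussian kernel density estimates of each class. The wider class wins at both extremes.

```python
import numpy as np

def log_gauss(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

def kde(samples, bandwidth):
    """1-D Gaussian kernel density estimate: p(x) = average of N(x; x_i, bandwidth^2)."""
    samples = np.asarray(samples, dtype=float)
    def density(x):
        z = (np.atleast_1d(x)[:, None] - samples[None, :]) / bandwidth
        return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return density

rng = np.random.default_rng(5)
x1 = rng.normal(0.0, 1.0, size=300)   # class 1 samples
x2 = rng.normal(2.0, 3.0, size=300)   # class 2 samples (larger variance)

# Unequal-variance Gaussian class models -> quadratic decision boundary
m1, v1 = x1.mean(), x1.var()
m2, v2 = x2.mean(), x2.var()
def posterior_class1_gauss(x):
    l1, l2 = log_gauss(x, m1, v1), log_gauss(x, m2, v2)
    return 1 / (1 + np.exp(l2 - l1))

# Nonparametric alternative: kernel density estimate of each class-conditional density
p1, p2 = kde(x1, 0.4), kde(x2, 0.4)
def posterior_class1_kde(x):
    a, b = p1(x), p2(x)
    return a / (a + b)

print(posterior_class1_gauss(np.array([-10.0, 0.0, 6.0])))  # class 2 at both extremes
print(posterior_class1_kde(np.array([0.0])))
```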
Lecture 26: Logistic regression as a method
to model posterior probability of class membership as a function of state
variables; batch algorithm: Iterative Re-weighted Least Squares; on-line
Homework: logistic regression with multiple classes of unequal variance.
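Iteratively Re-weighted Least Squares fits logistic regression by Newton-Raphson. Each step is a weighted least-squares problem with weights μ(1 − μ) and a "working response". The following is a minimal binary-class sketch on synthetic data with hypothetical true weights. The homework's multi-class case is not covered here. At convergence, the gradient of the log-likelihood, Xᵀ(y − μ), is zero.

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-x^T w)) by Iteratively Re-weighted Least Squares.

    Each iteration is a Newton-Raphson step on the log-likelihood, written as a
    weighted least-squares problem with per-sample weights mu (1 - mu).
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-X @ w))      # current posterior probability of class 1
        R = mu * (1 - mu)                   # per-sample weights
        z = X @ w + (y - mu) / R            # working response
        w = np.linalg.solve(X.T @ (R[:, None] * X), X.T @ (R * z))
    return w

rng = np.random.default_rng(6)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w_true = np.array([-0.5, 2.0])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

w_hat = logistic_irls(X, y)
print(w_hat)
```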
(SubSpace.ppt, Lecture) Introduction to subspace analysis; projection of row vectors of
matrices, singular value decomposition, system identification of deterministic
systems using subspace methods.
Homework: system identification of a deterministic system
Reading: chapters 9.1-9.6
Van Overschee and De Moor (1996) Subspace identification for linear systems: theory, implementation, applications. Kluwer Academic, The Netherlands.
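The core subspace idea, that an SVD exposes the order of a deterministic linear system and a basis for its state space, can be sketched with impulse-response data. This is a simplified relative of the subspace algorithms in the reading (a Ho-Kalman style realization from a Hankel matrix), not those algorithms themselves. The system matrices are hypothetical. The number of nonzero singular values gives the order. The shift structure of the estimated observability matrix then recovers A up to a change of basis, so its eigenvalues match.

```python
import numpy as np

# Hypothetical second-order system: x[k+1] = A x[k] + B u[k],  y[k] = C x[k]
A = np.array([[0.9, 0.2], [-0.2, 0.9]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])

# Markov parameters: h[k] = C A^k B is the output k + 1 steps after a unit impulse
h = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(20)]

# Hankel matrix H[i, j] = h[i + j] = (C A^i)(A^j B): observability times controllability,
# so its rank equals the order of a minimal realization.
size = 8
H = np.array([[h[i + j] for j in range(size)] for i in range(size)])
U, s, Vt = np.linalg.svd(H)
order = int(np.sum(s > 1e-8 * s[0]))

# The leading left singular vectors span the extended observability matrix (up to a
# change of basis); shifting its rows by one step gives an estimate of A.
O = U[:, :order] * np.sqrt(s[:order])
A_hat = np.linalg.pinv(O[:-1]) @ O[1:]
print(order, np.linalg.eigvals(A_hat))   # eigenvalues match those of A (0.9 +/- 0.2i)
```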