Statistical learning theory studies the statistical aspects of machine learning and automated reasoning through the use of (sampled) data. In particular, the focus is on characterizing the generalization ability of learning algorithms: how well they perform on ``new'' data when trained on a given data set. The course focuses on: providing the fundamental tools used in this analysis; understanding the performance of widely used learning algorithms (with an emphasis on regression and classification); and understanding the ``art'' of designing good algorithms, in terms of both their statistical and their computational properties. Potential topics include: concentration of measure; empirical process theory; online learning; stochastic optimization; margin-based algorithms; feature selection; regularization; and PCA.

Instructor: Sham Kakade (skakade at wharton.upenn.edu)

Time: MW, 3:00 - 4:30

Location: G90 JMHH

- Lecture 0
- Risk vs. Risk: Some terminology differences between Stats and ML
- (The ML usage of ``risk'' is not defined analogously to the statistics usage, which causes some confusion)
- lecture notes pdf

- Lecture 1: 1/12/11
- Introduction; Bias-Variance Tradeoff
- lecture notes pdf
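As a quick reference for this lecture's title, the standard decomposition of the expected squared error of an estimator $\hat f$ of a fixed target $f$ at a point $x$ (with expectation taken over the training sample) can be written as:

```latex
\mathbb{E}\!\left[ \bigl(\hat f(x) - f(x)\bigr)^2 \right]
  = \underbrace{\bigl(\mathbb{E}[\hat f(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[ \bigl(\hat f(x) - \mathbb{E}[\hat f(x)]\bigr)^2 \right]}_{\text{variance}}
```

If one instead predicts a noisy observation $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, an irreducible $\sigma^2$ term is added to the right-hand side.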

- Lecture 2: 1/19/11
- Fixed Design Regression and Ridge Regression
- lecture notes pdf
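The closed-form ridge estimator discussed in this lecture can be sketched in a few lines of NumPy; the synthetic data below (sizes, noise level, seed) is illustrative and not taken from the course notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed-design data: n samples, d features (illustrative sizes).
n, d = 50, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge estimator: argmin_w ||Xw - y||^2 + lam * ||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_hat = ridge(X, y, lam=1.0)
```

As the regularization parameter `lam` grows, the solution's norm shrinks toward zero; as `lam` tends to zero, the estimator recovers ordinary least squares.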

- Lecture 3: 1/24/11
- Ridge Regression and PCA
- lecture notes pdf

- Lecture 4: 1/26/11
- The Central Limit Theorem; Large Deviations; and Rate Functions
- lecture notes pdf

- Lecture 5: 1/30/11
- The Moment Method; Convex Duality; and Large/Medium/Small Deviations
- lecture notes pdf

- Lecture 6: 2/2/11
- Hoeffding, Chernoff, Bennet, and Bernstein Bounds
- lecture notes pdf
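The Hoeffding bound from this lecture is easy to sanity-check numerically. A minimal sketch, with made-up parameters (`n`, `t`, and the Bernoulli(1/2) distribution are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def hoeffding_bound(n, t):
    """Hoeffding: P(|sample mean - E| >= t) <= 2 exp(-2 n t^2)
    for n i.i.d. draws bounded in [0, 1]."""
    return 2.0 * np.exp(-2.0 * n * t**2)

# Monte Carlo estimate of the deviation probability for Bernoulli(1/2) samples.
n, t, trials = 100, 0.1, 20000
means = rng.binomial(n, 0.5, size=trials) / n
empirical = np.mean(np.abs(means - 0.5) >= t)
bound = hoeffding_bound(n, t)
```

For these parameters the empirical deviation frequency sits well below the bound, which is expected: Hoeffding is distribution-free and hence loose for any particular distribution (Bernstein-type bounds, which use the variance, are tighter here).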

- Lecture 7: 2/7/11
- Feature Selection, Empirical Risk Minimization, and The Orthogonal Case
- lecture notes pdf

- Lecture 8: 2/9/11
- Feature Selection and Chi^2 Tail Bounds
- lecture notes pdf

- Lecture 9: 2/14/11
- Lecture 10: 2/16/11
- Bracketing Covering Numbers
- lecture 10 notes pdf

- Lecture 11: 2/21/11
- Symmetrization and Rademacher Averages
- lecture 11 notes pdf

- Lecture 12: 2/23/11
- Rademacher Composition and Linear Prediction
- lecture 12 notes pdf

- Lecture 13: 2/28/11
- Review: Norms and Dual Norms

- Lecture 14: 3/2/11
- Bounded Differences, Rademacher Averages, and L1 Regularization
- lecture 14 notes pdf

- Lecture 15: 3/14/11
- Lecture 16: 3/16/11
- Uniform and Empirical Covering Numbers
- lecture 16 notes pdf

- Lecture 17: 3/21/11
- Dudley's Theorem and Packing Numbers
- lecture 17 notes pdf

- Lecture 18: 3/28/11
- Mistake Bound Model, Halving Algorithm, Linear Classifiers, & Perceptron
- lecture 18 notes pdf
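The Perceptron update from this lecture fits in a few lines. The run below uses synthetic separable data; the target direction `w_star` and the margin filter are arbitrary illustrative choices:

```python
import numpy as np

# Synthetic linearly separable data with an enforced margin, so the
# (R / gamma)^2 mistake bound is small.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
w_star = np.array([1.0, -2.0])        # hypothetical target direction
margins = X @ w_star
keep = np.abs(margins) > 0.5
X, y = X[keep], np.sign(margins[keep])

def perceptron(X, y, max_passes=1000):
    """Classic Perceptron: on each mistake (y_i <w, x_i> <= 0), set w += y_i x_i.
    On gamma-separable data with ||x_i|| <= R, it makes at most (R/gamma)^2 mistakes."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(max_passes):
        clean_pass = True
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:
                w += y_i * x_i
                mistakes += 1
                clean_pass = False
        if clean_pass:                # no mistakes in a full pass: converged
            return w, mistakes
    return w, mistakes

w, mistakes = perceptron(X, y)
```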

- Lecture 19: 3/30/11
- Perceptron Lower Bound & The Winnow Algorithm
- lecture 19 notes pdf

- Lecture 20: 4/4/11
- The Perceptron for Generalized Linear Models and Single Index Models
- lecture 20 notes pdf

- Lecture 21: 4/6/11
- Online Convex Programming and Gradient Descent
- lecture 21 notes pdf
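A minimal sketch of projected online gradient descent, the workhorse of this lecture. The loss sequence here is a fixed quadratic chosen for illustration (a real online setting would have adversarially varying losses), with the standard O(1/sqrt(T)) step size:

```python
import numpy as np

# Online convex programming with losses f_t(w) = ||w - z||^2 for a fixed
# (hypothetical) target z, using projected gradient steps on the unit ball.
z = np.array([0.3, -0.4])
T = 2000
eta = 1.0 / np.sqrt(T)              # standard O(1/sqrt(T)) step size
w = np.zeros(2)
iterates = []
for t in range(T):
    iterates.append(w.copy())
    g = 2.0 * (w - z)               # gradient of f_t at the current iterate
    w = w - eta * g
    norm = np.linalg.norm(w)
    if norm > 1.0:                  # Euclidean projection back onto the unit ball
        w /= norm
w_avg = np.mean(iterates, axis=0)   # averaged iterate, as in regret analyses
```

With convex losses, the O(sqrt(T)) regret guarantee implies the averaged iterate `w_avg` is nearly optimal for the cumulative loss; here both `w` and `w_avg` end up close to `z`.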

- Lecture 22: 4/11/11
- Exponentiated Gradient Descent
- lecture 22 notes pdf
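The exponentiated gradient update of this lecture is a multiplicative-weights step on the probability simplex. A small sketch with made-up expert losses (expert 0 always incurs loss 0, the rest loss 1):

```python
import numpy as np

def eg_update(w, g, eta):
    """One exponentiated gradient step on the probability simplex:
    multiply each coordinate by exp(-eta * g_i), then renormalize."""
    w = w * np.exp(-eta * g)
    return w / w.sum()

# Illustrative experts setting: weight should concentrate on expert 0,
# whose (made-up) loss is always smallest.
w = np.full(3, 1.0 / 3.0)
losses = np.array([0.0, 1.0, 1.0])
for _ in range(50):
    w = eg_update(w, losses, eta=0.1)
```

Compared with (projected) gradient descent, the multiplicative update adapts to the L1/simplex geometry, which is what yields regret bounds scaling logarithmically in the number of experts.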

- Lecture 23: 4/13/11
- Online to Batch Conversions
- lecture 23 notes pdf

- Lecture 24: 4/18/11
- Growth Functions and the VC dimension
- lecture 24 notes pdf

- Lecture 25: 4/20/11
- Boosting
- lecture 25 notes pdf
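A compact sketch of AdaBoost, the canonical boosting algorithm; the exhaustively searched decision-stump weak learner and the 1-D interval data below are illustrative choices, not taken from the course notes:

```python
import numpy as np

def fit_boosted_stumps(X, y, rounds=10):
    """AdaBoost with decision stumps (a threshold on one coordinate).
    Each round reweights the examples so the next stump focuses on
    the current ensemble's mistakes."""
    n, d = X.shape
    D = np.full(n, 1.0 / n)                   # distribution over examples
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                    # exhaustive weak-learner search
            for thr in np.unique(X[:, j]):
                for s in (1.0, -1.0):
                    pred = s * np.sign(X[:, j] - thr)
                    pred[pred == 0] = s
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, s)
        err, j, thr, s = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's vote weight
        pred = s * np.sign(X[:, j] - thr)
        pred[pred == 0] = s
        D = D * np.exp(-alpha * y * pred)      # upweight mistakes
        D /= D.sum()
        ensemble.append((alpha, j, thr, s))
    return ensemble

def predict_boosted(ensemble, X):
    """Sign of the alpha-weighted vote of the stumps."""
    agg = np.zeros(X.shape[0])
    for alpha, j, thr, s in ensemble:
        pred = s * np.sign(X[:, j] - thr)
        pred[pred == 0] = s
        agg += alpha * pred
    return np.sign(agg)

# Toy 1-D data: positive on an interval, so no single stump is perfect,
# but a weighted vote of a few stumps is.
x = np.linspace(0.0, 1.0, 30)
y = np.where((x >= 0.3) & (x <= 0.7), 1.0, -1.0)
X = x.reshape(-1, 1)
ensemble = fit_boosted_stumps(X, y)
```

The interval example highlights the point of boosting: each individual stump misclassifies a sizable fraction of the data, yet the weighted majority vote drives the training error down exponentially fast in the number of rounds (as long as each weak learner beats random guessing).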