Modern statistical approaches to large datasets must directly analyze and manipulate data in matrix or vector form. This course will focus on the statistical theory and practice of manipulating such data. The topics covered will be: multivariate analysis, dimensionality reduction, convexity issues that arise when working with matrices, and spectral methods.

Regarding dimensionality reduction, we will cover PCA, CCA, and random projections (e.g. Johnson-Lindenstrauss) and examine potential applications. Regarding convexity issues, the course will examine the basic question of how accurate the SVD of a random matrix is (via a generalization of the Chernoff method to matrices). Other potential topics include matrix completion (filling in the missing entries of a matrix), subspace identification (e.g. learning time-series models such as Kalman filters from a multivariate covariance analysis), locality sensitive hashing (randomly projecting data for efficient storage and recall), matrix-based regularization methods (and related convexity issues), and kernel methods/Gaussian process regression.

The major topics discussed in the course will include the following:

- Dimensionality reduction, including SVD and random projections.
- Accuracy of SVD: How accurate is the subspace found? We will cover a recent concentration result for random matrices.
- Implications for learning: We will examine projecting the data to low-dimensional spaces and then learning in these lower-dimensional spaces. We will cover both regression and clustering here.
- CCA, Subspace ID, and Time Series: how to provably learn state space models, including Kalman filters and HMMs.
- Matrix Completion: how to fill in the missing entries of a matrix.
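The random-projection idea listed above can be sketched in a few lines. This is a minimal illustration assuming numpy; the dimensions, seed, and sample sizes are arbitrary choices for the demo, not from the course:

```python
import numpy as np

# Johnson-Lindenstrauss style random projection: a Gaussian matrix
# scaled by 1/sqrt(k) approximately preserves pairwise distances
# when k = O(log n / eps^2).
rng = np.random.default_rng(0)
n, d, k = 50, 1000, 200

X = rng.standard_normal((n, d))               # n points in R^d
R = rng.standard_normal((d, k)) / np.sqrt(k)  # scaled random projection map
Y = X @ R                                     # projected points in R^k

# compare one pairwise distance before and after projection
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig                           # concentrates near 1
```

The squared ratio behaves like a chi-squared average with k degrees of freedom, which is why moderate k already gives small distortion.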

Instructor: Sham Kakade (skakade at wharton.upenn.edu)

Time: MW 1:30 - 3:00

Location: JMHH F88

- Weds 1/13/10
- Weds 1/20/10
- Applications of SVD
- lecture notes pdf
- further reading:

Latent Semantic Analysis Wikipedia page

- Mon 1/25/10
- Weds & Mon, 1/27/10 & 2/1/10
- Recall and Locality Sensitive Hashing
- See the survey paper here.
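A hedged illustration of one common locality sensitive hashing scheme for cosine similarity (random hyperplanes, i.e. SimHash). The hyperplane count, dimensions, and perturbation size below are arbitrary assumptions for the demo:

```python
import numpy as np

# SimHash: each of b random hyperplanes contributes one sign bit.
# Vectors at a small angle tend to agree on most bits, so hash
# collisions can be used for efficient approximate recall.
rng = np.random.default_rng(1)
d, b = 100, 32
H = rng.standard_normal((b, d))      # b random hyperplanes

def simhash(x):
    # bit vector: which side of each hyperplane x falls on
    return (H @ x > 0).astype(int)

x = rng.standard_normal(d)
x_near = x + 0.01 * rng.standard_normal(d)   # small perturbation of x
x_far = rng.standard_normal(d)               # unrelated vector

agree_near = (simhash(x) == simhash(x_near)).mean()
agree_far = (simhash(x) == simhash(x_far)).mean()
```

For each bit, the collision probability is 1 minus the angle between the vectors over pi, so near-duplicates share almost all bits while unrelated vectors agree on about half.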

- Weds 2/3/10
- Ridge Regression and Dimensionality Reduction
- lecture notes pdf
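A minimal sketch of ridge regression via its closed form, to accompany the lecture topic. The problem sizes and regularization values are illustrative assumptions:

```python
import numpy as np

# Ridge regression: w = (X^T X + lam I)^{-1} X^T y.
# Larger lam shrinks the estimated weights toward zero.
rng = np.random.default_rng(2)
n, d = 100, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    d = X.shape[1]
    # solve the regularized normal equations directly
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_small = ridge(X, y, 0.01)    # nearly ordinary least squares
w_large = ridge(X, y, 1000.0)  # heavily shrunk estimate
```

Solving the linear system is preferred over forming the inverse explicitly; the shrinkage effect of lam is what connects ridge to the dimensionality-reduction view in the notes.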

- Mon 2/8/10
- Mon 2/15/10
- Learning and Random Features
- lecture notes pdf
- further reading:

Random Features Page
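A hedged sketch of the random Fourier feature construction referenced above: an explicit feature map whose inner products approximate a Gaussian kernel. The feature count and bandwidth are arbitrary demo choices:

```python
import numpy as np

# Random Fourier features: z(x) = sqrt(2/D) cos(W x + b) with
# W ~ N(0, I) rows and b ~ Unif[0, 2*pi] satisfies
# z(x) . z(y) ~= exp(-||x - y||^2 / 2), the Gaussian kernel.
rng = np.random.default_rng(3)
d, D = 5, 5000                      # input dim, number of random features

W = rng.standard_normal((D, d))     # random frequencies
b = rng.uniform(0, 2 * np.pi, D)    # random phases

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.standard_normal(d)
y = rng.standard_normal(d)

exact = np.exp(-np.linalg.norm(x - y) ** 2 / 2)
approx = z(x) @ z(y)
err = abs(exact - approx)           # shrinks like 1/sqrt(D)
```

With an explicit map of this kind, kernel regression reduces to linear regression in the random feature space.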

- Weds 2/17/10
- Clustering in High Dimensions
- lecture notes pdf
- further reading:

Single-Linkage Clustering Wikipedia page

- Mon 2/22/10
- Clustering and PCA
- further reading:

Spectral Algorithms, Ch 2. pdf

- Weds 2/24/10
- Clustering and Non-Isotropic Mixtures
- further reading:

D. Achlioptas and F. McSherry. On Spectral Learning of Mixtures of Distributions. pdf

- Mon 3/1/10
- Clustering, Random Projections, and Non-Isotropic PCA
- further reading:

S. Dasgupta. Learning mixtures of Gaussians. pdf

N. Srebro, G. Shakhnarovich, S. Roweis. An Investigation of Computational and Informational Limits in Gaussian Mixture Clustering. pdf

S. Brubaker and S. Vempala. Isotropic PCA and Affine-Invariant Clustering. pdf

- Weds 3/3/10
- Isotropic PCA, Probabilistic Latent Semantic Indexing, Non-negative Matrix Factorization
- further reading:

probabilistic LSI Wikipedia page

T Hofmann. Probabilistic Latent Semantic Indexing. pdf

Non-negative Matrix Factorization Wikipedia page

D. Donoho & V. Stodden. When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts? pdf

- Mon 3/15/10
- Graph Laplacians and Spectral Clustering
- further reading:

Graph Laplacians. Wikipedia page

U von Luxburg. A Tutorial on Spectral Clustering. pdf
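A small worked example of spectral clustering with the unnormalized graph Laplacian, assuming numpy; the toy graph (two cliques joined by one weak edge) is an illustrative assumption:

```python
import numpy as np

# Spectral clustering with L = D - W: on a graph with two dense
# groups and a weak bridge, the sign of the Fiedler vector (the
# eigenvector of the second-smallest eigenvalue) recovers the cut.

# adjacency: two 3-node cliques joined by one weak edge
W = np.zeros((6, 6))
for i in range(3):
    for j in range(3):
        if i != j:
            W[i, j] = 1.0          # clique on {0, 1, 2}
            W[i + 3, j + 3] = 1.0  # clique on {3, 4, 5}
W[2, 3] = W[3, 2] = 0.1            # weak bridge between the cliques

D = np.diag(W.sum(axis=1))
L = D - W
vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
fiedler = vecs[:, 1]               # second-smallest eigenvector
labels = (fiedler > 0).astype(int) # sign split = cluster assignment
```

The smallest eigenvalue is 0 (constant vector); the Fiedler vector is the cheapest way to vary across the graph, which here means cutting only the weak edge.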

- Weds 3/17/10
- Mon 3/22/10
- Multidimensional Scaling and SMACOF
- further reading:

The first reference provides a nice introduction to classical scaling and relevant background, along with a nice extension.

L. Cayton and S. Dasgupta. Robust Euclidean embedding. pdf

Multidimensional Scaling. Wikipedia page

Stress Majorization. Wikipedia page

J. de Leeuw & P. Mair. Multidimensional Scaling Using Majorization: SMACOF in R. pdf

Ch 14. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. pdf
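Classical scaling, mentioned in the first reference above, can be sketched directly. This is a minimal numpy illustration (point counts and dimensions are arbitrary), recovering a configuration from squared pairwise distances by double-centering:

```python
import numpy as np

# Classical MDS: from squared distances D2, form the Gram matrix
# B = -0.5 * J D2 J with centering matrix J, then embed using the
# top eigenvectors of B scaled by sqrt of the eigenvalues.
rng = np.random.default_rng(4)
n, d = 20, 2
X = rng.standard_normal((n, d))                  # true 2-D configuration

D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances

J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
B = -0.5 * J @ D2 @ J                            # Gram matrix of centered X
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(vals)[::-1][:d]                 # top-d eigenpairs
Y = vecs[:, idx] * np.sqrt(vals[idx])            # recovered configuration

# the recovered points reproduce the distances (unique up to rotation)
D2_rec = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
err = np.max(np.abs(D2 - D2_rec))
```

Since the data truly lie in two dimensions, B has rank 2 and the reconstruction is exact up to numerical error; SMACOF instead iteratively majorizes a stress objective.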

- Weds 3/24/10
- No Class

- Mon 3/29/10
- Multidimensional Scaling and non-linear dimensionality reduction
- further reading:

Ch 14. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. pdf

- Weds 3/31/10
- CCA and Multi-View Learning
- further reading:

CCA. Wikipedia page

D. Foster, S. M. Kakade, T. Zhang. Multi-View Dimensionality Reduction via Canonical Correlation Analysis. pdf
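A hedged numerical sketch of CCA for two views sharing a latent signal: the canonical correlations are the singular values of the whitened cross-covariance. The data-generating model below is an illustrative assumption, not from the cited paper:

```python
import numpy as np

# CCA: canonical correlations are the singular values of
# Sxx^{-1/2} Sxy Syy^{-1/2}. Two views share a latent signal z,
# so the top canonical correlation should be close to 1.
rng = np.random.default_rng(5)
n = 5000
z = rng.standard_normal(n)                        # shared latent signal
X = np.column_stack([z, rng.standard_normal(n)])  # view 1: signal + noise dim
Y = np.column_stack([z + 0.1 * rng.standard_normal(n),
                     rng.standard_normal(n)])     # view 2: noisy copy + noise

X = X - X.mean(0)
Y = Y - Y.mean(0)
Sxx = X.T @ X / n
Syy = Y.T @ Y / n
Sxy = X.T @ Y / n

def inv_sqrt(S):
    # symmetric inverse square root via the eigendecomposition
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
corrs = np.linalg.svd(M, compute_uv=False)        # canonical correlations
top = corrs[0]
```

Whitening each view first is what makes CCA invariant to invertible linear transformations of either view, the key property exploited in the multi-view setting.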

- Mon 4/5/10
- Spectral methods for learning time series.
- lecture notes pdf

- Weds 4/7/10
- Mon 4/12/10
- Weds 4/14/10
- Weds 4/19/10
- Concentration results for random matrices.
- lecture notes:

Basic statements. pdf

Derivations. pdf

- further reading: these papers have "Bernstein"-like bounds for variance control.

B. Recht. A Simpler Approach to Matrix Completion. pdf

See R. Vershynin's notes on the Ahlswede-Winter method and the Golden-Thompson proof. website
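As a concrete companion to the matrix completion reference cited above, here is a minimal "hard impute" style sketch, assuming numpy. The iterated rank-r SVD heuristic is an illustrative stand-in for the demo, not the nuclear-norm method of the cited paper; sizes, observation rate, and iteration count are arbitrary:

```python
import numpy as np

# Matrix completion by iterated truncated SVD: alternately fill the
# missing entries with the current rank-r estimate and re-truncate.
rng = np.random.default_rng(6)
n, r = 30, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 target
mask = rng.random((n, n)) < 0.7         # observe roughly 70% of entries

Z = np.where(mask, M, 0.0)              # start: zeros in missing entries
for _ in range(200):
    U, s, Vt = np.linalg.svd(Z)
    low = (U[:, :r] * s[:r]) @ Vt[:r]   # best rank-r approximation of Z
    Z = np.where(mask, M, low)          # keep observed entries fixed

err = np.abs(M - low)[~mask].max()      # error on the unobserved entries
```

With far more observations than the roughly 2rn degrees of freedom of a rank-r matrix, this easy random instance is typically recovered accurately; the cited paper analyzes when exact recovery is guaranteed.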

- Weds 4/21/10 & Mon 4/26/10
- Matrix norms and matrix regularization.
- further reading:

S. M. Kakade, S. Shalev-Shwartz, A. Tewari. Applications of strong convexity--strong smoothness duality to learning with matrices. pdf