Machine Learning Summer School 2003

Course abstracts and related material

Lectures

Statistical Learning Theory (Olivier Bousquet)
Independent Component Analysis (Jean-François Cardoso)
Gaussian Processes (Carl Rasmussen)
Learning with Kernels (Bernhard Schoelkopf)
Monte-Carlo Simulation Methods (Christophe Andrieu)
Bioinformatics (Pierre Baldi)
Stochastic Learning (Leon Bottou)
Concentration Inequalities with Machine Learning Applications (Stéphane Boucheron)
Some Mathematical Tools for Machine Learning (Chris Burges)
Universal Modeling: Introduction to modern MDL (Peter Grünwald)
Information Retrieval and Language Technology (Thorsten Joachims)
Foundations of Learning (Stephen Smale)
Unsupervised Learning with Kernels (Alex Smola)
Bayesian Inference: Principles and Practice (Mike Tipping)
An Introduction to Pattern Classification (Elad Yom-Tov)

Evening Talks

Empirical Inference (Vladimir Vapnik)
Analysis of Support Vector Machine Classification (Ding-Xuan Zhou)
On Learning Vector-Valued Functions (Massimiliano Pontil)

Practical Sessions

Support Vector Machines (Jason Weston, Arthur Gretton, and Andre Elisseeff)
Simulation Methods (Manuel Davy)
Pattern Classification: from Data to Decision (Elad Yom-Tov)

Statistical Learning Theory

Olivier Bousquet, Max Planck Institute for Biological Cybernetics, Tuebingen - 8 hours

This course will give a detailed introduction to learning theory with a focus on the classification problem. It will be shown how to obtain (probabilistic) bounds on the generalization error for certain types of algorithms. The main themes will be:

probabilistic inequalities and concentration inequalities
union bounds, chaining
measuring the size of a function class: Vapnik-Chervonenkis dimension, shattering dimension and Rademacher averages
classification with real-valued functions

Some knowledge of probability theory would be helpful but not required since the main tools will be introduced.
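
As a rough illustration of the kind of bound the abstract refers to, here is the simplest textbook case, obtained from a concentration inequality plus a union bound. The setting is an assumption and is not taken from the course material: a finite function class F, a loss with values in [0,1], n i.i.d. training examples, R(f) the expected error and R_n(f) the empirical error.

```latex
% Assumed setting (not from the abstract): finite class F, loss in [0,1], n i.i.d. examples.
\begin{align*}
% Hoeffding's inequality for one fixed f:
P\bigl( R(f) - R_n(f) > \varepsilon \bigr) &\le e^{-2 n \varepsilon^2} \\
% Union bound over the |F| functions in the class:
P\bigl( \exists f \in F :\; R(f) - R_n(f) > \varepsilon \bigr) &\le |F| \, e^{-2 n \varepsilon^2} \\
% Setting the right-hand side equal to \delta and solving for \varepsilon:
\text{with probability at least } 1 - \delta, \quad
\forall f \in F :\; R(f) &\le R_n(f) + \sqrt{\frac{\ln |F| + \ln (1/\delta)}{2n}}
\end{align*}
```

For infinite classes, the term ln |F| is replaced by a capacity measure such as those listed in the themes above.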

Material related to the lectures:

Statistical Learning Theory
- 1 slide/page pdf: http://www.cmap.polytechnique.fr/~bousquet/mlss_slt.pdf
- 2 slides/page ps.gz: http://www.cmap.polytechnique.fr/~bousquet/mlss_slt4.ps.gz

Informal remarks on SLT
http://www.cmap.polytechnique.fr/~bousquet/mlss_philo.pdf

Independent Component Analysis

J.-F. Cardoso, ENST Paris - 8 hours

The course provides an introduction to independent component analysis and source separation. We start from simple statistical principles, examine connections to information theory and to sparse coding, give an overview of the available algorithms, and show how several key ideas of ICA are illuminated by information geometry.

Material related to the lectures:

http://www.tsi.enst.fr/~cardoso/mlss.html

Gaussian Processes

C. Rasmussen, MPIK Tuebingen - 8 hours

Slides and code:

http://www.kyb.tuebingen.mpg.de/~carl/mlss03

Learning with Kernels

B. Schoelkopf, MPIK Tuebingen - 6 hours

The course will cover the basics of Support Vector Machines and related kernel methods.

Kernel and Feature Spaces
Large Margin Classification
Basic Ideas of Learning Theory
Support Vector Machines
Other Kernel Algorithms

slides (PS.GZ)

Unsupervised Learning with Kernels

A. Smola, ANU - 6 hours

An Introduction to Pattern Classification

E. Yom-Tov, Technion, Haifa - 4 hours

Handouts (PDF)

Monte-Carlo Simulation Methods

C. Andrieu, University of Bristol - 4 hours

Bioinformatics

P. Baldi, UC Irvine - 4 hours

More on Bioinformatics can be found on Pierre Baldi's homepage:
http://www.ics.uci.edu/~pfbaldi/publications.htm
http://www.ics.uci.edu/~pfbaldi/tutorials.htm

Stochastic Learning

L. Bottou, NEC Research, Princeton - 4 hours

Material:

[bottou.ps.gz]
The slides for the four parts.
Very similar to the ones in the book.

[icml-bottou.djvu]
The slides I used during the first hour
to illustrate large scale stochastic gradient learning.

In addition, let me explain how to run the demo I gave during the first hour. This works under Linux.
- Step 1: Obtain Lush sources from CVS.
% cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/lush login
Password: <enter>
% cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/lush co lush
- Step 2: Compile Lush.
Read section PRE-REQUISITES in lush/README.
% cd lush
% ./configure
% make
- Step 3: Start the demo.
% cd packages/sn28/examples/bptool
The remaining instructions can be found in the file lush/packages/sn28/examples/bptool/README.

Concentration Inequalities with Machine Learning Applications

S. Boucheron, LRI Orsay - 4 hours

Slides:
http://www.lri.fr/~bouchero/PUB/tuebfun.pdf

Some Mathematical Tools for Machine Learning

C. Burges, Microsoft Research, Redmond - 4 hours

Lagrange multipliers:
- Lagrange the Mathematician
- Lagrange multipliers: an indirect approach can be easier
- Multiple Equality Constraints
- Multiple Inequality Constraints
- Two points on a d-sphere
- The Largest Parallelogram
- Resource allocation
- A convex combination of numbers is maximized by choosing the largest
- The Isoperimetric problem
- For fixed mean and variance, which univariate distribution has maximum entropy?
- An exact solution for an SVM living on a simplex
Notes on some Basic Statistics
- Probabilities can be Counter-Intuitive (Simpson's paradox; the Monty Hall puzzle)
- IID-ness: Measurement Error decreases as 1/sqrt(n)
- Correlation versus Independence
- The Ubiquitous Gaussian:
  - Product of Gaussians is Gaussian
  - Convolution of two Gaussians is a Gaussian
  - Projection of a Gaussian is a Gaussian
  - Sum of Gaussian random variables is a Gaussian random variable
  - Uncorrelated Gaussian variables are also independent
  - Maximum Likelihood Estimates for mean and covariance (prove required matrix identities)
  - Aside: For 1-dim Laplacian, max. likelihood gives the median
- Using cumulative distributions to derive densities
Principal Component Analysis and Generalizations
- Ordering by Variance
- Does Grouping Change Things?
- PCA Decorrelates the Samples
- PCA gives Reconstruction with Minimal Mean Squared Error
- PCA preserves Mutual Information on Gaussian data
- PCA directions lie in the span of the data
- PCA: second order moments only
- The Generalized Rayleigh Quotient
  - Non-orthogonal principal directions
  - OPCA
  - Fisher Linear Discriminant
  - Multiple Discriminant Analysis
Elements of Functional Analysis
- High Dimensional Spaces
- Is Winning Transitive?
- Most of the Volume is Near the Surface: Cubes
- Spheres in n-dimensions
- Banach Spaces, Hilbert Spaces, Compactness
- Norms
- Useful Inequalities (Minkowski and Hölder)
- Vector Norms
- Matrix Norms
- The Hamming Norm
- L1, L2, L_infty norms - is L0 a norm?
- Example: Using a Norm as a Constraint in Kernel Algorithms

These are lectures on some fundamental mathematics underlying many approaches and algorithms in machine learning. They are not about particular learning algorithms; they are about the basic concepts and tools upon which such algorithms are built. Often students feel intimidated by such material: there is a vast amount of "classical mathematics", and it can be hard to find the wood for the trees. The main topics of these lectures are Lagrange multipliers, functional analysis, some notes on matrix analysis, and convex optimization. I've concentrated on things that are often not dwelt on in typical CS coursework. Lots of examples are given; if it's green, it's a puzzle for the student to think about. These lectures are far from complete: perhaps the most significant omissions are probability theory, statistics for learning, information theory, and graph theory. I hope eventually to turn all this into a series of short tutorials. Please let me know of any errors, etc. (From Chris Burges's homepage: http://research.microsoft.com/~cburges)
Link to the slides:

http://research.microsoft.com/~cburges/talks/lecturesTuebingenBurges.ps.gz
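
One item in the outline above, "Most of the Volume is Near the Surface: Cubes", can be checked with a few lines of arithmetic. The sketch below is not taken from the lecture slides; the function name `shell_fraction` and the choice of the unit cube are assumptions made for illustration. Points farther than eps from every face form an inner cube of side 1 - 2*eps, so the fraction of volume near the surface is 1 - (1 - 2*eps)^d.

```python
def shell_fraction(d, eps):
    """Fraction of the unit cube [0, 1]^d that lies within eps of its boundary.

    Hypothetical helper for illustration; not part of the course material.
    """
    if d < 1:
        raise ValueError("d must be a positive integer")
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 0.5]")
    # Points at distance >= eps from every face form an inner cube
    # of side (1 - 2 * eps), whose volume is (1 - 2 * eps) ** d.
    return 1.0 - (1.0 - 2.0 * eps) ** d


if __name__ == "__main__":
    # Print the shell fraction for a thin shell (eps = 0.05) as d grows.
    for d in (1, 2, 10, 100):
        print(d, shell_fraction(d, 0.05))
```

The fraction grows monotonically with the dimension d and approaches 1 for any fixed eps > 0.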

Universal Modeling: Introduction to modern MDL

P. Grünwald, CWI Amsterdam - 4 hours

We give a tutorial introduction to the *modern* Minimum Description Length (MDL) Principle, taking into account the many refinements and developments that have taken place in the 1990s. These do not seem to be widely known outside the information theory community. We will especially emphasize the use of MDL in classification. We also consider the connections between MDL, Bayesian inference, maximum entropy inference and structural risk minimization.

Slides can be accessed via http://www.grunwald.nl

Information Retrieval and Language Technology

T. Joachims, Cornell University - 4 hours

The course will give an overview of how statistical learning can help organize and access information that is represented in textual form. In particular, it will cover tasks like text classification, information retrieval, information extraction, topic detection, and topic tracking. The course will introduce the basic techniques for representing text and analyze their statistical properties. An emphasis of the course will be on giving an overview of interesting learning problems in this area, providing starting points for future research.

Slides (PDF)

Foundations of Learning

S. Smale, UC Berkeley - 4 hours

Bayesian Inference: Principles and Practice

M. Tipping, Microsoft Research, Cambridge - 4 hours

The aim of this course is two-fold: to convey the basic principles of Bayesian machine learning and to describe a practical implementation framework. Firstly, we will give an introduction to Bayesian approaches, focussing on the advantages of probabilistic modelling, the concept of priors, and the key principle of marginalisation. Secondly, we will exploit these ideas to realise practical algorithms for sparse linear regression and classification, as exemplified by models such as the "relevance vector machine".

The slides from my lectures, along with other related materials, are available via:
http://www.research.microsoft.com/mlp/RVM/

Empirical Inference

V. Vapnik - evening lecture

Analysis of Support Vector Machine Classification

Ding-Xuan Zhou - evening lecture

On Learning Vector-Valued Functions

Massimiliano Pontil - evening lecture

slides (PS)

Pattern Classification: from Data to Decision

E. Yom-Tov - practical session

Link to the classification toolbox:
http://tiger.technion.ac.il/~eladyt/classification/index.htm

Support Vector Machines

A. Gretton, A. Elisseeff, J. Weston - practical session

You can find the slides and code for the SVM practical session at:
http://www.kyb.tuebingen.mpg.de/bs/people/weston/svmpractical/index.html

Simulation Methods

M. Davy - practical session

In this practical session, we will implement basic simulation algorithms in Matlab. Special focus will be devoted to:

the Metropolis-Hastings algorithm used in MCMC simulation methods
Sequential Importance Sampling.
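
For readers who have not seen the first of these algorithms, here is a minimal sketch of random-walk Metropolis-Hastings. It is not the session's code: the session uses Matlab, while this sketch is in Python, and the function name `metropolis_hastings`, the Gaussian proposal, the step size and the standard normal target are all assumptions chosen for illustration.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings on the real line (illustrative sketch).

    log_target: log of the unnormalised target density.
    Returns the list of samples and the observed acceptance rate.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the proposal ratio in the
        # Hastings correction equals 1 and drops out of the acceptance rule.
        x_new = x + rng.gauss(0.0, step)
        log_p_new = log_target(x_new)
        # Accept with probability min(1, p(x_new) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = x_new, log_p_new
            accepted += 1
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples, accepted / n_samples


if __name__ == "__main__":
    # Assumed target for the demo: a standard normal, known up to a constant.
    draws, rate = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
    mean = sum(draws) / len(draws)
    print("acceptance rate:", rate, "sample mean:", mean)
```

Working with log densities avoids numerical underflow when the target takes very small values.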

Slides: .tar.gz, .pdf

Last modified April 22, 2004