Machine Learning Summer School 2003
Course abstracts and related material
Lectures
Statistical Learning Theory (Olivier Bousquet)
Independent Component Analysis (Jean-François Cardoso)
Gaussian Processes (Carl Rasmussen)
Learning with Kernels (Bernhard Schoelkopf)
Monte-Carlo Simulation Methods (Christophe Andrieu)
Bioinformatics (Pierre Baldi)
Stochastic Learning (Leon Bottou)
Concentration Inequalities with Machine Learning Applications (Stéphane Boucheron)
Some Mathematical Tools for Machine Learning (Chris Burges)
Universal Modeling: Introduction to modern MDL (Peter Grünwald)
Information Retrieval and Language Technology (Thorsten Joachims)
Foundations of Learning (Stephen Smale)
Unsupervised Learning with Kernels (Alex Smola)
Bayesian Inference: Principles and Practice (Mike Tipping)
An Introduction to Pattern Classification (Elad Yom-Tov)
Evening Talks
Empirical Inference (Vladimir Vapnik)
Analysis of Support Vector Machine Classification (Ding-Xuan Zhou)
On Learning Vector-Valued Functions (Massimiliano Pontil)
Practical Sessions
Support Vector Machines (Jason Weston, Arthur Gretton, and Andre Elisseeff)
Simulation Methods (Manuel Davy)
Pattern Classification: from Data to Decision (Elad Yom-Tov)
Statistical Learning Theory
Olivier Bousquet, Max Planck Institute for Biological Cybernetics, Tuebingen - 8 hours
This course will give a detailed introduction to learning theory with a
focus on the classification problem. It will be shown how to obtain
(probabilistic) bounds on the generalization error for certain types of
algorithms. The main themes will be
- probabilistic inequalities and concentration inequalities
- union bounds, chaining
- measuring the size of a function class: Vapnik-Chervonenkis dimension, shattering dimension and Rademacher averages
- classification with real-valued functions
Some knowledge of probability theory would be helpful but is not required,
since the main tools will be introduced.
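As a rough illustration of how a probabilistic inequality and a union bound combine into a generalization bound, the sketch below evaluates the textbook bound for a finite class of classifiers: Hoeffding's inequality for each function, then a union bound over the whole class. It is a standard result, not necessarily the form used in the lectures; the class size, confidence level and sample sizes are hypothetical.

```python
import math


def finite_class_bound(n: int, class_size: int, delta: float) -> float:
    """Deviation term from Hoeffding's inequality plus a union bound.

    For a finite class of `class_size` binary classifiers and an i.i.d.
    sample of size n, with probability at least 1 - delta every f satisfies
        R(f) <= R_n(f) + sqrt((ln(class_size) + ln(1/delta)) / (2n)),
    where R is the true 0-1 risk and R_n the empirical (training) error.
    """
    if n <= 0 or class_size <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0, class_size > 0 and 0 < delta < 1")
    return math.sqrt((math.log(class_size) + math.log(1.0 / delta)) / (2.0 * n))


def sample_size_for(epsilon: float, class_size: int, delta: float) -> int:
    """Smallest n for which the deviation term above is at most epsilon."""
    numerator = math.log(class_size) + math.log(1.0 / delta)
    return math.ceil(numerator / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Hypothetical setting: 1000 classifiers, 95% confidence.
    for n in (100, 1_000, 10_000):
        print(n, round(finite_class_bound(n, class_size=1_000, delta=0.05), 4))
    print(sample_size_for(0.05, class_size=1_000, delta=0.05))
```

The bound shrinks like 1/sqrt(n) and grows only logarithmically with the size of the class, which is why measuring the "size" of infinite classes (VC dimension, Rademacher averages) is the next step.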
Material related to the lectures:
Statistical Learning Theory
- 1 slide/page pdf: http://www.cmap.polytechnique.fr/~bousquet/mlss_slt.pdf
- 2 slides/page ps.gz: http://www.cmap.polytechnique.fr/~bousquet/mlss_slt4.ps.gz
Informal remarks on SLT
http://www.cmap.polytechnique.fr/~bousquet/mlss_philo.pdf
Independent Component Analysis
J.-F. Cardoso, ENST Paris - 8 hours
The course provides an introduction to independent component analysis
and source separation. We start from simple statistical principles,
examine connections to information theory and to sparse coding, give
an overview of the available algorithms, and show how several key
ideas of ICA are illuminated by information geometry.
Material related to the lecture:
http://www.tsi.enst.fr/~cardoso/mlss.html
Gaussian Processes
C. Rasmussen, MPIK Tuebingen - 8 hours
Slides and code:
http://www.kyb.tuebingen.mpg.de/~carl/mlss03
Learning with Kernels
B. Schoelkopf, MPIK Tuebingen - 6 hours
The course will cover the basics of Support Vector Machines and related
kernel methods.
- Kernel and Feature Spaces
- Large Margin Classification
- Basic Ideas of Learning Theory
- Support Vector Machines
- Other Kernel Algorithms
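To make "kernel and feature spaces" concrete: a kernel computes an inner product in a feature space without building the features explicitly. A minimal sketch, assuming 2-D inputs and the homogeneous degree-2 polynomial kernel, whose feature map is small enough to write out by hand; all function names are illustrative.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = <x, y>^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs with <phi(x), phi(y)> = <x, y>^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(points, kernel):
    """Kernel (Gram) matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


if __name__ == "__main__":
    x, y = (1.0, 2.0), (3.0, -1.0)
    print(poly2_kernel(x, y))                         # (1*3 + 2*(-1))^2 = 1.0
    print(dot(poly2_features(x), poly2_features(y)))  # same value up to rounding
```

A kernel algorithm such as the SVM only ever touches the data through the Gram matrix, so the explicit map is never needed; for higher degrees or the Gaussian kernel it would be very large or infinite-dimensional.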
slides (PS.GZ)
Unsupervised Learning with Kernels
A. Smola, ANU - 6 hours
An Introduction to Pattern Classification
E. Yom-Tov, Technion, Haifa - 4 hours
Handouts (PDF)
Monte Carlo Simulation Methods
C. Andrieu, University of Bristol - 4 hours
Bioinformatics
P. Baldi, UC Irvine - 4 hours
More on Bioinformatics can be found on Pierre Baldi's homepage:
http://www.ics.uci.edu/~pfbaldi/publications.htm
http://www.ics.uci.edu/~pfbaldi/tutorials.htm
Stochastic Learning
L. Bottou, NEC Research, Princeton - 4 hours
Material:
[bottou.ps.gz]
The slides for the four parts, very similar to the ones in the book.
[icml-bottou.djvu]
The slides I used during the first hour to illustrate large-scale stochastic gradient learning.
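For readers without the slides, here is a minimal sketch of plain stochastic gradient descent (one example per update) on a synthetic least-squares problem. It is not the Lush demo; the data generator, the step-size schedule lr0 / (1 + t/t0) and all parameter values are assumptions chosen for illustration.

```python
import random


def make_data(n=1000, w_true=2.0, b_true=-1.0, noise=0.1, seed=1):
    """Hypothetical 1-D regression data: y = w_true*x + b_true + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, w_true * x + b_true + rng.gauss(0.0, noise)))
    return data


def sgd_linear_regression(data, epochs=20, lr0=0.05, t0=1000.0, seed=0):
    """Stochastic gradient descent on the squared loss 0.5 * (w*x + b - y)**2.

    Each update uses a single example, visited in a freshly shuffled order
    every epoch; the step size decays as lr0 / (1 + t / t0).
    """
    rng = random.Random(seed)
    w, b, t = 0.0, 0.0, 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            lr = lr0 / (1.0 + t / t0)
            err = w * x + b - y  # derivative of the loss w.r.t. the prediction
            w -= lr * err * x    # gradient w.r.t. w is err * x
            b -= lr * err        # gradient w.r.t. b is err
            t += 1
    return w, b


if __name__ == "__main__":
    w, b = sgd_linear_regression(make_data())
    print(round(w, 3), round(b, 3))  # should be close to 2 and -1
```

Each update costs O(1) regardless of the dataset size, which is what makes this family of methods attractive for large-scale learning.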
In addition, let me explain how to run the demo I gave during the first
hour. This works under Linux.
- Step 1: Obtain the Lush sources from CVS.
% cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/lush login
Password: <enter>
% cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/lush co lush
- Step 2: Compile Lush.
Read the PRE-REQUISITES section in lush/README.
% cd lush
% ./configure
% make
- Step 3: Start the demo.
% cd packages/sn28/examples/bptool
The remaining instructions can be found in the file
lush/packages/sn28/examples/bptool/README.
Concentration Inequalities with Machine Learning Applications
S. Boucheron, LRI Orsay - 4 hours
Slides:
http://www.lri.fr/~bouchero/PUB/tuebfun.pdf
Some Mathematical Tools for Machine Learning
C. Burges, Microsoft Research, Redmond - 4 hours
- Lagrange multipliers:
- Lagrange the Mathematician
- Lagrange multipliers: an indirect approach can be easier
- Multiple Equality Constraints
- Multiple Inequality Constraints
- Two points on a d-sphere
- The Largest Parallelogram
- Resource allocation
- A convex combination of numbers is maximized by choosing the largest
- The Isoperimetric problem
- For fixed mean and variance, which univariate distribution has maximum entropy?
- An exact solution for an SVM living on a simplex
- Notes on some Basic Statistics
- Probabilities can be Counter-Intuitive (Simpson's paradox; the Monty Hall puzzle)
- IID-ness: Measurement Error decreases as 1/sqrt{n}
- Correlation versus Independence
- The Ubiquitous Gaussian:
- Product of Gaussians is Gaussian
- Convolution of two Gaussians is a Gaussian
- Projection of a Gaussian is a Gaussian
- Sum of Gaussian random variables is a Gaussian random variable
- Uncorrelated Gaussian variables are also independent
- Maximum Likelihood Estimates for mean and covariance (prove required matrix identities)
- Aside: For the 1-dim Laplacian, maximum likelihood gives the median
- Using cumulative distributions to derive densities
- Principal Component Analysis and Generalizations
- Ordering by Variance
- Does Grouping Change Things?
- PCA Decorrelates the Samples
- PCA gives Reconstruction with Minimal Mean Squared Error
- PCA preserves Mutual Information on Gaussian data
- PCA directions lie in the span of the data
- PCA: second order moments only
- The Generalized Rayleigh Quotient
- Non-orthogonal principal directions
- OPCA
- Fisher Linear Discriminant
- Multiple Discriminant Analysis
- Elements of Functional Analysis
- High Dimensional Spaces
- Is Winning Transitive?
- Most of the Volume is Near the Surface: Cubes
- Spheres in n-dimensions
- Banach Spaces, Hilbert Spaces, Compactness
- Norms
- Useful Inequalities (Minkowski and Hölder)
- Vector Norms
- Matrix Norms
- The Hamming Norm
- L1, L2, L_infty norms - is L0 a norm?
- Example: Using a Norm as a Constraint in Kernel Algorithms
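The outline item "Most of the Volume is Near the Surface" can be checked with a short calculation. A sketch, assuming the unit cube [0,1]^n and the unit ball in R^n: the part of the cube at distance at least eps from the boundary is a cube of side 1 - 2*eps, and the inner ball of radius 1 - eps has relative volume (1 - eps)^n. A small Monte Carlo estimate is included as a sanity check; the function names are hypothetical.

```python
import random


def cube_shell_fraction(n: int, eps: float) -> float:
    """Fraction of the unit cube [0,1]^n lying within eps of its boundary (eps <= 0.5).

    The points at distance >= eps from every face form the smaller cube
    [eps, 1-eps]^n, whose volume is (1 - 2*eps)^n.
    """
    return 1.0 - (1.0 - 2.0 * eps) ** n


def ball_shell_fraction(n: int, eps: float) -> float:
    """Fraction of the unit ball in R^n lying in the outer shell of thickness eps.

    Volume scales as r^n, so the inner ball of radius 1-eps has relative volume (1-eps)^n.
    """
    return 1.0 - (1.0 - eps) ** n


def cube_shell_fraction_mc(n: int, eps: float, samples: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of cube_shell_fraction from uniform random points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        point = [rng.random() for _ in range(n)]
        # Distance to the nearest face is the smallest min(c, 1 - c) over coordinates.
        if min(min(c, 1.0 - c) for c in point) < eps:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(cube_shell_fraction(n, 0.05), 4), round(ball_shell_fraction(n, 0.05), 4))
    print(round(cube_shell_fraction_mc(10, 0.05), 4))
```

With eps = 0.05, the cube fraction is 1 - 0.9^10, about 0.65, for n = 10, and already above 0.99 for n = 100.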
These are lectures on some fundamental mathematics
underlying many approaches and algorithms in machine learning.
They are not about particular learning algorithms; they are about the
basic concepts and tools upon which such algorithms are built.
Often students feel intimidated by such material: there is a vast amount
of "classical mathematics", and it can be hard to find the wood for the
trees. The main topics of these lectures are Lagrange multipliers,
functional analysis, some notes on matrix analysis, and convex
optimization. I've concentrated on things that are often not
dwelt on in typical CS coursework. Lots of examples are given; if
it's green, it's a puzzle for the student to think about. These lectures are far from complete: perhaps
the most significant omissions are probability theory, statistics for
learning, information theory, and graph theory. I hope eventually
to turn all this into a series of short tutorials. Please let me
know of any errors, etc. (from Chris Burges's homepage: http://research.microsoft.com/~cburges)
Link to the slides:
http://research.microsoft.com/~cburges/talks/lecturesTuebingenBurges.ps.gz
Universal Modeling: Introduction to modern MDL
P. Grünwald, CWI Amsterdam - 4 hours
We give a tutorial introduction to the *modern* Minimum Description
Length (MDL) Principle, taking into account the many refinements and
developments that have taken place in the 1990s. These do not seem to be
widely known outside the information theory community. We will
especially emphasize the use of MDL in classification. We also consider
the connections between MDL, Bayesian inference, maximum entropy
inference and structural risk minimization.
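A central ingredient of refined MDL is the normalized maximum likelihood (NML) code. A minimal sketch for the Bernoulli model on binary strings, which may or may not match the examples used in the tutorial: the code length of a string x of length n is -log2 P(x | theta_hat(x)) + log2 COMP(n), where COMP(n) sums the maximized likelihood over all strings of length n. A fair-coin code, for comparison, costs exactly n bits. The helper names are hypothetical.

```python
import math


def _log_max_likelihood(k: int, n: int) -> float:
    """ln of max over theta of theta^k (1-theta)^(n-k), with the convention 0*ln(0) = 0."""
    out = 0.0
    if k > 0:
        out += k * math.log(k / n)
    if k < n:
        out += (n - k) * math.log((n - k) / n)
    return out


def bernoulli_parametric_complexity(n: int) -> float:
    """COMP(n) = sum over k of C(n, k) times the maximized likelihood of a string with k ones."""
    total = 0.0
    for k in range(n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += math.exp(log_comb + _log_max_likelihood(k, n))
    return total


def nml_code_length_bits(bits) -> float:
    """NML code length in bits: -log2 P(x | theta_hat(x)) + log2 COMP(n)."""
    n, k = len(bits), sum(bits)
    return (-_log_max_likelihood(k, n) + math.log(bernoulli_parametric_complexity(n))) / math.log(2)


if __name__ == "__main__":
    mostly_ones = [1] * 18 + [0] * 2
    balanced = [0, 1] * 10
    # The fair-coin code spends exactly 20 bits on either string.
    print(round(nml_code_length_bits(mostly_ones), 2), round(nml_code_length_bits(balanced), 2))
```

The skewed string is compressed well below 20 bits, while the balanced one costs slightly more than 20: the log2 COMP(n) term is the price paid for fitting the parameter, which is how MDL penalizes model complexity.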
Slides can be accessed via http://www.grunwald.nl
Information Retrieval and Language Technology
T. Joachims, Cornell University - 4 hours
The course will give an overview of how statistical learning can help
organize and access information that is represented in textual form. In
particular, it will cover tasks like text classification, information
retrieval, information extraction, topic detection, and topic tracking.
The course will introduce the basic techniques for representing text and
analyze their statistical properties. An emphasis of the course will be
on giving an overview of interesting learning problems in this area,
providing starting points for future research.
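As one example of a basic text representation, the sketch below builds bag-of-words vectors with TF-IDF weighting and compares documents by cosine similarity. The whitespace tokenizer and the particular weighting (raw term count times log(N/df), followed by L2 normalization) are assumptions; many variants exist, and the course may use a different one.

```python
import math
from collections import Counter


def tokenize(text):
    """Crude tokenizer used only for illustration: lowercase, split on whitespace."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per document.

    tf  = raw count of the term in the document,
    idf = log(N / df), where df is the number of documents containing the term.
    Vectors are L2-normalized so that dot products are cosine similarities.
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Dot product of two normalized sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat on the log", "stock markets fell sharply"]
    vecs = tfidf_vectors(docs)
    print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```

Terms that occur in every document get weight zero, and documents sharing no terms have cosine zero; such sparse, high-dimensional vectors are the usual input to the learning methods discussed in the course.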
Slides (PDF)
Foundations of Learning
S. Smale, UC Berkeley - 4 hours
Bayesian Inference: Principles and Practice
M. Tipping, Microsoft Research, Cambridge - 4 hours
The aim of this course is two-fold: to convey the basic principles of
Bayesian machine learning and to describe a practical implementation
framework. Firstly, we will give an introduction to Bayesian approaches,
focussing on the advantages of probabilistic modelling, the concept of
priors, and the key principle of marginalisation. Secondly, we will
exploit these ideas to realise practical algorithms for sparse linear
regression and classification, as exemplified by models such as the
"relevance vector machine".
The slides from my lectures, along with other related materials, are
available via:
http://www.research.microsoft.com/mlp/RVM/
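To illustrate marginalisation on a very simple model (not the relevance vector machine itself): for a coin with unknown bias theta and a Beta(a, b) prior, the marginal likelihood of a particular sequence with k heads in n tosses has the closed form B(a + k, b + n - k) / B(a, b). The sketch computes it and checks it against direct numerical integration; the model, prior and numbers are chosen only for illustration.

```python
import math


def log_beta(a, b):
    """ln B(a, b) computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence_beta_bernoulli(k, n, a=1.0, b=1.0):
    """ln p(D) for a specific binary sequence with k ones out of n,
    with theta marginalised out under a Beta(a, b) prior."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def evidence_by_quadrature(k, n, a=1.0, b=1.0, steps=100_000):
    """The same integral, p(D) = int theta^k (1-theta)^(n-k) Beta(theta; a, b) dtheta,
    approximated with the midpoint rule as a sanity check."""
    h = 1.0 / steps
    norm = math.exp(-log_beta(a, b))
    total = 0.0
    for i in range(steps):
        th = (i + 0.5) * h
        prior = norm * th ** (a - 1) * (1 - th) ** (b - 1)
        total += th ** k * (1 - th) ** (n - k) * prior * h
    return total


if __name__ == "__main__":
    k, n = 7, 10
    print(math.exp(log_evidence_beta_bernoulli(k, n)))  # 1/1320 for the uniform prior a = b = 1
    print(evidence_by_quadrature(k, n))
    # For comparison: the maximized likelihood of the same sequence (theta = 0.7).
    print(0.7 ** k * 0.3 ** (n - k))
```

The marginal likelihood is smaller than the maximized likelihood because it averages the likelihood over all values of theta weighted by the prior instead of picking the best one; this averaging is what lets the evidence compare models without a separate complexity penalty.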
Empirical Inference
V. Vapnik - evening lecture
Analysis of Support Vector Machine Classification
Ding-Xuan Zhou - evening lecture
On Learning Vector-Valued Functions
Massimiliano Pontil - evening lecture
slides (PS)
Pattern Classification: from Data to Decision
E. Yom-Tov - practical session
Link to the classification toolbox:
http://tiger.technion.ac.il/~eladyt/classification/index.htm
Support Vector Machines
A. Gretton, A. Elisseeff, J. Weston - practical session
You can find the slides and code for the SVM practical session at:
http://www.kyb.tuebingen.mpg.de/bs/people/weston/svmpractical/index.html
Simulation Methods
M. Davy - practical session
In this practical session, we will implement basic simulation
algorithms in Matlab. Special focus will be devoted to
- the Metropolis-Hastings algorithm used in MCMC simulation methods
- Sequential Importance Sampling.
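A minimal sketch of the random-walk Metropolis-Hastings algorithm, written in Python rather than the Matlab used in the session. The target (a standard normal known only up to its normalizing constant), the proposal width and the burn-in length are arbitrary choices for illustration; sequential importance sampling is not shown.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target density.

    log_target(x) only needs to be known up to an additive constant.
    The Gaussian proposal is symmetric, so the Hastings ratio reduces to
    target(proposal) / target(current).
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, exp(log_p_new - log_p)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)  # on rejection the current state is repeated
    return samples, accepted / n_samples


if __name__ == "__main__":
    samples, rate = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
    kept = samples[1_000:]  # discard a burn-in period
    mean = sum(kept) / len(kept)
    var = sum((s - mean) ** 2 for s in kept) / len(kept)
    print(f"acceptance rate {rate:.2f}, mean {mean:.3f}, variance {var:.3f}")
```

Repeating the current state on rejection is essential: dropping rejected steps would bias the chain away from the target distribution.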
Slides: .tar.gz, .pdf
Last modified April 22, 2004