COLT Course, Master in Computing, Fall 2010: October 2010

Saturday, October 30, 2010

2010-10-29 More on VC-dimension

Today we proved the upper and lower bound theorems relating sample size for PAC-learning and VC-dimension.

We also presented the Chernoff and Hoeffding bounds, which we will use other times during the course (and are very often used in algorithmic design and complexity).
In fact, it was needed at one point of the proof above.

Exercise 9: Complete the proof that Pr(A) <= 2Pr(B) that we left pending in the proof of the upper bound theorem. This uses Hoeffding. Additional exercise, not stated in class:

Exercise 10: The upper and bound theorems presented in class gave conditions on the sample size for consistent algorithms, that is, learning algorithms that always achieve 0 errors on the sample. Not algorithms are like consistent. Extend the upper bound theorem to show that for sufficiently
large sample size, we have

|error_D(c) - error_S(c)| < epsilon with probability 1-delta

This is a more general form of generalization bound.

(Hint: go over the proof for the consistent case, see where the consistency hypothesis (error_S(c) = 0) is used, and see what you can claim for non-consistent algorithms. The Chernoff or Hoeffding bound will be used. You should get a dependence on epsilon of the form 1/epsilon^2 rather than 1/epsilon).

2010-10-26 Generalization bounds and the VC-dimension

Today we defined version space, growth function, and VC-dimension of a concept class.
We stated and proved Sauer's lemma, which tells us how the growth function grows (sorry for the redundance).

We also stated two theorems essentially saying that the sample size needed for PAC learning a class is both upper and lower bounded by the VC-dimension, up to constants, when we fix the epsilon and delta parameters.

Exercise 8:
- find the VC-dimension of hyperrectangles in R^d
- find the VC-dimension of unions of at most k rectangles in R^2
- find the VC-dimension of linear halfspaces in R^2
- find the VC-dimension of the class of sinusoid concepts S_{a,b,c} on R. Concept S_{a,b,c} is the set of points such that a*sin(b*x)+c > 0.

Friday, October 22, 2010

2010-10-22 More on PAC learning

Things we covered in today's class:

finished proof of PAC learning monotone conjunctions, and PAC learning algorithm for rectangles (see chapter 1 of Introduction to Computational Learning Theory)
definition of alternative PAC learning oracle-based models:

the sample-based model that in addition receives as input the size of the target concept
the one-oracle model using random example generator EX()
the two-oracle model using positive random example generatos EX+() and negative random example generator EX-()

reduction of online to PAC learning, more information on this in:

Angluin's 1988 paper Queries and concept learning contains a similar reduction from the query learning model to PAC
Littlestone's paper From on-line to batch learning

Exercise 5: Generalize the PAC algorithm for rectangles to
n-dimestional hyperrectangle.

Exercise 6: Show that an algorithm that is always able to output
consistent hypotheses is a PAC learning algorithm, given that the
sample size is at least $\frac{1}{\epsilon} \ln \frac{|H|}{\delta}$.

Exercise 7: Show that all four alternative models are indeed equivalent.
You may find the proofs to part of this in the original paper
Equivalence of models for polynomial learnability).

Tuesday, October 19, 2010

2010-10-19 PAC learning model

We finished the proof of the Weighted Majority Algorithm.

Exercise 4: Re-do analysis for WMA but instead of predicting according to the weighted majority, predict as follows: choose an expert at random according to probability w_i/W, and predict according to the chosen expert. You should get a better constant in the bound.

Then, we covered the definitions of representation class and PAC learning. The model was originally introduced in the classic 1984 paper by Valiant A theory of the learnable. We also saw one example of a learning algorithm that works in the PAC model: the elimination algorithm for learning monotone conjunctions. The proof is left for next class.

This material is well covered in the first chapter of the Kearns and Vazirani's textbook An Introduction to Computational Learning Theory.

Friday, October 15, 2010

Scribes and exercises

Dear all:

You can find various information on the course, including scribing assignments and problems with their due dates at:

http://www.lsi.upc.edu/~marias/teaching/colt-10.html

2010-10-15: Winnow and Weighted Majority Algorithms

We covered two classic online algorithms:

The Winnow algorithm that is able to learn in an attribute efficient way. Originally introduced in this paper by Littlestone: Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm
The Weighted Majority Algorithm that is able to learn from multiple experts. Introduced in this paper by Littlestone and Warmuth: The Weighted Majority Algorithm

The lecture was based in Blum's survey paper On-Line Algorithms in Machine Learning.

Exercise 3: Re-do analysis for winnow but instead of cutting weights in half in a demotion step, set weights to 0. This is, in fact, the "elimination algorithm".

Monday, October 11, 2010

2010-10-08: More on the mistake-bounded model

Today:

we finished the proof that linear functions F^n --> F for a field F are learnable with n mistakes
we showed that subspaces of F^n are learnable with n mistakes (furthermore, they're positive mistakes)
we showed that if a concept class C is learnable from b positive mistakes, then the class DIFF_k(C) of nested differences of at most k concepts in C is learnable from b*k mistakes.
we argued (by informal "reductions") that the set of conjunctions of literals on n variables is learnable with 2n mistakes, and that concepts specified by non-homogeneous linear equations over a field are learnable with n+1 mistakes.

Later in the course, we will define formally this idea of "learning reduction". But, for the time being, we proposed:

Exercise 1: Show that 2-CNF are learnable with O(n^2) mistakes by reduction to conjunctions of boolean variables.

Exercise 2: Show that n-variable, degree-d polynomials over a field are learnable from O(n^d) mistakes, by reduction to the degree-1 case.

Tuesday, October 5, 2010

Last year's course

If you are curious, last year's blog for the course is:

http://coltlsiupc09.blogspot.com/

2010-10-05 Introduction and the Mistake-Bounded model

We covered:

administrative issues and mostly the evaluation method
the goals and methods of Computational Learning Theory
various "features" that may distinguish models of learning
the Mistake Bounded (MB) model, one of the 3 main formal models of learning we will see in the course
an algorithm for learning conjunctions of boolean variables in the MB model, and proved its correctness
an algorithm for learning linear functions over fields in the MB model; its correctness proof is pending.

Sunday, October 3, 2010

Some references for the course

The class does not follow any textbook in particular, and throughout the course we will give pointers to relevant papers and other resources. If you do want a book, a classical reference is An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani; the book is partially available from Google Books.

Here is a list of courses from other institutions. Let us know if any of the links are broken, or if you find some other relevant course: